The European Union’s new artificial intelligence (AI) rules will force companies to be more transparent about the data used to train their systems, prying open one of the industry’s most closely guarded secrets. Since OpenAI, backed by Microsoft (MSFT.O), introduced ChatGPT to the public 18 months ago, interest and investment in generative AI have surged. The technology spans applications that can rapidly produce text, images, and audio content.
Amid this growth, however, questions have arisen about where AI companies obtain the data used to train their models, and whether feeding them copyrighted material such as bestselling books and Hollywood films without permission is legal. The EU’s recently enacted AI Act will be phased in over the next two years, giving regulators time to put enforcement in place while businesses adjust to their new obligations. But how some of its provisions will work in practice remains unclear.
One of the Act’s most contentious provisions requires organizations deploying general-purpose AI models, such as ChatGPT, to provide “comprehensive summaries” of the content used to train them. The newly established AI Office plans to release a template for these summaries in early 2025, after consulting stakeholders.
AI companies, however, strongly resist revealing what their models are trained on, describing the information as a trade secret that would hand competitors an unfair advantage if made public. Matthieu Riouf, CEO of AI-powered image-editing firm Photoroom, compared the secrecy to a recipe’s secret ingredient, the thing that sets the best products apart in the market.
How much detail these transparency reports must contain will have major implications both for smaller AI startups and for tech giants such as Google (GOOGL.O) and Meta (META.O), which have placed the technology at the center of their future strategies.