News
AI Model Gives Data Owners Control
Source: wired.com
Published on July 10, 2025

Revolutionary AI Model Puts Data Owners in Control
Researchers at the Allen Institute for AI (Ai2) have unveiled FlexOlmo, a game-changing AI model that empowers data owners to retain control over how their data is utilized during training. This innovation addresses a long-standing issue in the AI industry, where data is often collected without proper consideration of ownership, leading to disputes and ethical concerns.
FlexOlmo introduces a novel approach to training large language models, allowing data contributors to influence how their data is incorporated into the AI system. Unlike traditional methods, where data is either included or excluded with no room for adjustment, FlexOlmo enables data owners to maintain oversight even after the model is built.
The Challenge of Data Control in AI
AI models are typically trained using vast amounts of data sourced from various platforms, often without explicit permission from the original data owners. Once the data is integrated into the model, extracting or modifying it becomes nearly impossible, akin to trying to retrieve eggs from a baked cake, as described by Ali Farhadi, CEO of Ai2. This lack of control has been a major obstacle for data owners who wish to retain authority over their contributions.
Farhadi explains that traditional AI training methods require a complete retraining of the model to remove or alter specific data, a process that is both time-consuming and costly. FlexOlmo overcomes this limitation by allowing data owners to contribute and withdraw their data without disrupting the overall model.
How FlexOlmo Works
FlexOlmo builds on an architecture known as a "mixture of experts," which combines multiple sub-models into a single cohesive AI system. Data owners train their own sub-models independently, on their own data, and then merge them with a shared "anchor" model. Because the process is asynchronous, data owners never need to coordinate with one another, giving each contributor flexibility and autonomy.
Sewon Min, a research scientist at Ai2, highlights that this approach allows data owners to work at their own pace, contributing to the model without the need for centralized coordination. The result is a more efficient and inclusive training process that respects the autonomy of data contributors.
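The idea described above can be illustrated with a toy sketch. This is not the actual FlexOlmo implementation: the owner names, the simple matrix "experts," and the plain averaging (standing in for a learned router) are all illustrative assumptions, chosen only to show how independently trained sub-models can be combined with an anchor, and how one can be withdrawn without retraining the rest.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

def make_expert():
    # Stand-in for an expert sub-model that a data owner trains locally.
    return rng.standard_normal((DIM, DIM)) / np.sqrt(DIM)

# A shared anchor model plus experts contributed asynchronously by owners.
anchor = make_expert()
experts = {"owner_a": make_expert(), "owner_b": make_expert()}
active = {name: True for name in experts}  # per-owner opt-in flags

def forward(x):
    # Route the input through the anchor plus every active expert and
    # average the outputs (a crude stand-in for a learned router).
    outs = [x @ anchor] + [x @ w for n, w in experts.items() if active[n]]
    return np.mean(outs, axis=0)

x = rng.standard_normal(DIM)
y_all = forward(x)

# Owner B withdraws: the expert is simply deactivated, no retraining
# of the anchor or the other experts is required.
active["owner_b"] = False
y_without_b = forward(x)
```

The key property is that withdrawal is a configuration change rather than a training run: the remaining sub-models still produce a valid output on their own.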
Testing and Performance
To validate FlexOlmo, Ai2 researchers created a dataset called Flexmix, sourced from books, websites, and other materials. They trained a model with 37 billion parameters and compared it against traditional models and other model-merging techniques. FlexOlmo outperformed each of the individual sub-models and beat the competing merging approaches as well.
Farhadi notes that FlexOlmo offers a new paradigm for training AI models, where users can opt out of the system without compromising the integrity of the model. This flexibility is a significant step forward in addressing the ethical and practical challenges of AI training.
Expert Perspectives
Percy Liang, an AI researcher at Stanford, praises the Ai2 approach, describing it as a promising solution for modular control over data without the need for retraining. He emphasizes the importance of openness in the development process, a sentiment echoed by Farhadi and Min, who suggest that FlexOlmo could enable AI firms to access sensitive private data in a more controlled manner.
However, both Farhadi and Min caution that while FlexOlmo improves data control, there is still a risk that training data could be reconstructed from the finished model. They recommend incorporating differential privacy techniques so that sensitive information remains protected even if the model is probed.
Legal and Ethical Implications
The ownership of data used in AI training has become a critical legal issue, with many data owners seeking greater control over how their contributions are utilized. FlexOlmo addresses this concern by allowing data owners to co-develop models without sacrificing privacy or control. This innovation could pave the way for more equitable and transparent AI development practices.
Min suggests that FlexOlmo could lead to better shared models, where data owners collaborate on AI development while maintaining their data rights. This shift towards a more collaborative and respectful approach to AI training could have far-reaching implications for the industry and society as a whole.