Themis AI: Improving AI Reliability

Source: news.mit.edu

Published on June 3, 2025

AI systems like ChatGPT often give answers that sound credible without revealing gaps in their knowledge. That becomes a bigger problem as AI is used increasingly in critical sectors. Themis AI, an MIT spinout, aims to address it by quantifying model uncertainty and correcting unreliable outputs.

The company's Capsa platform works with machine-learning models to identify and correct unreliable outputs quickly. It modifies a model so that the model can detect signs of ambiguity, incompleteness, or bias in the way it processes data.

Themis AI co-founder Daniela Rus explains that Capsa identifies a model's uncertainties and failure modes, then uses them to improve the model so that it works as intended.

Company Growth and Applications

Founded in 2021, Themis AI has assisted telecom companies with network planning and automation, helped oil and gas companies analyze seismic imagery using AI, and published research on reliable chatbots.

Alexander Amini, another co-founder, emphasizes the importance of enabling AI in high-stakes industries, highlighting the potential consequences of AI errors. Themis AI aims to enable AI to predict its own failures.

Research and Development

Rus's lab has studied model uncertainty for years. In 2018, the lab received funding to study the reliability of machine learning for autonomous driving.

Themis AI's team also created an algorithm to detect and eliminate racial and gender bias in facial recognition systems by reweighting training data.
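
The article does not describe the team's algorithm in detail, so the following is only a minimal sketch of the general idea of reweighting training data, assuming group labels are already known for every sample. The function name balanced_sample_weights and the toy labels are hypothetical and are not taken from Themis AI's work.

```python
from collections import Counter

def balanced_sample_weights(groups):
    """Return one weight per sample so every group carries equal total weight.

    Assumption: `groups` holds a known group label for each training sample.
    This is plain inverse-frequency reweighting, not Themis AI's algorithm.
    """
    counts = Counter(groups)
    n_samples = len(groups)
    n_groups = len(counts)
    # Each group receives total weight n_samples / n_groups,
    # split evenly among that group's members.
    return [n_samples / (n_groups * counts[g]) for g in groups]

# Toy data: group "a" is over-represented three to one.
groups = ["a", "a", "a", "b"]
weights = balanced_sample_weights(groups)
print(weights)
```

Weights like these could be passed to a training loop as per-sample loss weights, so that an under-represented group is not drowned out by the majority group.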

In 2021, the co-founders demonstrated that a similar approach could help pharmaceutical companies use AI to predict drug candidate properties. This led to the founding of Themis AI.

Themis AI is currently collaborating with companies across various industries, particularly those using large language models. Capsa helps these models assess their uncertainty for each output.

Stewart Jamieson, Themis AI's head of technology, notes that Capsa enables LLMs to self-report confidence and uncertainty, improving question answering and flagging unreliable outputs.

Themis AI is also in talks with semiconductor companies to create AI solutions that function outside cloud environments, offering efficient edge computing without sacrificing quality. This approach allows edge devices to handle most tasks, forwarding uncertain outputs to a central server.
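
The edge-to-server handoff described above can be sketched as a simple routing rule. This is an illustration under stated assumptions, not Capsa's API: it assumes the edge model emits a probability vector over candidate answers, uses normalized entropy as a stand-in uncertainty measure, and picks an arbitrary threshold of 0.5. The names normalized_entropy and route are hypothetical.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of a probability vector, scaled to the range [0, 1]."""
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))

def route(probs, threshold=0.5):
    """Return 'edge' to keep the local answer or 'server' to escalate it.

    Assumption: `threshold` is a tunable value chosen for illustration only.
    """
    return "server" if normalized_entropy(probs) > threshold else "edge"

print(route([0.97, 0.02, 0.01]))  # confident prediction stays on the device
print(route([0.40, 0.35, 0.25]))  # near-uniform prediction is forwarded
```

In practice the threshold would trade off server load against the share of uncertain answers that are allowed to stay on the device.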

Pharmaceutical companies can also leverage Capsa to refine AI models for identifying drug candidates and predicting clinical trial performance. Amini notes that Capsa can offer insights into whether predictions are supported by training data, potentially accelerating the identification of the strongest predictions.

Future Impact

The Themis AI team is exploring Capsa’s ability to improve accuracy in chain-of-thought reasoning, where LLMs explain their reasoning steps. Jamieson suggests that Capsa could guide reasoning processes to identify the highest-confidence chains, potentially improving the LLM experience and reducing computation needs.
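
One way to picture selecting the highest-confidence chain is to score each candidate chain by the product of its per-step confidences. The sketch below assumes such per-step confidence values are available in the interval (0, 1]; how Capsa would produce or use them is not stated in the article, and the names chain_log_confidence, best_chain, and the candidate values are hypothetical.

```python
import math

def chain_log_confidence(step_confidences):
    """Sum of log-confidences, i.e. the log of the product of step confidences."""
    return sum(math.log(c) for c in step_confidences)

def best_chain(chains):
    """Pick the label of the chain with the highest joint confidence.

    Assumption: `chains` maps a label to a list of per-step confidences in (0, 1].
    """
    return max(chains, key=lambda name: chain_log_confidence(chains[name]))

# Made-up confidences for three candidate reasoning chains.
candidates = {
    "chain_a": [0.90, 0.80, 0.95],
    "chain_b": [0.99, 0.50, 0.99],
    "chain_c": [0.70, 0.70],
}
print(best_chain(candidates))
```

Scoring in log space avoids numerical underflow on long chains. A single weak step, as in chain_b, pulls the whole chain's score down, which matches the intuition that one unreliable step undermines the conclusion.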

Rus sees Themis AI as a way to ensure her MIT research has real-world impact, addressing both the potential and concerns of AI.