AI Models Benchmarked for COVID-19 X-Ray Diagnosis Efficiency

A recent study by Liam Kearns at AuraQ evaluates the integration of AI tools, including large language models (LLMs), into medical applications for diagnosing COVID-19 from chest X-rays. The research benchmarks 14 model configurations, comparing accuracy and environmental impact within a Mendix application.

The findings show that smaller, custom models reduce carbon footprint but often produce biased outputs with lower confidence. Restricting LLMs to probabilistic outputs led to poor performance. The Covid-Net model was the most efficient configuration. It reduced carbon footprint by 99.9% compared to GPT-4.5-Preview while reaching an accuracy of 95.5%.

Key Findings

The study raises significant concerns about the accuracy and environmental impact of using LLMs to produce probabilistic outputs for disease classification from X-rays. Local models deployed alongside applications reduced both carbon footprint and bias, which highlights the benefits of custom solutions.

Adding knowledge bases to LLMs improved detection accuracy, but the effect on carbon footprint varied. The research underscores the environmental risks of using generative AI tools for classification tasks. It also adds to the comparison of generative and discriminative models for COVID-19 detection.

Methodology

The study compared 14 AI model configurations, focusing on their accuracy and environmental impact in detecting COVID-19 from chest X-rays. Models ranged from smaller, custom configurations to larger LLMs like GPT-4.5-Preview and Claude.

The evaluation was conducted within a Mendix application, simulating real-world medical diagnostic scenarios. The carbon footprint was measured alongside diagnostic accuracy to provide a holistic assessment of each model's performance.
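The sketch below shows one way to report accuracy and relative carbon footprint side by side. It rests on several assumptions:

- Emissions are estimated per configuration in grams of CO2-equivalent.
- "Reduction" means `(1 - candidate / baseline) × 100`, measured against a chosen baseline model.
- The configuration names and all numbers are hypothetical placeholders. They are not the study's data, and the helper names are illustrative. This is not the author's evaluation code.

```python
"""Illustrative sketch: scoring model configurations on accuracy and
relative carbon footprint. All names and numbers are hypothetical
placeholders, not results from the study."""
from dataclasses import dataclass


@dataclass
class ConfigResult:
    name: str            # hypothetical configuration label
    correct: int         # number of correctly classified X-rays
    total: int           # number of X-rays evaluated
    emissions_g: float   # assumed CO2-equivalent emissions in grams

    @property
    def accuracy(self) -> float:
        # Fraction of X-rays classified correctly
        return self.correct / self.total


def carbon_reduction_pct(candidate_g: float, baseline_g: float) -> float:
    """Percentage reduction in emissions of a candidate relative to a baseline."""
    if baseline_g <= 0:
        raise ValueError("baseline emissions must be positive")
    return (1.0 - candidate_g / baseline_g) * 100.0


def summarize(results: list[ConfigResult], baseline_name: str) -> list[tuple[str, float, float]]:
    """Return (name, accuracy, reduction %) rows, best accuracy first."""
    baseline = next(r for r in results if r.name == baseline_name)
    rows = [
        (r.name, r.accuracy, carbon_reduction_pct(r.emissions_g, baseline.emissions_g))
        for r in results
    ]
    # Sort by accuracy, breaking ties by the larger emissions reduction
    rows.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return rows


if __name__ == "__main__":
    # Placeholder values chosen only to illustrate the arithmetic
    results = [
        ConfigResult("large-llm", correct=150, total=200, emissions_g=1000.0),
        ConfigResult("local-cnn", correct=190, total=200, emissions_g=1.0),
    ]
    for name, acc, red in summarize(results, baseline_name="large-llm"):
        print(f"{name}: accuracy={acc:.1%}, reduction vs baseline={red:.1f}%")
```

With these placeholder values, a local model that emits 1 g against a 1,000 g baseline works out to a 99.9% reduction. This shows how a figure like the one reported for Covid-Net relates to the two raw measurements.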

Implications

The findings have significant implications for integrating AI into medical diagnostics. Custom models are more environmentally friendly, but their gains must be weighed against accuracy and bias. LLMs are powerful but struggle with probabilistic classification tasks and raise environmental sustainability concerns.

The research advocates deploying local models alongside applications to maximize diagnostic efficiency while minimizing environmental impact. This approach could set a new standard for sustainable AI in medical diagnostics.