Study Reveals Environmental Impact of AI in X-Ray Diagnosis: Smaller Models Outperform LLMs

Malvern, UK – A recent paper by Liam Kearns of AuraQ investigates the environmental impact and accuracy of AI in medical applications, specifically the detection of COVID-19 in chest X-rays. The study compares the performance and carbon footprints of various models, including large language models (LLMs) and smaller, custom-built discriminative models. While LLMs such as ChatGPT and Claude offer versatility and ease of use, the research finds that they often carry a disproportionately larger carbon footprint than smaller, task-specific models.

The paper integrates both LLMs and small discriminative models into a Mendix application to detect COVID-19, providing a benchmark study of 14 different model configurations. The most efficient solution was the Covid-Net model, which achieved an accuracy of 95.5% with a carbon footprint 99.9% smaller than that of GPT-4.5-Preview.

The findings indicate that smaller models reduce the carbon footprint but may exhibit biases and lower confidence levels. LLMs asked to provide probabilistic outputs to classify diseases performed poorly, raising major concerns for both accuracy and carbon footprint. The study also shows that both diagnosis accuracy and carbon footprint can be improved by using local models deployed alongside applications. However, prioritizing a lower carbon footprint over accuracy can diminish the confidence and accuracy of outputs from smaller models. Knowledge bases for LLMs can increase detection accuracy but have a varied impact on carbon footprint.

The paper contributes to the understanding of the trade-offs between generative and discriminative models in COVID-19 detection and highlights the environmental risks of using generative tools for classification tasks.