AI for Fracture Diagnosis Accuracy
Source: dovepress.com
AI in Fracture Diagnosis
Traumatic fractures and dislocations are missed up to 10% of the time in emergency departments and by junior orthopedic residents. This review assessed the accuracy of AI in fracture detection and compared it with that of residents in training.
Methods
Researchers searched electronic databases for English-language articles published from January 2015 to July 2023. Databases included PubMed, Scopus, Web of Science, Cochrane Central, Ovid MEDLINE, Ovid Embase, and the EBSCO Cumulative Index to Nursing and Allied Health Literature (CINAHL). Keywords used were Artificial Intelligence, fractures, dislocations, X-rays, radiographs, and missed diagnosis. Data extracted included the number of patients/images studied, fracture sites analyzed, algorithms used, algorithm accuracy, sensitivity, specificity, AUC, and comparisons between the algorithm, junior orthopedic residents, emergency physicians, and board-certified radiologists.
Results
Twenty-seven publications met the inclusion criteria and were analyzed, covering 92,236 images assessed for fractures. The overall accuracy of correct diagnoses was 90.35 ± 6.88%, sensitivity was 90.08 ± 8.2%, specificity was 90.16 ± 7%, and AUC was 0.931 ± 0.06. The AI models' accuracy was 94.24 ± 4.19%, while orthopedic residents had an accuracy of 85.18 ± 7.01% (P < 0.0001). Sensitivity was 92.15 ± 7.12% versus 86.38 ± 7.6% (P < 0.0001), and specificity was 93.77 ± 4.03% versus 87.05 ± 12.9% (P < 0.0001). One study of 1703 hip fracture images compared an AI model, orthopedic residents, and board-certified radiologists, finding accuracies of 98%, 87%, and 92%, respectively (P < 0.0001).
Conclusion
The review highlights AI’s potential for accurate fracture diagnosis. The AI algorithm should be used in emergency rooms by trainee residents and junior orthopedic residents to reduce missed fractures.
Fractures occur in all age groups, depending on the trauma type, location, and related injuries. Fracture incidence ranges from 733 to 4017 per 100,000 patient-years. Traumatic fractures are a major cause of morbidity and mortality; one study of 23,917 individuals recorded 27,169 fractures, 64.5% of them in women.
Missed fracture or dislocation diagnoses on plain radiographs range from 3% to 10%, delaying recovery. Errors often occur in the emergency room due to incorrect radiograph interpretation, subtle injuries, or inadequate training; they are common among junior residents but also occur among trained radiologists. In the USA, radiologists ranked sixth among specialties in malpractice claims while representing 3.1% of physicians.
AI, a branch of computer science, enables machines to perform tasks typically done by humans. It uses machine learning, deep learning, and convolutional neural networks to extract information from images. Recent studies show AI algorithms accurately diagnose fractures and dislocations. This review assessed AI algorithm accuracy, sensitivity, and specificity in diagnosing fractures on plain radiographs.
The search included primary research using validated AI algorithms for fracture detection and comparative studies between AI algorithms and clinicians. Reports, letters to the editor, conference presentations, and systematic reviews were excluded. EndNote™ was used to manage references. Data extracted included the number of patients/images studied, fracture sites, algorithms, accuracy, sensitivity, specificity, AUC, and comparisons between algorithms and clinicians. Fracture predictions were analyzed using contingency tables, and regression analysis was performed between fracture sites and algorithm influence. A p-value of <0.05 was considered significant at a 95% confidence interval (CI). SPSS version 29 was used for data analysis.

In total, 2049 studies were retrieved; 347 were duplicates and 1651 were excluded on the basis of the criteria. Fifty-one studies were reviewed in depth, and 27 met the objectives. 88,996 images were analyzed, showing an overall accuracy of 90.35 ± 6.88% (range 73.59–98%), sensitivity of 90.08 ± 8.2% (73.8–99%), specificity of 90.16 ± 7% (72–100%), and AUC of 0.931 ± 0.06 (0.72–0.994). The fractures analyzed were common fractures of the wrist, upper and lower limbs, and spine. All studies used internally and externally validated deep convolutional neural network (DCNN) algorithms. Most studies limited diagnoses to a single radiograph view.
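The accuracy, sensitivity, and specificity figures pooled above all derive from 2×2 contingency tables of predicted versus actual fracture status. As a minimal sketch (the counts below are illustrative, not data from the review), the three metrics can be computed as:

```python
# Sketch: diagnostic metrics from a 2x2 contingency table.
# tp/fp/fn/tn counts are made-up illustrative values, not study data.
def diagnostic_metrics(tp, fp, fn, tn):
    """Return (accuracy, sensitivity, specificity) as percentages."""
    total = tp + fp + fn + tn
    accuracy = 100 * (tp + tn) / total   # correct calls out of all images
    sensitivity = 100 * tp / (tp + fn)   # true fractures correctly flagged
    specificity = 100 * tn / (tn + fp)   # normal radiographs correctly cleared
    return accuracy, sensitivity, specificity

acc, sens, spec = diagnostic_metrics(tp=450, fp=30, fn=50, tn=470)
print(f"accuracy={acc:.1f}% sensitivity={sens:.1f}% specificity={spec:.1f}%")
# → accuracy=92.0% sensitivity=90.0% specificity=94.0%
```

Sensitivity is the clinically critical figure here, since a false negative is a missed fracture, whereas specificity guards against over-calling fractures that do not exist.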
Table 2 shows the analysis of 214,950 images comparing AI algorithms with junior residents in training. The AI models' accuracy was 94.24 ± 4.19%, compared to 85.18 ± 7.01% for orthopedic residents (P < 0.0001), with sensitivity of 92.15 ± 7.12% versus 86.38 ± 7.6% (P < 0.0001) and specificity of 93.77 ± 4.03% versus 87.05 ± 12.9% (P < 0.0001). Yamada et al (2020) compared an AI model with orthopedic residents and board-certified radiologists, finding accuracy rates of 98%, 87%, and 92%, respectively (P < 0.0001).
This review indicates that AI algorithms are more accurate than both trained and trainee residents in diagnosing fractures, and that AI assistance improved the accuracy, sensitivity, and specificity of fracture diagnosis for trainees and trained radiologists alike. Across the included studies, AI models showed an overall accuracy of 90.35 ± 6.88%, sensitivity of 90.08 ± 8.2%, specificity of 90.16 ± 7%, and AUC of 0.931 ± 0.06. These results were based on plain radiographs and included limb and vertebral fractures.
The use of AI models, especially CNNs, has grown in trauma and orthopedics. Individual studies show that AI models are accurate in diagnosing fractures, performing better than junior residents and on par with senior radiologists. However, most data are from retrospective testing, with few prospective studies in clinical practice.
Fracture diagnosis accuracy varies by site. Murphy et al (2022) reported that an AI model was 19% more accurate than expert clinicians in analyzing hip fractures, and another report suggested that AI assistance increases correct-diagnosis sensitivity by over 10%. Lindsey et al (2018) reported that physicians' average sensitivity improved from 80.8% to 91.5% (95% CI, 89.3–92.9%) and specificity from 87.5% to 93.9% (95% CI, 92.9–94.9%) with deep convolutional neural network assistance, reducing misreadings by around 47.0%. Duron et al (2021) found that emergency physicians improved from 61.3% to 74.3% (up 13.0%) with AI assistance, and trained radiologists improved from 80.2% to 84.6% (up 4.3%). Distal radius fractures, which comprise over 20% of all fractures, were studied with an AI ensemble model compared against orthopedic surgeons and radiologists. Attending orthopedic surgeons achieved accuracy, sensitivity, and specificity of 93.69%, 91.94%, and 95.44%, versus 92.53%, 90.44%, and 94.62% for radiologists, a significant difference; the AI tool scored 97.75%, 97.13%, and 98.37%, exceeding both physician groups.
Missed extremity fracture diagnosis in trauma is a major issue. Malpractice claims against radiologists often involve inaccuracies in extremity fracture reporting, and orthopedic residents also misinterpret radiographs. A UK study showed that senior orthopedic residents missed 4% of fractures, made a wrong diagnosis 7.8% of the time, and diagnosed a fracture where none existed 12.6% of the time.
Claims against orthopedic surgeons have increased, although complaints have remained relatively constant. Junior orthopedic residents need assistance to diagnose fractures correctly. AI enhances and complements fracture diagnosis accuracy but cannot replace human doctors; adequate training in radiographic interpretation remains essential, as junior residents up to the third year of training are more prone to errors.
The review is limited by the number of included studies and the lack of comparative accuracy data between unaided and AI-aided readings. Conclusions are based on retrospective studies, with no prospective studies for comparison. The study's strength is the large pooled dataset, which suggests AI models are more accurate than physicians.
In conclusion, these evaluations suggest that AI models can help residents in training by increasing fracture diagnosis accuracy and reducing errors. AI research has produced cutting-edge tools that require further evaluation before hospitals integrate them into healthcare; such integration would help physicians improve fracture diagnosis and prevent the complications of delayed diagnosis. The authors declare no conflicts of interest.