Tracking data for medical AI transformation
Source: nature.com
Modern medicine relies on recognizing patterns in patient data, but some patterns elude even skilled physicians. Supervised machine learning offers a way to detect these patterns by creating computer models that learn from labelled data, potentially reducing subjectivity in medicine.
AI in Healthcare
Interest in predictive modelling has grown significantly, with the market for AI in health care projected to exceed US$46 billion this year and US$200 billion by 2030. However, models remain a source of uncertainty: their errors can lead to health problems being overlooked or to unnecessary interventions.
A model's usefulness depends on how well it generalizes to new data, but models also absorb biases from the data used to train them. Transparency about data sources and testing in the intended clinical environment are therefore crucial, because small or biased data sets can cause models to underperform.
EHR and Predictive Models
The widespread use of predictive modelling in health care relies on electronic health records (EHRs). EHRs provide data to train models, and the models' predictions guide clinical decisions. These predictions are also recorded in the EHR, which can create issues.
For example, a model detecting early sepsis signs might prompt intervention that prevents the condition from progressing. The EHR then records the warning signs as being associated with a non-septic outcome, creating a ‘contaminated association.’ Over time, this erodes the reliability of models.
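To make this feedback loop concrete, the short simulation below is an illustrative sketch, not something taken from the article: it generates a hypothetical warning sign, lets a deployed model trigger mostly successful interventions for the highest-risk patients, and then shows how the recorded outcomes make those patients look safer than lower-risk ones.

```python
# Minimal simulation of a 'contaminated association'. All numbers and the
# single warning-sign feature are hypothetical, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# One 'early warning' feature: higher values mean higher true sepsis risk.
warning_sign = rng.uniform(0, 1, n)
true_risk = 0.05 + 0.6 * warning_sign            # risk of sepsis if untreated

# A deployed model flags the riskiest patients; clinicians then intervene.
flagged = warning_sign > 0.7
prevented = flagged & (rng.uniform(0, 1, n) < 0.9)   # intervention usually works

# Outcome actually written to the EHR: sepsis only if it was not prevented.
would_develop_sepsis = rng.uniform(0, 1, n) < true_risk
recorded_sepsis = would_develop_sepsis & ~prevented

rate_flagged = recorded_sepsis[flagged].mean()
rate_unflagged = recorded_sepsis[~flagged].mean()
print(f"recorded sepsis rate, flagged patients:   {rate_flagged:.3f}")
print(f"recorded sepsis rate, unflagged patients: {rate_unflagged:.3f}")
# The flagged (highest-risk) group now looks *safer* than the unflagged group,
# so a model retrained on these labels would learn a contaminated association.
```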
Model Drift
Model drift, caused by shifts in patient demographics or changes in clinical practice, can also reduce accuracy. Retraining models becomes difficult as the EHR database gets corrupted with false associations.
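Drift of this kind is often monitored by comparing the distribution of a model's inputs at training time with recent data. The sketch below uses the population stability index, a common convention rather than anything prescribed in the article, to flag a shift in a hypothetical patient-age feature.

```python
# Sketch of drift detection via the population stability index (PSI).
# The metric, threshold and age distributions are illustrative assumptions.
import numpy as np

def psi(expected, actual, bins=10):
    """Population stability index between a reference sample and a recent one."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(1)
training_age = rng.normal(62, 10, 5_000)   # patient ages at training time
current_age = rng.normal(70, 12, 5_000)    # older population after a shift

score = psi(training_age, current_age)
print(f"PSI = {score:.2f}")                # > 0.25 is often read as major drift
```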
When multiple AI models are used, interventions triggered by one model can disrupt others, even if they focus on different outcomes.
Monitoring Model Performance
Current approaches don't account for how models interact or how they affect clinical decision-making, which raises concerns about how model performance is monitored. If a model's warning prompts an intervention that prevents an adverse event, the predicted event never occurs, so the model's real-world performance appears to decline; yet an apparent decline could equally mean the model is simply making poor predictions, and the two are difficult to tell apart.
Comparing patient outcomes during periods when the model is active with periods when it is switched off can help determine its effectiveness. Part of that evaluation is establishing an expected range of performance change, although model drift and variability in clinical practice can confound the estimate; determining the range experimentally might be more reliable.
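Assuming outcomes can be logged separately for periods when alerts are shown and periods when they are silently withheld, such a comparison might look like the following sketch; the outcome rates, sample sizes and bootstrap interval are illustrative assumptions, not figures from the article.

```python
# Sketch of an 'on/off' comparison: adverse-outcome rates while the model's
# alerts are shown versus silently withheld. All data here are synthetic.
import numpy as np

rng = np.random.default_rng(2)

# 0/1 adverse-outcome indicators from two deployment periods.
outcomes_active = rng.binomial(1, 0.06, 4_000)   # alerts shown to clinicians
outcomes_silent = rng.binomial(1, 0.09, 4_000)   # alerts logged but hidden

def rate_diff_ci(a, b, n_boot=2_000, alpha=0.05):
    """Bootstrap confidence interval for the difference in outcome rates."""
    diffs = [
        rng.choice(a, len(a)).mean() - rng.choice(b, len(b)).mean()
        for _ in range(n_boot)
    ]
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return a.mean() - b.mean(), (lo, hi)

diff, (lo, hi) = rate_diff_ci(outcomes_active, outcomes_silent)
print(f"active - silent outcome rate: {diff:.3f}  (95% CI {lo:.3f} to {hi:.3f})")
# If the interval sits clearly below zero, model-guided interventions are
# associated with fewer adverse events during active periods.
```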
Assessing Model Effectiveness
Randomized controlled trials (RCTs) are the gold standard for assessing effectiveness, but an RCT would require the model to run in a controlled environment, free from competing models or changes to the system, and that level of control is rarely possible in clinical settings. As the number of deployed models grows, results from such isolated studies become less reliable guides to real-world performance.
Even when RCTs do provide proof of effectiveness, they are costly and time-consuming. A more practical approach is external validation: testing the model on data it has never seen. However, such testing becomes harder to interpret when earlier models have already shaped the data being used.
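As a rough illustration of external validation, the sketch below fits a model on synthetic data standing in for one hospital and evaluates it on a second, shifted data set standing in for another site; the data, features and scikit-learn model are assumptions made for the example, not the article's method.

```python
# Sketch of external validation: fit on data from a development site, then
# evaluate on a site with a different case mix. Data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)

def make_site(n, shift=0.0):
    """Synthetic patients: two features plus a site-dependent shift."""
    X = rng.normal(0, 1, (n, 2)) + shift
    p = 1 / (1 + np.exp(-(1.5 * X[:, 0] - 1.0 * X[:, 1])))
    y = rng.binomial(1, p)
    return X, y

X_dev, y_dev = make_site(5_000)             # development site
X_ext, y_ext = make_site(2_000, shift=0.5)  # external site, different case mix

model = LogisticRegression().fit(X_dev, y_dev)

print("internal AUROC:", round(roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1]), 3))
print("external AUROC:", round(roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1]), 3))
# A drop on the external site flags limited generalization; if that site's data
# were already shaped by earlier alert-driven interventions, even this check
# becomes harder to interpret.
```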