AI for Tardive Dyskinesia Detection

Source: psychiatrist.com

Published on May 28, 2025

Detecting Tardive Dyskinesia Using AI

Tardive dyskinesia (TD) is a neurological syndrome that can be brought on by taking antipsychotic drugs for a long time. It causes a person to make involuntary, repeated movements of their face, trunk, and other body parts. Grimacing, blinking, and abnormal posture are all examples of such movements. TD can lower a patient's quality of life and make it harder for them to adhere to their antipsychotic treatment plan. Finding TD early makes it possible to start treatments that can lessen morbidity.

Validated scales are used for traditional assessment. The Abnormal Involuntary Movement Scale (AIMS) is frequently used, and is more accurate when used by raters who have a lot of experience with TD. However, it's difficult for even the best diagnosticians to provide every patient with the recommended TD monitoring, which calls for in-person resources about 2–4 times per year.

This article discusses the reliability of machine learning techniques for improving remote TD screening. While the technology is completely remote, it does not take the place of a doctor's in-person examination for a definitive diagnosis and care. The rise in telemedicine and the growing need for psychiatric services have put more strain on the healthcare system, often at the expense of safety monitoring procedures like those for TD.

Methods

This article reports the results of three studies that sought to stratify the risk of suspected TD in individuals taking antipsychotic drugs. In addition to identifying the presence or absence of suspected TD, the algorithm can stratify the severity of the disorder. The primary goal was for a vision transformer algorithm to reach an area under the curve (AUC) of 0.90 or higher in detecting the presence or absence of suspected TD, measured against AIMS ratings by experienced raters of individuals taking an antipsychotic medication.
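To make the AUC target concrete, here is a minimal sketch of how an AUC can be computed from model scores and rater-derived labels. It is not the study's code: the function name, the labels, and the scores are all made up for illustration. It uses the standard reading of AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive case outscores a random
    negative case; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = suspected TD per rater-based label, 0 = no suspected TD.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical model scores, not taken from the studies.
scores = [0.92, 0.81, 0.40, 0.35, 0.10, 0.55, 0.20, 0.77]

auc = roc_auc(labels, scores)
meets_target = auc >= 0.90  # the 0.90 target comes from the article
print(f"AUC = {auc:.4f}, meets 0.90 target: {meets_target}")
```

In this toy example, 15 of the 16 positive-negative pairs are ordered correctly, so the AUC is 0.9375.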

Across the three studies, video data were gathered from individuals taking antipsychotic drugs. All participants were at risk for TD, and a smartphone app captured the video while leading participants through a set protocol. The algorithm assessed each participant's video responses and compared them with a trained rater's AIMS evaluation, which served as the basis for determining whether TD was present or absent. Participants in Studies 1 and 2 completed a standard AIMS procedure while sitting across from a device on a stand. A close-up video image captured the individual's face, trunk, and hands for the portions that concentrated on the face and mouth. Participants in Study 3 underwent a standard AIMS examination administered by a single qualified rater.

Results

The model's training included data from Study 1 and Study 2. Participants were recruited from clinic populations and behavioral health community clubhouse settings. Participants had to have been taking an antipsychotic drug for at least 90 days to be included. Participants were excluded if they had a head injury in the previous year, a history of cognitive or developmental impairment, or severe visual impairment. Participants were enrolled to ensure a balance of individuals who had previously been diagnosed with TD and those who had not.

Study 3 added participants. The final dataset included video responses. The AIMS score served as the target output for training a neural network to evaluate the videos. Agreement between the algorithm’s conclusion and the raters’ consensus was then measured.

To test the algorithm's feasibility and validity, three trained raters assessed videos of people both with and without a TD diagnosis while they completed all AIMS components. A machine learning algorithm using convolutional neural networks assessed the responses to the open-ended questions, focusing only on the upper trunk and face. The machine learning engine distinguished individuals with TD from those without, with an AUC of 0.77 (95% CI, 0.679–0.859), measured against the ground truth of the presence or absence of TD as established by the consensus of the panel of 3 trained raters using the AIMS.
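The article does not say how the 95% confidence interval around the AUC was obtained. One common option is a percentile bootstrap, sketched below as an assumption rather than the authors' method. The function names, data, and number of resamples are all illustrative.

```python
import random


def roc_auc(labels, scores):
    """Pairwise AUC; assumes both classes are present."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement, recompute
    the AUC each time, and read off the alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample contains one class only
        stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


# Hypothetical labels and scores, not data from the studies.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.92, 0.81, 0.40, 0.35, 0.10, 0.55, 0.20, 0.77]

ci = bootstrap_auc_ci(labels, scores)
print(f"AUC = {roc_auc(labels, scores):.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

With so few cases, the interval will be wide. Intervals of the kind reported in the article depend on the actual sample size and on the method used.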

The vision transformer model was compared with trained raters' assessments using the AIMS. The reference score was the average of three raters evaluating video-recorded AIMS in Studies 1 and 2, and the score of a single rater completing an AIMS in person in Study 3. Capturing video data of the face, shoulders, trunk, arms, and hands should allow for the diagnosis of TD.
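The reference-score rule described above (an average when three raters scored a video, a single score when one rater examined the participant in person) can be sketched as follows. The function names and scores are hypothetical. The cutoff is also purely illustrative, since the article does not state the threshold used to call suspected TD present.

```python
from statistics import mean


def reference_aims_score(rater_scores):
    """Average the available raters' AIMS scores; with a single rater the
    average is simply that rater's score."""
    if not rater_scores:
        raise ValueError("at least one rater score is required")
    return mean(rater_scores)


def suspected_td(rater_scores, cutoff):
    """Binary label from the reference score. The cutoff is a hypothetical
    parameter; the article does not state the threshold that was applied."""
    return reference_aims_score(rater_scores) >= cutoff


# Hypothetical scores: three video raters (Studies 1 and 2 style)
# and one in-person rater (Study 3 style).
video_case = reference_aims_score([6, 8, 7])
in_person_case = reference_aims_score([4])
print(video_case, in_person_case, suspected_td([6, 8, 7], cutoff=5))
```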

Discussion

The model’s performance was evaluated by iteratively adding data collected across the 3 studies. When the model was trained on all available data, the AUC ranged from 0.85 to 0.98 across the available test sets. The model achieved a Cohen κ of 0.51, demonstrating greater consistency than the reviewers’ initial assessments. Furthermore, with the full dataset, the model’s Cohen κ increased to 0.61, a value conventionally read as substantial agreement, outperforming human raters.
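For readers unfamiliar with Cohen κ, the sketch below shows the standard calculation for two sets of categorical ratings: observed agreement corrected for the agreement expected by chance. The ratings are invented for illustration and are not data from the studies.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance from each
    rater's marginal category frequencies."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    p_e = sum((ratings_a.count(c) / n) * (ratings_b.count(c) / n)
              for c in categories)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical calls: 1 = suspected TD, 0 = no suspected TD.
model_calls = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
rater_calls = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

kappa = cohen_kappa(model_calls, rater_calls)
print(f"kappa = {kappa:.3f}")
```

Here the two sets of calls agree on 8 of 10 cases (p_o = 0.80) and chance agreement is 0.52, giving κ of about 0.58. This shows how raw agreement of 80% can correspond to a much lower chance-corrected value.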

These results demonstrate the potential of video-based machine learning algorithms to monitor for the presence or absence of suspected TD. According to the findings, the model can reliably identify suspected TD and often outperforms human raters in sensitivity and specificity. The algorithm exhibited less bias when compared with the interrater reliability of the human raters. The temporal nature of the embeddings also makes it possible to pinpoint the time points at which risk-positive movements occurred, which improves the interpretability of the AI predictions. The model's high predictiveness across different subgroups is notable because clinical populations often differ from clinical trial populations, which limits the generalizability of expected benefits from breakthrough treatments.
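The article gives no detail on how time points are recovered from the temporal embeddings. As a rough illustration only, suppose the model produces one risk score per fixed-length window of video. Contiguous windows above a threshold can then be reported as time intervals. The function name, window length, threshold, and scores below are all assumptions.

```python
def risk_positive_intervals(window_scores, window_seconds, threshold):
    """Return (start_s, end_s) intervals covering consecutive windows whose
    score is at or above the threshold."""
    intervals = []
    start = None
    for i, score in enumerate(window_scores):
        if score >= threshold and start is None:
            start = i  # a risk-positive run begins
        elif score < threshold and start is not None:
            intervals.append((start * window_seconds, i * window_seconds))
            start = None  # the run has ended
    if start is not None:
        # The run extends to the end of the recording.
        intervals.append((start * window_seconds,
                          len(window_scores) * window_seconds))
    return intervals


# Hypothetical per-window scores for a 14-second clip, 2 seconds per window.
window_scores = [0.1, 0.2, 0.8, 0.9, 0.3, 0.7, 0.75]
intervals = risk_positive_intervals(window_scores, window_seconds=2.0,
                                    threshold=0.6)
print(intervals)
```

A reviewer could then jump straight to the flagged segments of the recording rather than watching the full video.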

A rapid and automatic TD detection method would enable timely diagnosis and avoid morbidity, potentially obviating the need for expensive lifelong treatment. The variety of training videos gathered is vital to the final model’s ability to generalize to future evaluations, and it significantly increases performance. While the algorithm significantly enhances the detection of TD, it cannot function independently as a diagnostic tool. A health care professional’s evaluation is essential to confirm the diagnosis required for prescribing treatment.

Future research should take a longitudinal approach in which patients are monitored monthly or quarterly, alongside medication monitoring, to fully demonstrate the potential of smartphone-based patient monitoring of TD. The combined studies demonstrate that self-administered, smartphone-recorded video interviews can reliably yield data that can be scored by algorithms built with highly discriminating machine learning approaches. This technology has the potential to significantly improve early diagnosis and patient outcomes, especially in remote care settings where resources are scarcest.