
AI helps detect the severity of diabetes-related eye disease

Published: 25.11.2024

This advance could enable many new diagnostic applications in the future

Diabetic retinopathy is screened using retinal imaging. Image: Adobe Stock Photo.

Researchers have successfully applied large language models to identify diabetic retinopathy (DR), a sight-threatening retinal disease caused by diabetes, and classify its severity from unstructured medical reports. This breakthrough demonstrates the potential for large language models to process unstructured datasets, such as patient records, across a variety of medical applications.

AI is already being used for tasks such as disease diagnosis, patient monitoring, and healthcare resource planning. Large language models (LLMs), such as ChatGPT and Bard, have also gained widespread attention for assisting with various professional tasks. In healthcare, these models are being developed for text classification, such as extracting critical findings from radiology reports.

The research group led by Professor Kimmo Kaski has long focused on AI-based detection of eye diseases, says postdoctoral researcher Joel Jaskari. The group previously explored AI models for image analysis, specifically deep convolutional neural networks, to assess the severity of DR using retinal images.

Pictured: Joel Jaskari.

‘Diabetic retinopathy affects a significant portion of diabetes patients. If not treated in time, it can lead to blindness. That’s why early detection is so important,’ Jaskari notes.

Retinal imaging is the primary method of screening for DR. Findings, including disease severity, are recorded in examination reports. However, conducting statistical analyses on DR severity requires structured annotations, which can be labour-intensive to produce when the reports are unstructured. At HUS Helsinki University Hospital, retinal findings have traditionally been documented by doctors as detailed, free-text reports.

To train their AI model, the researchers used over 40,000 patient records from specialist care visits at Helsinki University Hospital between 2016 and 2019. During these visits, retinal images were taken, and physicians recorded their observations on any significant findings, such as the signs and severity of DR, in unstructured text reports.

Collaborations with partners like HUS are crucial for tailoring AI solutions to Finnish healthcare, Jaskari emphasises. While global datasets for DR exist, they do not reflect the Finnish population or local medical practices. Moreover, the Finnish healthcare system generates sufficiently large datasets to train AI for domestic needs.

‘Modern AI methods, like GPT models, are inherently data-driven. They must be trained with task-specific datasets, making collaborations like this essential for advancing medical AI research,’ Jaskari explains.

Extracting DR severity from unstructured medical reports is difficult, or even impossible, with traditional rule-based programming. Initially, doctors and nurses manually analysed half of the dataset; the researchers then decided to train a language model to handle the remaining reports.
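To illustrate why hand-written rules struggle with free text, here is a minimal sketch of a keyword-based classifier. It is not taken from the study: the severity labels, the keyword list, and the English example sentence are all hypothetical (the real reports are in Finnish and far more varied).

```python
# Hypothetical keyword rules, checked from most to least severe.
# These labels and keywords are illustrative assumptions, not the study's scheme.
SEVERITY_KEYWORDS = [
    ("proliferative", "proliferative DR"),
    ("severe", "severe DR"),
    ("moderate", "moderate DR"),
    ("mild", "mild DR"),
]


def classify_by_keywords(report: str) -> str:
    """Return the label of the first keyword found in the report text."""
    text = report.lower()
    for keyword, label in SEVERITY_KEYWORDS:
        if keyword in text:
            return label
    return "no DR"


# The rule matches "proliferative" even though the sentence negates it,
# so this report is graded as the most severe class instead of mild.
example = "No signs of proliferative changes; mild background retinopathy."
print(classify_by_keywords(example))
```

Handling negation, abbreviations, and spelling variants would require ever more special cases, which is the kind of brittleness a fine-tuned language model is meant to avoid.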

‘We fine-tuned a Finnish GPT model, originally developed by the University of Turku’s NLP group, using the manually analysed reports. The goal was for the model to classify the DR severity described in the medical reports in the same way as healthcare professionals would. We named this specialised model DR-GPT,’ Jaskari says.

The study demonstrated that DR-GPT could analyse free-form Finnish medical reports with exceptional accuracy. The model’s annotations for previously unlabelled data were combined with the manually labelled data to create a comprehensive dataset for training image-based AI, significantly improving its performance.
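The combination step described above can be sketched roughly as follows. This is an illustrative outline only, not the researchers' code: the `Report` structure, the `build_training_labels` function, and the stand-in `predict` callable are hypothetical names, and the sketch assumes manual labels take precedence wherever they exist.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Tuple


@dataclass
class Report:
    report_id: str
    text: str
    manual_label: Optional[str] = None  # set only if a clinician graded the report


def build_training_labels(
    reports: Iterable[Report],
    predict: Callable[[str], str],
) -> Dict[str, Tuple[str, str]]:
    """Map each report id to (label, source), preferring manual labels."""
    labels: Dict[str, Tuple[str, str]] = {}
    for report in reports:
        if report.manual_label is not None:
            labels[report.report_id] = (report.manual_label, "manual")
        else:
            # Fall back to the language model's prediction for unlabelled reports.
            labels[report.report_id] = (predict(report.text), "model")
    return labels


# Usage with a stand-in predictor; a fine-tuned model would replace the lambda.
reports = [
    Report("a", "first report text", manual_label="mild DR"),
    Report("b", "second report text"),
]
combined = build_training_labels(reports, predict=lambda text: "no DR")
```

Recording the source of each label alongside the label itself makes it possible to check later whether model-generated annotations behave differently from manual ones when training the image-based model.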

The research demonstrates that large language models trained in Finnish can achieve excellent results when analysing Finnish-language datasets, Jaskari notes. This conclusion is further supported by DR-GPT’s ability to deliver accurate results from challenging unstructured medical reports.

‘I see no reason an approach like DR-GPT couldn’t be applied to other medical datasets as well. In fact, fine-tuning GPT models is such a versatile approach that I believe similar Finnish-language AI models could be trained for many other purposes,’ Jaskari says.

The approach could work in other languages too: Jaskari explains that the FinGPT model, used as the basis for DR-GPT, is very similar to existing models in other languages.

‘Researchers around the world can use a pre-trained GPT model for their chosen language as a foundational model and apply the modifications reported in our study. I believe the resulting model would be similar to DR-GPT but in a different language,’ Jaskari concludes.

The research was published in the scientific journal PLOS One in October 2024.