Research Mind

DILI Classifier
Link to Drug Induced Liver Injury Classifier

An AI-based classifier to search for Drug-Induced Liver Injury literature

The DILI Classifier was developed and presented at the Annual International Conference on Critical Assessment of Massive Data Analysis (CAMDA) 2021. The detailed abstract, as submitted, is given below.


Sanjay Rathee1,3, Meabh MacMahon1,2,3, Anika Liu1, Nicholas Katritsis1, Gehad Youssef1, Woochang Hwang1, Lilly Wollman1, Namshik Han1 

1 Milner Therapeutics Institute, University of Cambridge, Cambridge, CB2 0AW, UK
2 LifeArc, Stevenage, SG1 2FX, UK
3 These authors contributed equally


Drug-Induced Liver Injury (DILI) is a class of adverse drug reaction (ADR) that causes problems in both clinical and research settings. It is the most frequent cause of acute liver failure in the majority of western countries[4] and is a major cause of attrition of novel drug candidates[2]. Manually trawling the literature for DILI papers is the main route for obtaining data from DILI studies, which makes the process inefficient and prone to human error. An automated AI model capable of retrieving DILI-related papers from the vast body of literature could therefore be invaluable to the drug discovery community. In this project, we built an artificial intelligence (AI) model that combines Natural Language Processing (NLP) and Machine Learning (ML) to address this problem. The model uses NLP to filter out uninformative text (e.g. stopwords) and uses customized functions to extract relevant keywords as singletons, pairs, triplets and so on. These keywords are processed by the Apriori pattern-mining algorithm to extract relevant patterns, which are used to estimate initial weightings for an ML classifier. Along with pattern importance and frequency, an FDA-approved drug list mentioning DILI adds extra confidence to the classification. Combined, these methods yield a DILI classifier (DILIC) with 94.91% cross-validation accuracy and 94.14% external-validation accuracy. An R Shiny app capable of classifying single or multiple entries will be developed to enhance the user experience.
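
The keyword and pattern-mining steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The stopword list, the function names `extract_keywords` and `frequent_patterns`, the support threshold, and the three toy sentences are all assumptions made up for illustration, and the later steps (pattern weighting, the FDA drug list and the classifier itself) are not shown. The sketch drops stopwords, treats each text as a set of keywords, and runs an Apriori-style search for keyword singletons, pairs and triplets that occur in a minimum fraction of the texts.

```python
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption; the abstract does not name one).
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "was", "with"}


def extract_keywords(text):
    """Lowercase the text, strip punctuation, and return the set of non-stopword tokens."""
    tokens = (tok.strip(".,;:()[]").lower() for tok in text.split())
    return {tok for tok in tokens if tok and tok not in STOPWORDS}


def frequent_patterns(documents, min_support=0.5, max_size=3):
    """Apriori-style search over keyword sets.

    Returns {frozenset_of_keywords: support} for every keyword set of size
    1..max_size that appears in at least `min_support` of the documents.
    """
    transactions = [extract_keywords(doc) for doc in documents]
    n_docs = len(transactions)
    patterns = {}

    # Level 1 candidates: every distinct keyword on its own.
    candidates = {frozenset([kw]) for t in transactions for kw in t}
    size = 1
    while candidates and size <= max_size:
        # Count how many documents contain each candidate set.
        counts = Counter()
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        frequent = {c: n / n_docs for c, n in counts.items() if n / n_docs >= min_support}
        patterns.update(frequent)

        # Build the next level: keep a larger set only if every subset one
        # item smaller is itself frequent (the Apriori pruning rule).
        size += 1
        items = sorted({kw for c in frequent for kw in c})
        candidates = {
            frozenset(combo)
            for combo in combinations(items, size)
            if all(frozenset(sub) in frequent for sub in combinations(combo, size - 1))
        }
    return patterns


# Toy sentences invented for illustration only; they are not from the CAMDA data.
docs = [
    "Acute liver injury was observed with the drug.",
    "The drug induced liver injury in patients.",
    "Kidney function is normal.",
]
patterns = frequent_patterns(docs, min_support=0.6)
for itemset, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(support, 2))
```

In a pipeline like the one the abstract describes, the support values of such patterns could serve as a starting point for the initial weightings passed to a classifier. How DILIC actually derives those weightings is not specified in the abstract.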