The increasing availability of artificial intelligence (AI) and deep-learning algorithms able to analyse, from many different perspectives, the huge amount of R&D data already available in public and private research labs and databases is changing the way drug discovery and development is run. A recent review published in SLAS Discovery and authored by a group of AbbVie’s R&D scientists discusses the many new and emerging approaches to evaluate off-target toxicology from an industrial perspective. Many innovative AI-based companies are also playing an increasing role in supporting traditional pharmaceutical companies in the identification of new, promising therapeutics. We summarise the main features and the many tools discussed in the review and provide some non-exhaustive examples of new business models emerging in the pharmaceutical sector.
The integrated screening paradigm for off-target toxicity
The integrated screening paradigm may support the early identification of off-target toxicology and safety issues, a challenge that is still hard to address due to the difficulty of translating observed data from animals to humans and the possible inability to recognise potentially susceptible human subpopulations, argue AbbVie’s scientists in the SLAS Discovery paper.
It is not possible to completely avoid off-target toxicity, especially where small molecules are concerned: an optimised lead may bind to several different targets, a phenomenon that impacts the observed toxicity profile. Exposure in in vivo populations represents another source of bias, according to the review, as many factors might alter this parameter in certain subpopulations of patients.
Gene expression profiling and mapping
Many new analytical tools are available, according to AbbVie’s scientists, to support the elucidation of target and off-target interactions of small molecules, such as L1000 gene expression profiling. The method – born from a collaboration between the Broad Institute of MIT and Harvard and Genometry – is based on a high-throughput gene expression assay that measures the mRNA transcript abundance of 978 “landmark” (the “L” in the name) genes from human cells, coupled with the measurement of 80 control transcripts chosen for their invariant expression across cell states. Readings are performed on crude lysates of human cells, yielding an output dataset of expression values for 22,000 genes × 380 samples, well suited for use by machine-learning algorithms and AI-guided drug discovery. The method might be applied, for example, to evaluate the broad impact of small molecules on cells and to identify common and targeted mechanisms of action.
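As a rough illustration of how such an expression matrix feeds into machine learning, the minimal Python sketch below trains a classifier to predict mechanism of action from landmark-gene profiles. The matrix, labels and class names are hypothetical placeholders (real L1000 data are distributed as GCTx files), and the random-forest model is just one simple choice, not the Broad Institute’s own pipeline.

```python
# Minimal sketch: train a classifier on an L1000-style expression matrix
# (samples x landmark genes) to predict mechanism of action (MoA).
# All data below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_landmark_genes = 380, 978                 # one plate of profiles
X = rng.normal(size=(n_samples, n_landmark_genes))     # expression values
moa_labels = rng.choice(
    ["kinase_inhibitor", "hdac_inhibitor", "dmso_control"], size=n_samples
)                                                      # known MoA annotations

# A random forest is one simple choice for predicting MoA from profiles.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, moa_labels, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```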
The Broad Institute is also the creator of the Connectivity Map (CMap), a comprehensive catalog of cellular signatures representing systematic perturbations with genetic perturbagens (reflecting protein function) and pharmacologic perturbagens (reflecting small-molecule function). The library currently contains over 1.5 million gene expression profiles from about 5,000 small-molecule compounds and 3,000 genetic reagents, tested in multiple cell types. The database is hosted in the cloud-based infrastructure CLUE (CMap and LINCS Unified Environment), from which researchers can access and manipulate CMap data and integrate them with their own.
L1000 gene expression profiling also underpins the US National Institutes of Health (NIH) Library of Integrated Network-Based Cellular Signatures (LINCS) program, an open resource containing assay results from cultured and primary human cells treated with bioactive small molecules, ligands such as growth factors and cytokines, or genetic perturbations. The program aims to better understand the functioning of cell pathways and to support the development of therapies able to restore perturbed pathways and networks.
Phenotypic profiling and CRISPR libraries
The BioMAP human primary cell phenotypic profiling services provided by DiscoverX/Eurofins are another useful tool to determine the efficacy, safety, and mechanism of action of small molecules, say the review’s authors. The system is based on over 60 human primary cell-based models of tissue and disease biology, coupled with a reference benchmark database of more than 4,500 reference compound profiles. Bioinformatic tools associated with the system provide the desired insights.
Screening of CRISPR-generated libraries is another tool complementary to the ones mentioned above; according to the review, this approach allows genes to be either activated (CRISPRa) or repressed (CRISPRi) using gene editing. CRISPRa techniques are also useful to assess gain of function and survival of cells under specific conditions (e.g. the presence of the candidate substance), says an article published in the Journal of Human Genetics, while CRISPRi is a more powerful tool than RNA interference (RNAi) libraries for loss-of-function screening and can also be used to assess synthetic lethality interactions.
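To give a concrete feel for how such pooled screens are read out, the sketch below computes per-guide log2 fold changes between counts before and after selection and aggregates them per gene. The count table, gene names and values are hypothetical, and this is a toy version of what dedicated tools (e.g. MAGeCK) do at scale.

```python
# Minimal sketch of a pooled CRISPR screen readout: compare guide-RNA counts
# before and after selection (e.g. survival in the presence of a candidate
# compound) and aggregate per-gene log2 fold changes. Counts are hypothetical.
import numpy as np
import pandas as pd

counts = pd.DataFrame({
    "gene":  ["TP53", "TP53", "MYC", "MYC", "CTRL", "CTRL"],
    "guide": ["g1", "g2", "g1", "g2", "g1", "g2"],
    "t0":    [1200, 950, 800, 1100, 1000, 1050],   # counts before selection
    "t_end": [2400, 2100, 150, 220, 980, 1010],    # counts after selection
})

# Normalise to library size, add a pseudocount, compute per-guide log2 fold change.
for col in ("t0", "t_end"):
    counts[col + "_cpm"] = counts[col] / counts[col].sum() * 1e6
counts["log2fc"] = np.log2((counts["t_end_cpm"] + 1) / (counts["t0_cpm"] + 1))

# A positive median log2FC suggests enrichment (a survival advantage, as in a
# CRISPRa gain-of-function screen); a negative one suggests depletion.
gene_scores = counts.groupby("gene")["log2fc"].median().sort_values()
print(gene_scores)
```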
Cellular thermal shift assay mass spectrometry (CETSA-MS) is another recent label-free, proteomics-based biophysical assay useful for evaluating the target engagement of a candidate molecule. The method, invented at Sweden’s Karolinska Institute (which founded the startup Pelago Bioscience to exploit it), allows the direct measurement of ligand-induced changes in protein thermal stability in both living cells and tissues.
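The essence of the CETSA readout can be captured with a simple melting-curve fit: the fraction of protein that remains soluble is measured at increasing temperatures, with and without the candidate ligand, and the apparent shift in melting temperature (ΔTm) indicates target engagement. The sketch below, with entirely hypothetical data points and not Pelago’s actual workflow, fits a two-parameter sigmoid to each condition and reports the ΔTm.

```python
# Minimal sketch of a CETSA-style analysis with hypothetical data: fit a
# sigmoidal melting curve to the soluble protein fraction measured at
# increasing temperatures, with and without ligand, and report the Tm shift.
import numpy as np
from scipy.optimize import curve_fit

def melt_curve(temp, tm, slope):
    """Two-parameter sigmoid: fraction of protein still soluble at `temp`."""
    return 1.0 / (1.0 + np.exp((temp - tm) / slope))

temps = np.array([37, 41, 45, 49, 53, 57, 61, 65], dtype=float)
soluble_vehicle = np.array([1.00, 0.97, 0.88, 0.60, 0.30, 0.12, 0.05, 0.02])
soluble_ligand  = np.array([1.00, 0.99, 0.95, 0.85, 0.62, 0.33, 0.12, 0.04])

popt_vehicle, _ = curve_fit(melt_curve, temps, soluble_vehicle, p0=[50.0, 2.0])
popt_ligand, _  = curve_fit(melt_curve, temps, soluble_ligand,  p0=[50.0, 2.0])

delta_tm = popt_ligand[0] - popt_vehicle[0]   # a positive shift suggests stabilisation
print(f"apparent Tm shift: {delta_tm:.1f} degC")
```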
In vitro ligand binding assays
In vitro panel ligand binding assays can be run using pharmacological targets similar to the one of interest, or against off-targets known to be associated with adverse side effects, suggests the SLAS Discovery review. According to recent US regulations on the abuse potential of new drugs, assays for neuronal systems and transporters related to drug abuse potential might also be included, the authors suggest.
Many assay panels are also commercially available to test kinase activities and interactions with small molecules. More complex is the elucidation of possible interactions with microtubules and with components of the electron transport chain: the suggestion in this case is to approach it through routine assessment in advanced mitochondrial cell health assays. Labelled and label-free technologies can also be used to determine binding with small molecules; the review provides an extensive list of both types of methodologies to better assess target deconvolution.
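To illustrate how panel results are typically interpreted, the sketch below converts hypothetical IC50 values from a competition binding panel into Ki estimates via the Cheng–Prusoff relation, Ki = IC50 / (1 + [L]/Kd), and flags off-targets whose affinity falls within 100-fold of the primary target. Target names, concentrations and the selectivity cut-off are illustrative assumptions, not values from the review.

```python
# Illustrative sketch: convert IC50 values from a hypothetical competition
# binding panel into Ki estimates with the Cheng-Prusoff equation
#   Ki = IC50 / (1 + [L] / Kd)
# and flag off-targets within 100-fold of the primary target's affinity.
LIGAND_CONC_NM = 2.0   # concentration of the radioligand/tracer in the assay

# (target name, assay IC50 in nM, tracer Kd in nM) -- hypothetical panel results
panel = [
    ("primary kinase",   5.0,    1.0),
    ("hERG",             800.0,  2.0),
    ("5-HT2B",           40.0,   1.5),
    ("DAT transporter",  9000.0, 3.0),
]

def cheng_prusoff(ic50_nm, kd_nm, ligand_nm=LIGAND_CONC_NM):
    return ic50_nm / (1.0 + ligand_nm / kd_nm)

ki_values = {name: cheng_prusoff(ic50, kd) for name, ic50, kd in panel}
primary_ki = ki_values["primary kinase"]

for name, ki in ki_values.items():
    selectivity = ki / primary_ki
    flag = " <-- potential off-target liability" if name != "primary kinase" and selectivity < 100 else ""
    print(f"{name:16s} Ki ~ {ki:8.1f} nM (selectivity {selectivity:7.1f}x){flag}")
```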
Computational off-target prediction methods
The interaction of small molecules with biological receptors can also be assessed using computational analysis of structure-activity relationships (SAR, or QSAR in the case of quantitative measures). The exercise can be approached from a target-centric perspective focused on the target protein, or from a ligand-centric perspective focused on the small molecule. According to the review, automated molecular docking tools such as Glide and AutoDock are useful to assist computational lead identification and optimization; the latter is a free tool developed at the Scripps Research Institute. The review also lists a wide group of commonly used cheminformatics tools and target-based off-target prediction methods.
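As a hedged illustration of the ligand-centric route, the sketch below encodes molecules as Morgan fingerprints with RDKit and fits a simple regression model to known activities before scoring a new candidate. The SMILES strings and pIC50 values are placeholders, and this is a generic (Q)SAR workflow rather than any of the specific tools listed in the review.

```python
# Sketch of a ligand-centric (Q)SAR model: encode molecules as Morgan
# fingerprints and fit a regressor to known activities, then score a new
# candidate. Molecules and pIC50 values are hypothetical placeholders.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor

train_smiles = ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC", "c1ccncc1"]
train_pic50  = np.array([4.2, 5.1, 6.3, 4.8, 5.5])   # hypothetical activities

def featurize(smiles: str, n_bits: int = 2048) -> np.ndarray:
    """Convert a SMILES string into a Morgan fingerprint bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    arr = np.zeros((n_bits,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

X_train = np.array([featurize(s) for s in train_smiles])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, train_pic50)

candidate = "CC(C)Cc1ccc(cc1)C(C)C(=O)O"   # an arbitrary test molecule (ibuprofen)
print("predicted pIC50:", model.predict([featurize(candidate)])[0])
```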
Toxicogenomics and other omics technologies
Toxic substances that up- or down-regulate transcription may be identified using key mRNAs as a “gene signature” to be compared against reference toxicological databases. According to AbbVie’s specialists, this approach is useful for estimating drug safety through the induction of several substances typically produced by the liver, e.g. drug-metabolizing enzymes, phase II enzymes, and transporters.
Useful databases in this instance are DrugMatrix, released by the U.S. National Toxicology Program, and TG-GATEs. The first one contains the results of thousands of highly controlled and standardised toxicological experiments on rats or primary rat hepatocytes, including large-scale gene expression data. The second database is the result of the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System, jointly run by Japan’s National Institute of Biomedical Innovation, the National Institute of Health Sciences, and 15 pharmaceutical companies. TG-GATEs also focuses on data generated in rats and in primary cultured hepatocytes of rats and humans following exposure to 150 compounds. The second phase of the project (TGP2, Toxicogenomics Informatics Project) discovered over 30 different safety biomarkers, which have been incorporated into TG-GATEs.
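The basic signature-matching step behind this approach can be sketched in a few lines: a query compound’s expression profile is correlated against reference profiles, and the best-scoring reference suggests a likely mechanism or toxicity class. In the sketch below, the gene list, the reference classes and all fold-change values are hypothetical, and the data layout does not reflect the actual DrugMatrix or TG-GATEs formats.

```python
# Minimal sketch of gene-signature matching: score a query expression profile
# against reference profiles (e.g. exported from a DrugMatrix- or TG-GATEs-like
# resource) using rank-based correlation. All values are hypothetical.
import pandas as pd
from scipy.stats import spearmanr

genes = ["CYP1A1", "CYP2B6", "GSTA1", "ABCB1", "HMOX1", "ALB"]

reference = pd.DataFrame(
    {   # log2 fold changes of liver genes after treatment (hypothetical)
        "enzyme_inducer_like": [0.3, 2.5, 1.8, 0.9, 0.2, -0.1],
        "oxidative_stressor":  [0.1, 0.0, 0.4, 0.2, 3.1, -0.8],
        "vehicle":             [0.0, 0.1, -0.1, 0.0, 0.1, 0.0],
    },
    index=genes,
)

query = pd.Series([0.2, 2.1, 1.5, 1.1, 0.3, 0.0], index=genes, name="candidate_x")

# Spearman correlation against each reference signature; the highest score
# points to the most similar known mechanism or toxicity class.
scores = {col: spearmanr(query, reference[col])[0] for col in reference.columns}
print(pd.Series(scores).sort_values(ascending=False))
```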
In vitro transcriptomics on single cells or co-cultures is also a useful tool, according to the review, for directly testing human tissues, especially when their availability is limited or when only small quantities of the drug candidate are available. Furthermore, toxicogenomics is often paired with other types of -omics technologies – i.e. proteomics, metabolomics, and lipidomics – in order to obtain a more comprehensive picture of the cell environment and of the impact of small molecules.
Microfluidic organs-on-a-chip
A rapidly emerging technology to facilitate toxicity evaluation is the microfluidic “organ-on-a-chip”, which can, for example, be organised in series to test different types of tissues or cellular systems. Examples cited by the review are liver- and gut-on-a-chip, but labs are continuously producing new systems reproducing a wide range of tissues and biological processes (e.g. inflammation or tumoral tissues), mimicking both health and disease conditions. Small molecules are fluxed through the devices’ microfluidic channels, with or without the presence of other substances such as cell nutrients or metabolites. Organoid models are more complex types of organs-on-a-chip containing different types of cells, to reproduce as closely as possible the situation of the real organ under examination. According to AbbVie’s scientists, a better qualification of this approach is still needed for wider application in the pharmaceutical field, where it may prove useful for the identification of mechanisms of toxicity and for early screening of liabilities not easily or well tested by other in vitro methods.
Emerging technologies
Chemogenomics, high-throughput screening (HTS) and high-content screening (HCS) are other techniques useful to expedite the identification of good candidate molecules. Toxicology in the 21st Century (Tox21), for example, is a collaboration between four different US federal agencies which makes available various AI tools for data analysis, as well as the possibility to access and visualize the Tox21 quantitative high-throughput screening (qHTS) 10K library data, which can also be integrated with other publicly available data.
Toxicity Forecaster (ToxCast) is another high-throughput screening tool created by the US Environmental Protection Agency (EPA) which contains data for approximately 1,800 chemicals from a broad range of sources, screened in more than 700 high-throughput assay endpoints covering a range of high-level cell responses.
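As a small, purely illustrative example of such data integration, the sketch below joins qHTS-style activity values with hit calls from a second source on a shared chemical identifier (CASRN) and asks which chemicals are active in both. The column names, thresholds and values are invented for the example and do not reflect the real Tox21 or ToxCast export formats.

```python
# Illustrative sketch: integrate screening results from two public sources by a
# shared chemical identifier (CASRN). Column names and values are hypothetical.
import pandas as pd

tox21_like = pd.DataFrame({
    "casrn":   ["80-05-7", "80-05-7", "50-00-0"],
    "assay":   ["ER_agonist", "AR_antagonist", "ER_agonist"],
    "ac50_uM": [1.2, 35.0, 8.0],
})
toxcast_like = pd.DataFrame({
    "casrn":    ["80-05-7", "50-00-0", "50-00-0"],
    "endpoint": ["endpoint_A", "endpoint_B", "endpoint_C"],
    "hit_call": [1, 1, 0],
})

merged = tox21_like.merge(toxcast_like, on="casrn", how="inner")

# Which chemicals are potent (low AC50) in one source AND flagged as hits
# in at least one endpoint of the other?
flagged = (
    merged.query("ac50_uM < 10 and hit_call == 1")
          .groupby("casrn")["endpoint"]
          .nunique()
          .sort_values(ascending=False)
)
print(flagged)
```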
High-content screening (HCS) is based on the combination of molecular tools with a pool of automated imaging and visualisation techniques (e.g. automated digital microscopy and flow cytometry) and aims to quantitatively analyse images from high-throughput screens in order to identify particular patterns, such as the spatial distribution of targets and individual cell or organelle morphology.
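A minimal sketch of what an HCS readout computes is shown below: a single, synthetic fluorescence field is thresholded, connected objects are labelled, and per-object morphology is quantified with scikit-image. A real HCS pipeline would of course run such measurements across thousands of wells and many channels; the image, object sizes and features chosen here are illustrative only.

```python
# Minimal sketch of a high-content readout: segment objects in one synthetic
# fluorescence field and quantify per-object morphology with scikit-image.
import numpy as np
from skimage import draw, filters, measure

# Build a synthetic 256x256 "fluorescence" image containing a few bright disks.
image = np.zeros((256, 256), dtype=float)
for row, col, radius in [(60, 60, 18), (150, 90, 25), (200, 200, 12)]:
    rr, cc = draw.disk((row, col), radius)
    image[rr, cc] = 1.0
image += np.random.default_rng(0).normal(scale=0.05, size=image.shape)  # camera noise

# Otsu threshold -> label connected components -> measure per-object morphology.
mask = image > filters.threshold_otsu(image)
labels = measure.label(mask)
for region in measure.regionprops(labels):
    print(f"object {region.label}: area={region.area}, "
          f"eccentricity={region.eccentricity:.2f}, centroid={region.centroid}")
```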
New integrated approaches to drug discovery and development
While the above describes the single instruments available to drug developers, viewed from the perspective of scientists working for a big “traditional” pharmaceutical company, many innovative companies are investing deeply in artificial intelligence to create a completely new business model for the R&D process.
Historically, newcos born from IT giants, such as Verily Life Sciences (Google) or IBM Watson, have pioneered the new era. Watson for Drug Discovery (WDD), for example, is a cognitive platform collecting more than 25 million Medline abstracts, plus over a million scientific articles and 4 million patents: a knowledge base that can be integrated with private data and searched using seven different modules to better define the query. Leading pharma companies such as Pfizer and Novartis have already signed collaboration agreements around WDD to improve the chances of success of their screening strategies.
New, non-conventional players include, for example, Cota Healthcare, a platform for personalised medicine specialised in the stratification of patients on the basis of real-world-evidence insights generated from personal and clinical data. Its Cota Nodal Address technology allows patients and their disease to be classified using a concise digital code, thus enabling a detailed analysis of outcomes, toxicities, practice patterns, and cost.
A deep convolutional neural network is the core technology used by Atomwise to predict the characteristics of the binding between small molecules and target proteins, through the proprietary algorithm AtomNet. The core database contains millions of affinity data points and measurements, and it can be searched to screen for potency, selectivity and polypharmacology, as well as against off-target toxicity. The company already has more than 60 collaborations in place, including with multinational companies such as Pfizer, AbbVie, Merck and Bayer.
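To make the underlying idea concrete without implying anything about Atomwise’s proprietary code, the sketch below shows the generic structure-based approach often described in the literature: the region around a protein-ligand binding site is voxelised into a 3D grid whose channels encode atom types, and a small 3D convolutional network maps that grid to a binding score. Grid size, channel count and layer sizes are arbitrary choices for illustration (PyTorch).

```python
# Generic illustration (not Atomwise's AtomNet) of a structure-based 3D CNN:
# a voxelised protein-ligand binding site, with channels encoding atom types,
# is mapped to a predicted binding score.
import torch
import torch.nn as nn

class Binding3DCNN(nn.Module):
    def __init__(self, n_atom_channels: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(n_atom_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 6 * 6 * 6, 128), nn.ReLU(),
            nn.Linear(128, 1),               # predicted binding score / affinity
        )

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(voxels))

# One hypothetical 24x24x24 grid around a binding site, 8 atom-type channels.
voxels = torch.randn(1, 8, 24, 24, 24)
print(Binding3DCNN()(voxels).shape)   # -> torch.Size([1, 1])
```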
The complete development cycle typical of a pharmaceutical business, from lab to market, is the focus of a network of companies comprising Insilico Medicine, Juvenescence.ai and Netramark. The first is focused on the identification of new therapeutic approaches to treat ageing and related diseases (including the identification of novel biomarkers), and it also provides services for drug discovery and development. Its AI-based infrastructure uses a complex mix of -omics and chemical databases screened by artificial intelligence tools to identify disease “signatures”.
Juvenescence.ai is responsible for the development of drug candidates for ageing identified by Insilico Medicine’s artificial intelligence, while Netramark uses its proprietary algorithm NetraPharma to match therapeutics with patient subpopulations and to identify new possibilities for molecules that have already failed development.
The concept of “Interrogative Biology” is at the core of Berg Health’s business model, according to which the development of new personalised interventions starts from the biology of the individual patient to be treated. In this case too, artificial intelligence is used to analyse data from biological samples acquired using a wide set of technologies (e.g. high-performance mass spectrometry, genomics, proteomics, lipidomics and metabolomics, oxidative stress, mitochondrial function, ATP production, etc.). Each sample can generate millions of data points, which are matched with phenotypic and clinical information on the individual patient. Machine- and deep-learning algorithms are used to identify the therapeutic approach best suited to each clinical case, as well as the biomarkers that may be used. Berg Health is collaborating with several pharmaceutical companies, among which are AstraZeneca, Sanofi Pasteur and Becton Dickinson.