Artificial intelligence applied to mRNA vaccines


Artificial intelligence (AI) is a broad term that encompasses a variety of approaches that enable machines (computers and robots) to perform tasks that would normally require human intelligence. AI can “learn” from input data, identify patterns, and make predictions. AI can also “reason” by using logic and deduction to solve problems or make decisions. Moreover, AI can “adapt” to new input.

As reviewed elsewhere (McCaffrey 2022; Kim et al. 2023), AI can be applied to the design and development of new mRNA vaccines as a means of greatly facilitating some of the important steps in going from “virus-to-vial.” Examples of such AI-enabled steps outlined in this blog are the following: 

  1. Identification and ranking of protein antigen candidates, 
  2. Prediction of their immunogenicity, 
  3. Design and ranking of mRNA candidates that encode these antigens, and  
  4. Design of lipid nanoparticle (LNP) formulations of these mRNA candidates. 

Because of the highly mathematical and statistical nature of the AI methodology for these applications in vaccinology, the discussion here focuses on basic strategy and outcomes rather than the AI details, which can be read in the linked publications.

1. Identification and Ranking of Antigen Candidates

Antigen identification, which is the first step in vaccine development, can utilize deep learning systems based on artificial neural networks to analyze vast amounts of genomic and proteomic data in concert with various other algorithms. 

An example of this was reported by Rawal et al. (2021), who devised a new AI system to discover and analyze vaccine targets, leading to the design of a multi-epitope vaccine for Trypanosoma cruzi. About 6–7 million people worldwide are estimated to be infected with T. cruzi, the parasite that causes Chagas disease, which is one of the World Health Organization’s Neglected Tropical Diseases and for which there is no vaccine.

In brief, Rawal et al. started with an analysis of the genomic and proteomic datasets of T. cruzi and other pathogens to identify possible vaccine candidates (PVCs). To do this, an integrated pipeline (termed Vax-ELAN) composed of various reported algorithms was used for sequential “filtering” of predicted protein properties. This comprehensive in silico screening included parameters for cellular secretion, membrane-surface exposure, stability, antigenicity, etc.

The T. cruzi proteins were also screened against the Database of Essential Genes, to assess essentiality for pathogen survival, and against the Virulence Factor Database. Ideally, the vaccine targets should not have significant sequence similarity with human proteins; therefore, an NIH protein BLAST search was used to filter out those T. cruzi proteins having >35% identity with human proteins. The net result was the identification of a shortlist of 8 PVCs.
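The sequential filtering idea can be sketched in a few lines of Python. This is only an illustration of the strategy, not the Vax-ELAN pipeline itself: the `Protein` fields, the example proteins, the 0.5 antigenicity cutoff, and the requirement of a database hit are all assumptions made here for the sake of the example. Only the 35% human-identity cutoff comes from the study as described above.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    name: str
    secreted_or_surface: bool    # predicted localization (hypothetical field)
    antigenicity: float          # hypothetical predictor score in [0, 1]
    essential_or_virulent: bool  # hit in an essential-gene or virulence database
    human_identity_pct: float    # best BLAST identity against human proteins


def shortlist(proteins, min_antigenicity=0.5, max_human_identity=35.0):
    """Apply the filters one after another and return the survivors."""
    filters = [
        lambda p: p.secreted_or_surface,
        lambda p: p.antigenicity >= min_antigenicity,    # assumed cutoff
        lambda p: p.essential_or_virulent,               # assumed requirement
        lambda p: p.human_identity_pct <= max_human_identity,
    ]
    survivors = list(proteins)
    for keep in filters:
        survivors = [p for p in survivors if keep(p)]
    return survivors


# Made-up candidates, purely for illustration.
candidates = [
    Protein("protA", True, 0.81, True, 12.0),
    Protein("protB", True, 0.77, True, 48.0),   # too similar to a human protein
    Protein("protC", False, 0.90, True, 5.0),   # not predicted to be exposed
    Protein("protD", True, 0.32, False, 9.0),   # weak antigenicity score
]
print([p.name for p in shortlist(candidates)])
```

Ordering the cheap filters first and the expensive ones (such as a BLAST search) last is a common design choice in pipelines of this kind, since each stage shrinks the list that the next stage must process.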

Adaptive immune responses are mediated by B cells, via binding of antigen to the B-cell receptor, and by T cells, via presentation of antigens to T cells on major histocompatibility complex (MHC) class I and II glycoproteins. Therefore, Rawal et al. used available bioinformatic tools to find and rank B-cell epitopes and T-cell epitopes of the 8 PVCs. From the several thousand epitopes analyzed, 8 top epitopes were identified for each of 3 target-cell types, namely, B cells, cytotoxic/MHC-I T cells, and helper/MHC-II T cells, resulting in a list of 24 epitopes of interest.
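Selecting the top-ranked epitopes per cell type is a simple group-and-sort operation. The sketch below assumes each epitope already carries a single numeric score from some prediction tool; the epitope identifiers and scores are invented for illustration, and `k=2` is used instead of 8 to keep the toy data short.

```python
from collections import defaultdict


def top_epitopes(scored, k=8):
    """scored: iterable of (cell_type, epitope_id, score); higher is better."""
    by_type = defaultdict(list)
    for cell_type, epitope_id, score in scored:
        by_type[cell_type].append((score, epitope_id))
    return {
        cell_type: [eid for _, eid in
                    sorted(items, key=lambda t: t[0], reverse=True)[:k]]
        for cell_type, items in by_type.items()
    }


# Invented scores for illustration only.
scored = [
    ("B", "epi1", 0.9), ("B", "epi2", 0.4), ("B", "epi3", 0.7),
    ("MHC-I", "epi4", 0.6),
    ("MHC-II", "epi5", 0.8), ("MHC-II", "epi6", 0.85),
]
print(top_epitopes(scored, k=2))
```

With `k=8` and three cell types, the same function would return the 3 × 8 = 24 epitopes described above, provided each type has at least 8 scored candidates.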

To assemble a single construct from these 24 epitopes, each of which had 10–20 amino acids, short peptide linkers were inserted between adjacent epitopes. Attachment of β-defensin provided an adjuvant, and the final assembly, termed V1, was analyzed in silico for antigenicity, allergenicity, solubility, and higher order structure. Further analysis of V1 is outlined in the next section.
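Assembling such a construct amounts to string concatenation over the amino-acid alphabet. The sketch below is a hypothetical illustration: the epitope strings and the `MKT` adjuvant placeholder are made up (it is not the β-defensin sequence), and the `GPGPG` and `EAAAK` linkers are assumptions of the kind often seen in multi-epitope designs, not necessarily the ones used for V1.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def assemble_construct(epitopes, linker, adjuvant="", adjuvant_linker=""):
    """Join epitopes with a linker and optionally prepend an adjuvant."""
    for seq in [*epitopes, linker, adjuvant, adjuvant_linker]:
        if not set(seq) <= VALID_RESIDUES:
            raise ValueError(f"non-standard residue in {seq!r}")
    body = linker.join(epitopes)
    return adjuvant + adjuvant_linker + body if adjuvant else body


# Placeholder sequences, for illustration only.
toy_epitopes = ["ACDEFGHIKL", "MNPQRSTVWY", "LKIHGFEDCA"]
construct = assemble_construct(
    toy_epitopes, linker="GPGPG", adjuvant="MKT", adjuvant_linker="EAAAK"
)
print(construct, len(construct))
```

The validation step is a small safeguard: a stray character in one epitope would otherwise propagate silently into every downstream in silico analysis of the assembled sequence.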

2. Prediction of Immunogenicity

Rawal et al. used C-ImmSim for computational simulation of the immune response to V1, which was derived as discussed in the previous section. C-ImmSim is the C-language version of IMMSIM, the IMMune system SIMulator, a program written by the astrophysicist Phil E. Seiden together with the immunologist Franco Celada. This model simulates the mechanisms making up the adaptive humoral and cellular immune response to any antigen by incorporating the principal "core facts" of current immunological knowledge, such as MHC restriction, clonal selection by antigen affinity, thymic education of T cells, and antigen processing.

The results obtained for V1 indicated a surge in the induction of secondary and tertiary immune responses. After the first dose, a high surge of IgM and IgG1 antibodies was predicted, and these titers increased exponentially with the second and third doses. Furthermore, an increase in active B cells, cytotoxic T cells, and helper T cells was predicted. Based on these promising predictions, Rawal et al. say they are investigating an mRNA version of V1 for Chagas disease.

3. Design and Ranking of Antigen-Encoding mRNA Candidates

The preceding sections outlined the use of AI to identify pathogen antigen candidates and to evaluate their immunogenicity in silico. The next step for actual mRNA vaccine development is conversion of a candidate antigen’s amino acid sequence into a codon-optimized mRNA sequence. This is commonly done with OptimumGene™, for which there are hundreds of citations in Google Scholar. However, recent work (Leppek et al. 2022) has shown that highly structured "superfolder" mRNAs can be designed to improve both stability and expression.

While an mRNA design algorithm that optimizes both codon usage and structural stability would be expected to further enhance protein expression, Zhang et al. (2023) recognized that this poses a seemingly insurmountable problem. Taking the antigenic spike protein of SARS-CoV-2 as an example, its 1,273 amino acids, encoded by 3,822 nucleotides, can be written as roughly 2.4 × 10^632 different mRNA sequences because of codon degeneracy. If these sequences were enumerated one by one, scoring codon usage with the codon adaptation index (CAI) and structural stability with the calculated minimum free energy (MFE), 10^616 billion years would be required!
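The size of this search space follows directly from the standard genetic code: the number of mRNAs encoding a protein is the product, over its residues, of the number of synonymous codons for each amino acid. The sketch below works this out for a short example peptide, since the full spike sequence is not reproduced here. Whether the published count includes the choice among the three stop codons is not stated above, so `include_stop` is an assumption left as a parameter.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
DEGENERACY = {
    "M": 1, "W": 1,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "I": 3,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "L": 6, "R": 6, "S": 6,
}
STOP_CODONS = 3  # UAA, UAG, UGA


def count_encodings(protein, include_stop=True):
    """Exact number of distinct coding sequences (Python ints never overflow)."""
    total = 1
    for residue in protein:
        total *= DEGENERACY[residue]
    return total * STOP_CODONS if include_stop else total


def log10_encodings(protein, include_stop=True):
    """Same count on a log10 scale, convenient for very long proteins."""
    total = sum(math.log10(DEGENERACY[r]) for r in protein)
    return total + (math.log10(STOP_CODONS) if include_stop else 0.0)


def coding_length_nt(n_residues):
    """Nucleotides needed for the residues plus one stop codon."""
    return 3 * (n_residues + 1)


peptide = "MFVFLVLLPLVSSQ"  # short example peptide
print(count_encodings(peptide, include_stop=False))
print(round(log10_encodings(peptide, include_stop=False), 2))
print(coding_length_nt(1273))
```

Even this 14-residue peptide has tens of millions of possible encodings, and the count multiplies with every added residue. The last line also shows that 1,273 codons plus one stop codon give the 3,822 nucleotides quoted for the spike protein.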

Zhang et al. solved this problem by adapting methodology in computational linguistics, reasoning that “the optimal mRNA among the vast space of candidates is analogous to finding the most likely sentence among many similar-sounding alternatives.” Remarkably, applying the resultant algorithm, termed LinearDesign, to optimization of the spike protein mRNA coding sequence takes only 11 min. The source code for LinearDesign is available to all parties on GitHub and Zenodo, and is free for academic and research use.  
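To make the codon-usage half of the problem concrete, the sketch below computes CAI in its usual textbook form: the geometric mean, over all codons in a sequence, of each codon's frequency relative to the most frequent synonymous codon. This is not the LinearDesign algorithm, and the codon frequencies in `USAGE` are hypothetical values chosen for illustration rather than a real organism's usage table.

```python
import math

# Hypothetical relative codon frequencies for three amino acids.
USAGE = {
    "L": {"CUG": 0.40, "CUC": 0.20, "UUG": 0.13,
          "CUU": 0.13, "CUA": 0.07, "UUA": 0.07},
    "K": {"AAG": 0.58, "AAA": 0.42},
    "M": {"AUG": 1.00},
}


def relative_adaptiveness(usage):
    """Weight of each codon = its frequency / frequency of the best synonym."""
    weights = {}
    for codons in usage.values():
        best = max(codons.values())
        for codon, freq in codons.items():
            weights[codon] = freq / best
    return weights


def cai(mrna, weights):
    """Geometric mean of codon weights across the coding sequence."""
    if len(mrna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]
    return math.exp(sum(math.log(weights[c]) for c in codons) / len(codons))


weights = relative_adaptiveness(USAGE)
print(cai("AUGCUGAAG", weights))  # Met-Leu-Lys using the most frequent codons
print(cai("AUGUUAAAA", weights))  # same peptide using rarer codons
```

Both sequences encode the same three residues, yet they score differently. Scoring one sequence is cheap; the difficulty described above comes from the number of sequences to score and from the fact that MFE, unlike CAI, depends on the whole sequence rather than on each codon independently.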

For comparative purposes, Zhang et al. used LinearDesign to obtain 7 spike protein mRNA sequences (A–G) with roughly comparable values of MFE or CAI, while an additional benchmark mRNA sequence (H) was designed with OptimumGene™.

Groups of mice were injected with two doses (2-week interval) of mRNA sequences A–H in LNPs for evaluation of humoral and cellular immune responses. LinearDesign mRNAs A–G all elicited robust antibody responses, whereas benchmark mRNA sequence H showed very limited ability to induce antibodies. Notably, sequences A–D, which are more optimal than sequences E–G, elicited 57- to 128-fold increases in anti-spike IgG antibody titers and 9- to 20-fold increases in neutralizing antibody titers, compared to benchmark sequence H.

4. Design of LNP Formulations of mRNA Candidates

LNP formulations are widely used for mRNA vaccine research, clinical trials, and commercialization, as well as mRNA therapeutics. To identify promising LNP formulations, most studies screen dozens to hundreds of compositions containing ionizable lipids synthesized using a single type of chemistry. However, this approach leaves the ionizable lipids synthesized through multi-step chemistries underexplored.

According to Lewis et al. (2023), this gap in the repertoire of structures is significant because it affects the screening of analogs that are structurally similar to SM-102 and ALC-0315, the ionizable lipids used in clinically approved mRNA/LNP COVID-19 vaccines. To address this issue, they applied LightGBM (Light Gradient-Boosting Machine), a free and open-source gradient-boosting framework for machine learning (ML) developed by Microsoft. The dual objectives of using LightGBM were to reduce the burden of empirically screening new synthetic ionizable lipids and to expand the in silico molecular structure-space by learning from the breadth and diversity of lipids that have already been tested.
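For readers unfamiliar with gradient boosting, the core idea can be shown in plain Python: start from the mean of the training values, then repeatedly fit a very simple model (here a one-split "stump") to the remaining errors and add a fraction of it to the prediction. This is a bare-bones sketch of the general technique, not the library used by Lewis et al., which builds full decision trees and adds many engineering refinements. The tail-carbon counts and potency values are made-up toy numbers, not data from the study.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # last value would empty the right side
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def boost(xs, ys, rounds=50, learning_rate=0.3):
    """Fit stumps to successive residuals; returns (base, rate, stumps)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


# Toy data: hypothetical tail-carbon counts and potency values.
tail_carbons = [8, 10, 12, 14, 16, 18]
potency = [0.2, 0.5, 0.9, 1.0, 0.7, 0.3]
model = boost(tail_carbons, potency)
print([round(predict(model, x), 2) for x in tail_carbons])
```

Because each round only corrects what earlier rounds got wrong, a sum of very simple splits can capture a non-monotonic relationship such as an optimum at intermediate tail length.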

Curation of the relevant literature by Lewis et al. led to a “learning” dataset of 2,332 LNPs from 14 different reports across 10 different labs. They first evaluated whether LightGBM could predict LNP potency across this heterogeneous chemistry with a high correlation, which was found to be the case, as indicated by R² = 0.94. After establishing this predictive capacity of the model, they identified the number of carbons in the “tail” portion of an ionizable lipid as the most important factor contributing to transfection efficiency.
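As a reminder of what such a value measures, the sketch below computes R² as the coefficient of determination: one minus the ratio of the model's squared error to the squared error of always predicting the mean. Whether the study reports this form or the squared Pearson correlation is not stated above, so the choice here is an assumption, and the numbers are arbitrary.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]          # arbitrary illustrative values
print(r_squared(observed, observed))     # perfect predictions
print(r_squared(observed, [2.5] * 4))    # always predicting the mean
print(r_squared(observed, [1.2, 1.9, 3.3, 3.8]))
```

On this scale, a value near 1 means the predictions account for nearly all of the variation in measured potency, while a value near 0 means the model does no better than the average.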

From this finding, the algorithm was used to predict the effect of formulating nanoluciferase mRNA LNPs using tail variants of SM-102 and ALC-0315 on nanoluciferase activity in HEK293T cells, which led to R² = 0.83. Importantly, this correlation encompasses novel lipids not included within the database used to train the algorithm. Overall, this study demonstrates the potential of ML to accelerate the development of new ionizable lipids by simplifying the screening process.

More recently, Xu et al. (2024) have reported their ML approach to accelerate LNP development for mRNA delivery, which is trained on a much larger dataset for combinatorial synthesis of ionizable lipid candidates followed by robotic high-throughput screening with TriLink CleanCap® reporter gene EGFP or gene editing Cre recombinase. 


Concluding Comments  

AI can also be applied to “front-end” viral disease epidemiology datasets and “back-end” vaccine safety and efficacy datasets. For an excellent discussion, see the expert review by Wong et al. (2023) in Science titled “Leveraging artificial intelligence in the fight against infectious diseases.”

The emerging commercial importance of AI for mRNA vaccines and therapeutics is apparent from BioNTech’s 2023 acquisition of InstaDeep Ltd., a leader in AI and ML, to “build capabilities in AI-driven drug discovery and development of next-generation immunotherapies and vaccines to address diseases with high unmet medical need.” 



Your comments are welcomed, as usual. 


Please feel free to share this blog with your colleagues or on social media.