- Sequence Optimization of IVT mRNA Has Benefited COVID-19 Vaccines
- Keys Are 5’ Capping, Codon Optimization, and Modified Bases
- TriLink Technologies Contributes to These Advances
The previous Zone blog (April 27, 2021), titled mRNA Vaccines – A New Era in Vaccinology, emphasized that mRNA sequence optimization is needed to achieve efficient intracellular translation of the encoded protein antigens and to minimize undesired innate immunogenicity caused by RNA structural elements. This blog provides some perspective on published strategies for this type of optimization, which is also referred to as sequence engineering. The topic has been trending upward in recent years, as indicated by the linear forecast (dashed line) in this chart of reports found by word-searching the NIH PubMed database for 2013 – 2020.

In keeping with the intent and scope of individual Zone blogs, short synopses of selected aspects of IVT mRNA sequence optimization will be presented, along with lead references to the original publications or reviews that can be consulted for additional information or technical details. As a final note, this blog is intended to be a timely primer, if you will, for the upcoming May 19, 2021 TriLink webinar on Optimizing Your mRNA Coding Sequence.
Overview of the Biological Relevance of mRNA Structure
Before discussing current strategies for optimizing IVT mRNA sequence in the context of vaccine prophylaxis or therapy, it is helpful to first consider what we know about the biological function of the various structural regions of mRNA. As depicted here, these regions include the 5’ cap structure, the 5’ untranslated region (UTR), the open reading frame (ORF), the 3’ UTR, and the poly(A) tail.

Reviews by Pardi et al. (2018) and Linares-Fernández et al. (2019) provide in-depth discussions and references regarding the biological relevance of these mRNA regions, brief synopses of which are provided in the following subsections.
Cap Structure: Eukaryotic mRNAs are 5’ capped with a 7-methylguanosine (m7G) that is connected to the first transcribed nucleotide by a 5’-to-5’ triphosphate bridge. Caps can take several forms, as depicted here. Cap 0 is important for recruiting translation initiation factors, and it protects the mRNA from degradation. In humans and other higher eukaryotes, the 2’ ribose position of the first cap-proximal nucleotide is methylated to form a Cap 1 structure, and in ~50% of transcripts the second cap-proximal nucleotide is also 2’-O-methylated to form Cap 2 (see Furuichi and Shatkin).

According to McCaffrey and collaborators, cytoplasmic viruses frequently possess mechanisms to acquire a Cap 1 structure. Many of these viruses are attenuated when their methyltransferases are inactivated, suggesting that cap structure may play an important role in self vs. non-self recognition. Cap 1 methylation has been shown to modulate the binding or activation of innate immune sensors. For example, IFIT-1 (an interferon-induced protein) binds Cap 1 and Cap 2 RNAs much more weakly than 5’-triphosphate or Cap 0 RNAs, and IFIT-1 binding to non-2’-O-methylated RNAs competes with the translation initiation factor eIF4E (shown here), thereby preventing translation.

RIG-I is a cytosolic pattern recognition receptor (PRR) that triggers production of type I interferon (IFN-I). RIG-I is an essential molecule in the innate immune system because it detects viral RNA in virus-infected cells. Cap 0 and 5’-triphosphate RNAs bind RIG-I with similar affinities, whereas Cap 1 modification abrogates RIG-I signaling. Similarly, Cap 1 prevents detection by MDA5, a RIG-I-like dsRNA helicase that detects long dsRNA (such as the genomic RNA of dsRNA viruses) as well as replicative intermediates of both positive- and negative-sense RNA viruses.
Untranslated Regions: The 5’ UTR contains sequence motifs recognized by the cellular translation machinery, and it is scanned by ribosomes. Following recruitment of the ribosome, translation initiates at the AUG start codon within the Kozak consensus sequence ACCAUGG.
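As a rough illustration of the Kozak context, the Python sketch below classifies the nucleotides flanking a given AUG. It assumes the widely used rule of thumb that the two most influential positions are −3 (a purine, A or G) and +4 (G), counting the A of AUG as +1; the function name and the strong/adequate/weak labels are illustrative, not taken from the reviews cited here.

```python
def kozak_strength(mrna: str, aug_index: int) -> str:
    """Classify the context around the AUG whose A is at aug_index (0-based).

    Assumed rule of thumb: -3 should be a purine (A/G) and +4 should be G,
    counting the A of AUG as +1. Both present -> "strong", one -> "adequate",
    neither -> "weak".
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError(f"no AUG at index {aug_index}")
    minus3 = seq[aug_index - 3] if aug_index >= 3 else ""
    plus4 = seq[aug_index + 3] if aug_index + 3 < len(seq) else ""
    hits = (minus3 in ("A", "G")) + (plus4 == "G")  # bools add as 0/1
    return {2: "strong", 1: "adequate", 0: "weak"}[hits]
```

For the consensus quoted above, `kozak_strength("ACCAUGG", 3)` returns `"strong"`, because the A at −3 and the G at +4 are both present.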

According to Linares-Fernández et al., both the 5’ and 3’ UTRs are important regulators of mRNA decay and translational efficiency through their interactions with RNA-binding proteins. Because globin mRNAs are highly stable, globin (α- or β-globin, shown here) UTRs from Xenopus laevis or humans have historically been the standard choice in mRNA vaccination. However, UTR performance depends on species, cell type, and cell state. Each mRNA vaccine or mRNA therapeutic therefore has to identify the UTR sequences that give the strongest expression in the targeted cells, as discussed below.
Translated Region: An open reading frame (ORF) is the part of a reading frame that can be translated. It is a continuous stretch of codons that begins with a start codon (usually AUG) and ends at a stop codon (usually UAA, UAG, or UGA). In eukaryotic genes with multiple exons, introns are removed and the exons are joined after transcription to yield the final mRNA for protein translation. In the context of gene finding, the start-stop definition of an ORF therefore applies only to spliced mRNAs, not to genomic DNA, because introns may contain stop codons and/or cause shifts between reading frames. An alternative, more general definition, useful in transcriptomics, describes an ORF as a sequence bounded by stop codons with a length divisible by three. Under either definition, the ORF is the central element of interest for mRNA vaccines and mRNA therapeutics, because it encodes the antigen or therapeutic protein.
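A minimal sketch of the start-stop definition is shown below, assuming an already spliced mRNA (or its cDNA) scanned in the three forward frames. The function name, the minimum-length parameter (counted in codons, including the stop), and the choice to report only the first AUG of each ORF are illustrative assumptions; gene-finding tools typically also scan the reverse strand and handle nested starts differently.

```python
STOPS = {"UAA", "UAG", "UGA"}


def find_orfs(mrna: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for AUG...stop ORFs in the three forward frames.

    start is the 0-based index of the A in AUG; end is exclusive and includes
    the stop codon. AUGs nested inside an open ORF are not reported separately.
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next AUG in this frame
    return orfs
```

For example, `find_orfs("GGAUGAAAUAGCC", min_codons=1)` returns `[(2, 2, 11)]`: a three-codon ORF (AUG AAA UAG) in frame 2.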


Poly(A) Tail: A 1997 review by Colgan and Manley, titled Mechanism and regulation of mRNA polyadenylation, states that “a poly(A) tail is found at the 3′ end of nearly every fully processed eukaryotic mRNA, and has been suggested to influence virtually all aspects of mRNA metabolism. Its proposed functions include conferring mRNA stability, promoting an mRNA’s translational efficiency, and having a role in transport of processed mRNA from the nucleus to the cytoplasm.” In the context of mRNA vaccines and mRNA therapeutics, the most important of these functions, according to Linares-Fernández et al., are mRNA stability and recognition by poly(A)-binding protein, an RNA-binding protein (shown here) that promotes the binding of eIF4 initiation factors for translation initiation.
Optimization of In Vitro Transcribed (IVT) mRNA Sequence
Cap Structure: A simplified depiction of the workflow for synthesis of IVT mRNA is shown here. This process has evolved technically over many decades, starting with pioneering studies of how bacteriophages (phages) infect and replicate in host bacteria. In 1974, the T7 RNA polymerase (RNAP) encoded by the then well-known T7 bacteriophage was purified from T7-infected Escherichia coli B (Niles et al.).

T7 RNAP is extremely promoter-specific and transcribes only DNA downstream of a T7 promoter. The T7 promoter shown here is the sequence the enzyme recognizes for binding and transcription initiation. The consensus promoter sequences for T7 and related phages (T3, K11, and SP6) are given here. Transcription begins at the asterisk-marked guanine.
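The relationship between promoter and start site can be sketched as a simple string search. This assumes the commonly cited 17-nt T7 core consensus TAATACGACTCACTATA (positions −17 to −1) on the non-template (top) strand, with the next nucleotide, ideally the asterisk-marked G, as +1. Exact matching will miss promoter variants, the other phage promoters would need their own consensus strings, and the helper names are hypothetical.

```python
T7_CORE = "TAATACGACTCACTATA"  # assumed T7 consensus, positions -17 to -1


def t7_start_sites(top_strand: str) -> list[int]:
    """0-based indices of the predicted +1 nucleotide after each exact
    match to the T7 core promoter on the top (non-template) DNA strand."""
    seq = top_strand.upper()
    sites = []
    hit = seq.find(T7_CORE)
    while hit != -1:
        plus1 = hit + len(T7_CORE)
        if plus1 < len(seq):  # a promoter at the very end has nothing to transcribe
            sites.append(plus1)
        hit = seq.find(T7_CORE, hit + 1)
    return sites


def predicted_runoff_transcript(linear_template: str) -> str:
    """RNA from the first predicted +1 to the end of a linearized template.

    Assumes ideal run-off transcription: no slippage at the start and no
    non-templated nucleotides added at the 3' end.
    """
    sites = t7_start_sites(linear_template)
    if not sites:
        raise ValueError("no T7 core promoter found")
    return linear_template.upper()[sites[0]:].replace("T", "U")
```

For a toy template `"GC" + T7_CORE + "GGGATC"`, the predicted +1 is index 19 and the run-off transcript is `"GGGAUC"`.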

T7 RNAP became one of the most widely used enzymes for synthesis of IVT mRNA. Unlike DNA polymerases, T7 RNAP initiates RNA synthesis without a primer: it recognizes its promoter sequence and incorporates the first pair of NTPs at positions +1 and +2 of the template strand. Although T7 RNAP can also initiate with short oligonucleotide primers, the results are confounded by various factors, as detailed in a TriLink patent by Hogrefe et al. The inventors circumvented these priming problems by developing CleanCap® technology (depicted here), a method that not only produces IVT mRNA containing >90% of the desired natural Cap 1 structure, as detailed by Henderson et al. in Current Protocols, but also yields minimal immunogenic dsRNA, thereby bypassing the need for HPLC purification.

Untranslated and Poly(A) Regions: In 2006, Ugur Sahin, who went on to cofound the mRNA vaccine company BioNTech in 2008, led a group of academic researchers in Germany in one of the first systematic investigations of the influence of UTR structure on IVT mRNA translation. These effects are cell-type dependent, and in this study dendritic cells (DCs) were transfected with IVT mRNA encoding either a reporter protein (enhanced green fluorescent protein, eGFP) or tumor-associated antigens.

Briefly, they identified elements located 3′ of the coding region that contributed to higher transcript stability and translational efficiency. Using qRT-PCR and eGFP variants to measure transcript amounts and protein yield, the researchers concluded that each of the following features independently enhanced RNA stability and translational efficiency: (1) a poly(A) tail of 120 nucleotides (nt) instead of a shorter one (16, 42, or 51 nt), (2) an unmasked poly(A) tail with a free 3′ end rather than one extended with five unrelated nucleotides (ACUAG), and (3) two sequential β-globin 3′ UTRs cloned head-to-tail between the coding region and the poly(A) tail. These features also increased the density of antigen-specific peptide/MHC complexes (shown here) on the transfected DCs and the potency of these DCs in stimulating and expanding antigen-specific CD4+ and CD8+ T cells.
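To make the three features concrete, the sketch below assembles a transcript layout from placeholder parts and counts the A residues at the free 3′ end. The sequences are short hypothetical stand-ins, not real β-globin UTRs or antigen ORFs, and the class and function names are illustrative. A masked tail, such as one extended with ACUAG, shows up as zero trailing A's.

```python
from dataclasses import dataclass


@dataclass
class IVTConstruct:
    """Hypothetical transcript layout: 5' UTR, ORF, tandem 3' UTRs, poly(A)."""
    utr5: str
    orf: str
    utr3: str
    utr3_copies: int = 2     # e.g. two 3' UTR copies cloned head-to-tail
    polya_length: int = 120  # poly(A) length in nt

    def parts(self) -> list[tuple[str, str]]:
        return [("5'UTR", self.utr5), ("ORF", self.orf),
                ("3'UTR", self.utr3 * self.utr3_copies),
                ("poly(A)", "A" * self.polya_length)]

    def sequence(self) -> str:
        # poly(A) is the last part, so nothing masks the free 3' end
        return "".join(part for _, part in self.parts())

    def regions(self) -> dict[str, tuple[int, int]]:
        """0-based, end-exclusive coordinates of each region."""
        bounds, pos = {}, 0
        for name, part in self.parts():
            bounds[name] = (pos, pos + len(part))
            pos += len(part)
        return bounds


def trailing_a_count(rna: str) -> int:
    """Consecutive A residues at the 3' end (0 if the tail is masked)."""
    return len(rna) - len(rna.rstrip("Aa"))
```

With `c = IVTConstruct("GGG", "AUGUAA", "CCC")`, the poly(A) region spans `(15, 135)` and `trailing_a_count(c.sequence())` is 120, whereas appending `"ACUAG"` drops it to 0.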
More recently, Suknuntha et al. (2018) evaluated the translational efficiency of different mRNA modifications in an effort to improve direct induction of endothelium and blood from pluripotent stem cells (PSCs). Using ARCA-capped, poly(A)120-tailed, pseudouridine (Ψ)-modified eGFP mRNA (Ψ-mRNA), prepared by IVT with Ψ 5’-triphosphate (shown here) from TriLink, they found that an IVT template containing the 5′ UTR and a single 3′ UTR from β-globin provided the highest protein levels in human PSCs, whereas an IVT template containing artificial double repeats of the β-globin 3′ UTR yielded the lowest. Interestingly, and as a cautionary note, these effects did not apply to non-human primate PSCs.

In addition, although point mutations within the 5′ UTR of the β-globin gene reduce the rate of transcription and result in thalassemia, the mutations do not interfere with mRNA transport from the nucleus to the cytoplasm, 3′ end processing, or mRNA stability. Suknuntha et al. said these findings indicate that the translational enhancement mechanism acts on the specific sequence within the 5′ UTR during transcription. They also found that neither UTR modification significantly extended the duration of functional eGFP protein expression.
Translated Region: As mentioned above, the review of IVT mRNA vaccines by Linares-Fernández et al. notes that mRNAs have inherent adjuvant properties due to their complex interactions with PRRs. This recognition can be either beneficial, by activating antigen-presenting cells (APCs), or detrimental, by indirectly blocking mRNA translation. To decipher this good/bad duality, the reviewers describe the different innate response mechanisms triggered by mRNA molecules and how each element, from the 5’ cap to the poly(A) tail, interferes with innate and adaptive immune responses. Interested readers should consult this 2020 review on “tailoring” IVT mRNA structure. This blog now turns to an exemplary case in which the IVT mRNA encodes a protein that is not a vaccine antigen and immune activation is unwanted, namely Cas9 for CRISPR gene editing.
According to McCaffrey and collaborators, an ideal Cas9 mRNA should mimic a fully processed mRNA and should not activate innate immune pathways, because activation of these receptors induces inflammation, inhibits translation, and causes mRNA degradation. With the goal of designing and producing Cas9 (and other) mRNAs that do not activate, or only minimally activate, these RNA-sensing pathways, TriLink’s CleanCap® Cap 1 AG trimer was used with T7 RNAP for co-transcriptional capping of Cas9 mRNA, using more than 10 base-modified NTPs (modNTPs). These modNTPs included previously studied compounds (Ψ, 5-methylcytidine, 2-thiouridine, etc.), new compounds (5-methoxyuridine (5moU, shown here), 5-hydroxymethylcytidine, etc.), and various combinations of the two groups.
While designing this Cas9 mRNA study, the researchers found that, in the context of the luciferase ORF, depleting uridines from the transcript by using synonymous codons increased luciferase activity for unmodified, Ψ-modified, and 5moU-modified mRNAs. In light of these preliminary results and other reports that sequence engineering (i.e., choosing the most GC-rich codons; Thess et al.) could improve mRNA activity, a uridine-depleted Cas9 ORF was synthesized, as were three additional Cas9 mRNAs containing wild-type bases, Ψ, or 5moU. The activities and immune responses of these mRNAs were compared with those of the previously published ARCA Cap 0 Ψ/5meC mRNA. In addition, a portion of each mRNA was HPLC purified to remove undesired immunogenic dsRNA contaminants.
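Uridine depletion with synonymous codons can be sketched as a greedy back-translation: for each amino acid, pick the synonymous codon with the fewest U residues, breaking ties by higher G+C count in the spirit of the GC-rich codon approach attributed to Thess et al. This is a hypothetical illustration of the idea, not the design procedure used by McCaffrey and collaborators, whose exact rules are not given here; a practical design would also weigh codon usage, repeats, and secondary structure.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(orf: str) -> str:
    """Translate an RNA ORF codon by codon ('*' marks stop codons)."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))


def lowest_u_codon(aa: str) -> str:
    """Synonymous codon with the fewest U; ties go to the higher G+C count,
    then to the first codon in table order."""
    synonyms = [codon for codon, a in CODON_TABLE.items() if a == aa]
    return min(synonyms,
               key=lambda c: (c.count("U"), -(c.count("G") + c.count("C"))))


def uridine_deplete(protein: str) -> str:
    """Greedy back-translation of a protein (one-letter codes, '*' = stop)."""
    return "".join(lowest_u_codon(aa) for aa in protein.upper())
```

For example, `uridine_deplete("MFLS*")` gives `"AUGUUCCUCAGCUAG"` (5 U in 15 nt; F cannot go below two U), and `translate` of any output returns the input protein, confirming the changes are synonymous.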
Briefly, the activities of these Cas9 mRNAs were tested in cell lines and primary human CD34+ cells, and cytokines were measured in whole blood and in mice. These approaches yielded more active and less immunogenic mRNA. Uridine depletion had the greatest impact on insertion or deletion (indel) activity. Specifically, uridine-depleted 5moU mRNA induced the best indel frequencies [88% (average ± SD = 79% ± 11%)] and elicited minimal immune responses, all without needing HPLC purification.
The authors concluded that uridine-depleted Cas9 mRNA modified with 5moU (without HPLC purification) or Ψ may be optimal for the broad use of Cas9 both in vitro and in vivo. Based on these findings, TriLink now offers CleanCap® Cas9 mRNA (5moU) as a catalog item.
Codon Optimization
In addition to the comments on codon optimization above, interested readers should consult the 2020 review by Linares-Fernández et al. for more information and original references. According to these reviewers, there have been several approaches to modifying the ORF sequence, both to enhance translational efficiency and to prevent a strong innate immune reaction caused by PRR recognition.
The ORF can be modified at the codon level (codon usage bias) to regulate the translation elongation rate, or via its GC content to avoid secondary structures. There are several strategies for codon optimization: using the most frequent codon for each amino acid; using codons with higher tRNA abundance; optimizing dicodons by using the pairs of codons that work best together; and modifying the ORF sequence so that every codon occurs at the same ratio found naturally in highly expressed proteins of the target species and cells.
Codon optimization increases translation rates, and optimal codons near the initiation codon increase elongation rates, providing high levels of mRNA translation. By contrast, rare codons slow elongation, which favors ribosome crowding. This impaired elongation allows a DEAD-box RNA helicase to bind the transcript, which accelerates mRNA decay after 5’ decapping. However, fast elongation is not always beneficial: it can prevent adequate folding of the encoded protein, as shown by a codon-optimized firefly luciferase mRNA that lost 50% of its activity. Less frequent codons can provide a lower translation rate and therefore adequate protein folding, which is vital for achieving an adequate antigen conformation. Different codon strategies can therefore be used depending on the antigen.
For mRNA vaccines based on linear epitopes, optimizing all codons may be worthwhile, because such epitopes do not depend on the folded conformation of the protein; complex antigens, by contrast, may require slower translation rates to fold critical protein domains. In either case, rare codons should be avoided.
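The "most frequent codon" strategy and the advice to avoid rare codons can be illustrated with the relative adaptiveness (w) values behind the codon adaptation index (CAI) of Sharp and Li: each codon's count divided by the count of the most used synonymous codon, with CAI defined as the geometric mean of w over the ORF. CAI is a swapped-in metric, not one described in the review above. The usage counts below are invented for illustration and cover only three amino acids, and the rare-codon cutoff of 0.2 is likewise an arbitrary assumption.

```python
import math

# Invented codon counts for a hypothetical set of highly expressed genes;
# a real analysis would use a codon-usage table for the target species.
USAGE = {
    "L": {"CUG": 40, "CUC": 20, "CUU": 13, "UUG": 13, "UUA": 7, "CUA": 7},
    "K": {"AAG": 60, "AAA": 40},
    "E": {"GAG": 58, "GAA": 42},
}


def relative_adaptiveness(usage: dict[str, dict[str, int]]) -> dict[str, float]:
    """w = codon count / count of the most used synonymous codon."""
    w = {}
    for counts in usage.values():
        top = max(counts.values())
        for codon, n in counts.items():
            w[codon] = n / top
    return w


def codons_of(orf: str) -> list[str]:
    return [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]


def cai(orf: str, usage) -> float:
    """Geometric mean of w over the codons that appear in the usage table."""
    w = relative_adaptiveness(usage)
    logs = [math.log(w[c]) for c in codons_of(orf) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def most_frequent_codon_design(protein: str, usage) -> str:
    """'Most frequent codon' strategy: the top codon for every amino acid."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in protein)


def rare_codon_positions(orf: str, usage, cutoff: float = 0.2) -> list[int]:
    """Codon indices whose w falls below the (arbitrary) rarity cutoff."""
    w = relative_adaptiveness(usage)
    return [k for k, c in enumerate(codons_of(orf)) if c in w and w[c] < cutoff]
```

Here `most_frequent_codon_design("LKE", USAGE)` gives `"CUGAAGGAG"` with a CAI of 1.0, while `"CUAAAAGAA"` scores about 0.44 and has its rare CUA flagged at codon 0. Note that maximizing CAI is exactly the fast-elongation extreme that the text cautions may not suit every antigen.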
Furthermore, GC content can affect protein expression during cellular differentiation, so translation can depend on the cell's differentiation stage. GC content in mRNA vaccines could therefore be important for targeting immune cells whose behavior depends on cell status (for instance, monocytes vs. macrophages).
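GC content is straightforward to profile along a sequence. The sketch computes the overall GC fraction, the GC fraction at third codon positions (GC3, which synonymous codon changes affect most directly), and a sliding-window profile. The function names and the window and step defaults are arbitrary illustrative choices, and no particular target GC value is implied by the text above.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C in a DNA or RNA sequence (0.0 for an empty string)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def gc3(orf: str) -> float:
    """GC fraction at third codon positions of an in-frame ORF."""
    return gc_content(orf[2::3])


def gc_windows(seq: str, window: int = 30, step: int = 10) -> list[tuple[int, float]]:
    """(start, GC fraction) for each full-length window along the sequence."""
    return [(i, gc_content(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]
```

For instance, `gc_windows("GGGGAAAA", window=4, step=4)` returns `[(0, 1.0), (4, 0.0)]`, exposing a GC-rich block that an overall value of 0.5 would hide.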
Lastly, mRNA secondary structures can play an important role in mRNA translation. Highly stable secondary structures and hairpin loops should be avoided, because they can impede ribosome entry, scanning, and elongation, and they can be recognized as pathogen-associated molecular patterns (PAMPs) by the innate immune system.
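Stable hairpins require complementary stem segments. The crude sketch below flags pairs of perfectly complementary k-mers (Watson-Crick pairs only, no G·U wobble) separated by a short loop. It is a screening heuristic under stated assumptions (8-nt stems, loops of 3 to 20 nt), not a structure predictor; minimum-free-energy folding tools such as RNAfold from the ViennaRNA package are the usual way to assess secondary structure.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")  # Watson-Crick pairs only


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def candidate_stems(rna: str, stem: int = 8, min_loop: int = 3,
                    max_loop: int = 20) -> list[tuple[int, int, int]]:
    """(left_start, right_start, loop_length) for each perfect stem pair.

    A hit means seq[left:left+stem] could pair with seq[right:right+stem]
    to close a hairpin loop of loop_length nucleotides.
    """
    seq = rna.upper().replace("T", "U")
    hits = []
    for i in range(len(seq) - stem + 1):
        target = revcomp(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if seq[j:j + stem] == target:  # short slices at the end never match
                hits.append((i, j, loop))
    return hits
```

For a toy hairpin `"GACUGACU" + "UUUU" + "AGUCAGUC"`, the function reports `[(0, 12, 4)]`: an 8-bp stem closing a 4-nt loop.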
Online Codon Optimization Videos and Tools
As an addendum to the preceding section on codon optimization, the Zone found two online resources. The first is an instructional YouTube video by Benchling, a company that offers a variety of cloud-based products and services for life sciences R&D. This 52-minute video, titled Codon Optimizations: Enhance Your Sequence Design and Protein Expression Workflows, covers GC content and uridine depletion (beginning at the 14-minute mark).
The second resource can be perused at this link to a Google Scholar search on “free online codon optimization tools.”
Concluding Comments
As highlighted above, the increased use of and growing interest in IVT mRNA for prophylactic vaccines and therapies has driven the evolution of traditional IVT mRNA synthesis into a much-improved methodology that the Zone refers to as “IVT mRNA 2.0” synthesis. The distinction between the traditional 1.0 and current 2.0 methodologies rests on improved 5’ capping for increased yield and purity, incorporation of chemically modified bases via the corresponding modNTPs, and ORF sequence optimization by uridine depletion and related codon optimization approaches.

Given the successful rapid development of IVT mRNA vaccines against COVID-19 using these 2.0 features, the Zone expects future IVT mRNA-based vaccines and therapeutics to follow at a fast pace. Furthermore, additional R&D will likely yield IVT mRNA 3.0 technology enabled by one or more transformative inventions. These are indeed exciting times.
Your comments are welcomed, as usual.
Please feel free to share this blog with your colleagues or on social media.
PS: Remember to mark your calendar for the May 19, 2021 TriLink webinar on Optimizing Your mRNA Coding Sequence.
