The 100,000 Genomes Project: What is it, What is its Status, and What Comes Next?

  • This Transformative Project was Announced by the UK Government in December 2012 
  • 100,000th Whole Genome was Sequenced Only Six Years Later
  • Rare Diseases and Cancers Continue to be the Focus
  • The European Union Seeks 1,000,000 Whole Genome Sequences by 2022

This past April 25th, scientists around the world recognized DNA Day, which commemorates the day in 1953 when Watson & Crick proposed the double-stranded helical structure of DNA in their famous publication in Nature titled Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid. During the preparation of my blog for DNA Day 2019, I read The Double Helix: A Personal Account of the Discovery of the Structure of DNA by Watson, which I highly recommend, especially to anyone who works with DNA. I also read DNA: The Story of the Genetic Revolution, also by Watson, who provides an excellent educational narrative of how the elucidation of DNA's structure has impacted basic and applied health-related sciences.

This blog on the 100,000 Genomes Project was inspired by the latter book’s chapters on how the evolution of faster, better, and cheaper genomic sequencing has led to “The Age of Personalized Medicine,” which has been defined as follows:

Personalized medicine is the tailoring of medical treatment to the individual characteristics of each patient. The approach relies on scientific breakthroughs in our understanding of how a person’s unique molecular and genetic profile makes them susceptible to certain diseases. This same research is increasing our ability to predict which medical treatments will be safe and effective for each patient, and which of these will not be.


Continuing improvements in “faster, better, cheaper” DNA sequencing methodologies are essential to personalized medicine. In striking contrast to the estimated cost of between $500 million and $1 billion over 13 years to sequence the first human genome, which was completed in 2003, massively parallelized (aka high-throughput) genome sequencing can now be achieved in factory-like facilities for an estimated cost of around $1,000 per genome.

In regard to low cost, Dante Labs, an Italian startup company, announced in April 2019 its launch of the first commercial long-read whole genome sequencing (WGS) service, with 30X coverage, for $999 using Oxford Nanopore technology. In regard to speed, a new ultra-fast, ultra-high-throughput sequencer from MGI Tech Co., Ltd. in China can complete WGS for up to 60 human genomes in a single day, according to a recent news article. For more information about advances in DNA sequencing, interested readers may refer to a review article published in 2017 by Shendure et al. titled DNA sequencing at 40: past, present and future.
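To give a feel for what "30X coverage" means in raw data terms, here is a rough back-of-the-envelope sketch in Python. The genome length of about 3.1 billion bases is an approximation, the function names are made up for illustration, and applying 30X to the 60-genomes-per-day figure is my assumption, since the news item does not state the coverage used.

```python
# Illustrative arithmetic only; the genome size below is an approximation.
HUMAN_GENOME_BASES = 3.1e9  # approximate haploid human genome length, in bases


def bases_for_coverage(coverage, genome_size=HUMAN_GENOME_BASES):
    """Total sequenced bases needed for a given mean coverage depth."""
    return coverage * genome_size


def daily_bases(genomes_per_day, coverage, genome_size=HUMAN_GENOME_BASES):
    """Raw bases per day for a batch of genomes sequenced at the same coverage."""
    return genomes_per_day * bases_for_coverage(coverage, genome_size)


per_genome_gb = bases_for_coverage(30) / 1e9
# Assumes 30X for each of the 60 genomes; the source does not state the coverage.
per_day_tb = daily_bases(60, 30) / 1e12

print(f"One 30X genome: about {per_genome_gb:.0f} gigabases of sequence")
print(f"60 genomes per day at 30X: about {per_day_tb:.1f} terabases of sequence")
```

In other words, 30X coverage means that each position in the genome is read about 30 times on average, so the instrument has to produce roughly 30 genomes' worth of bases for every person sequenced.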

What Is the 100,000 Genomes Project?

In 2012, the UK government announced the establishment of the 100,000 Genomes Project, which would revolutionize patient diagnosis and treatment by offering the prospect of personalized treatment to many patients. The project is run by Genomics England, a company wholly owned by the Department of Health, and is delivered through 13 Genomic Medicine Centers.

The key goals of the project are to promote genetic research in order to benefit patients, and to support the development of the UK genomics industry. WGS is carried out by a single provider, Illumina, at a purpose-built facility in Cambridgeshire. The program currently has two medical targets: rare diseases and cancer. It is designed as “a transformative project and to make interpretation of our DNA sequence sit alongside conventional diagnostic procedures and tests to inform the appropriate clinical pathways for treatment of disease,” according to an article in The Biomedical Scientist by Gerry Thomas, Professor of Molecular Pathology.

Thomas added that, for rare diseases, only a blood sample is required—usually from the patient and his or her parents. Most of these families are already aware that an inherited genetic component to their disease is a likely cause. For cancer patients, a paired sample of blood and tissue is required, the latter of which posed technical issues that are discussed in the next section.

What is the Status of the 100,000 Genomes Project?

The feasibility of WGS for cancer patients in the clinic has focused on the use of high-quality nucleic acids extracted from fresh-frozen tissue (FF) specimens collected within a research infrastructure. However, FF specimens are not routinely collected, as formalin-fixed, paraffin-embedded (FFPE) material is the specimen of choice for histopathological diagnosis. DNA extracted from FFPE specimens is somewhat degraded due to fragmentation, DNA crosslinks, abasic sites, and deamination leading to C>T mutation artifacts, which impede downstream sequencing analysis.
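As a simple illustration of the C>T artifact problem, the Python sketch below flags single-nucleotide calls that look like deamination damage: a C>T change (or G>A when read from the opposite strand) seen in only a small fraction of reads. This is not the method used by Genomics England or by Robbe et al.; the function name and the 10% allele-fraction cutoff are assumptions chosen purely for illustration.

```python
def is_possible_deamination_artifact(ref, alt, allele_fraction, max_fraction=0.10):
    """Flag C>T (or G>A on the opposite strand) calls seen at low allele fraction.

    max_fraction is an illustrative cutoff, not a validated threshold.
    """
    change = (ref.upper(), alt.upper())
    deamination_like = change in {("C", "T"), ("G", "A")}
    return deamination_like and allele_fraction < max_fraction


# Toy calls: (reference base, alternate base, fraction of reads carrying the alternate)
toy_calls = [("C", "T", 0.03), ("G", "A", 0.05), ("C", "T", 0.45), ("A", "G", 0.03)]
flags = [is_possible_deamination_artifact(*call) for call in toy_calls]
print(flags)
```

A flag like this only marks a call for closer inspection; a genuine C>T mutation present in a small subclone of the tumor would look the same, which is part of why FFPE data are harder to interpret.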

Robbe et al. addressed these technical issues in a pilot feasibility study that evaluated WGS data from 156 genomes, derived from 52 matched sets of FF tumor, FFPE tumor, and peripheral blood samples (three genomes per set) routinely collected as part of the diagnostic process. The differences observed between FF and FFPE sequence data allowed for the development of new methods to optimize the quality of FFPE-derived WGS data, allowing the acquisition of genome-wide data for all patients with cancer, including those for whom only FFPE material is available.

I find it quite remarkable that on December 5, 2018, only six years after the Project was announced, Genomics England reported that the Project had reached its goal of sequencing 100,000 whole genomes from patients. More importantly, this effort has already realized the following benefits:

“The 100,000 Genomes Project has delivered life-changing results for patients, with one in four participants with rare diseases receiving a diagnosis for the first time, and providing potential actionable findings in up to half of cancer patients where there is an opportunity to take part in a clinical trial or to receive a targeted therapy.” 

Among the many video-stories shared by patients and their families available on the Genomics England website, I selected an example of a rare disease and a cancer as representative of these “life-changing results” delivered from the 100,000 Genomes Project.

Jessica’s Story: Jessica, aged 4, was afflicted with a rare condition of unknown cause. She and her parents gave a small sample of blood and their genomes were sequenced. Every genome is compared to the reference human genome sequence, which is used as a guide, and Jessica’s genome had 6.4 million single-nucleotide differences (aka variants). Because Jessica’s condition was rare, and not shared with either parent, the next bioinformatic step looked for rare variants not present in the genome of either parent, but predicted to cause a change in an encoded protein. Out of 67 such variants, only one was located in a gene listed in PanelApp as being linked to symptoms similar to the ones Jessica was experiencing. To me, this is the equivalent of systematically finding a “genetic needle in a genomic haystack”!
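The filtering logic in Jessica's story can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the actual Genomics England pipeline: the Variant fields, the 0.1% population-frequency cutoff for "rare," and the placeholder genes GENE_A, GENE_B, and GENE_C are all hypothetical, and the gene panel here is just a stand-in for a real PanelApp panel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    in_child: bool
    in_mother: bool
    in_father: bool
    population_frequency: float  # fraction of the general population carrying it
    alters_protein: bool         # predicted to change the encoded protein


def candidate_variants(variants, panel_genes, max_frequency=0.001):
    """Apply the filters described in the text, one after another."""
    candidates = []
    for v in variants:
        if not v.in_child:
            continue  # only the child's variants are of interest
        if v.in_mother or v.in_father:
            continue  # inherited, so unlikely to explain a condition neither parent has
        if v.population_frequency > max_frequency:
            continue  # too common to explain a rare condition
        if not v.alters_protein:
            continue  # no predicted effect on the encoded protein
        if v.gene not in panel_genes:
            continue  # gene not linked to similar symptoms in the panel
        candidates.append(v)
    return candidates


# Toy data; only the SLC2A1 entry passes every filter.
toy_variants = [
    Variant("GENE_A", True, True, False, 0.0001, True),   # inherited from mother
    Variant("GENE_B", True, False, False, 0.0001, True),  # not in the panel
    Variant("SLC2A1", True, False, False, 0.0, True),     # passes every filter
    Variant("GENE_C", True, False, False, 0.2, True),     # too common
    Variant("GENE_C", True, False, False, 0.0, False),    # no protein change
]
panel = {"SLC2A1", "GENE_C"}

print([v.gene for v in candidate_variants(toy_variants, panel)])
```

Each filter on its own removes only part of the haystack; it is the combination of trio comparison, rarity, predicted protein impact, and a curated gene panel that narrows millions of variants down to a single candidate.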

The name of the gene in question is SLC2A1, and in Jessica’s genome, a single frameshift mutation prevents expression of the encoded protein from that copy of the gene, meaning that she doesn’t have enough of the protein in her body. The SLC2A1 gene makes the glucose transporter protein type 1 (GLUT1), which is involved in moving glucose, the brain's main energy source, across the blood-brain barrier. Two normal copies of this gene are needed for this protein to transport enough glucose to fuel the brain. Mistakes in the SLC2A1 gene can cause ‘GLUT1 deficiency syndrome’, which is Jessica’s diagnosis. Research has shown that in some patients who have GLUT1 deficiency syndrome, a special low-carbohydrate (ketogenic) diet can help reduce the number of seizures they experience by providing an alternative energy source for the brain.

The Lloyd Sisters’ Story: Three sisters, Mary, Sandra, and Kerry Lloyd, all developed breast cancer within 15 months of each other. Their late mother had also been affected, as well as two other female relatives over three generations. They had already undergone genetic testing for changes in the BRCA1 and BRCA2 genes, which can be a cause of breast cancer, but these tests were negative. The sisters each gave a small amount of blood to have their genomes sequenced so that bioinformaticians would be able to analyze the sequences and determine the nature of their breast cancers.

BRCA1 tumor suppressor protein RING domain, in complex with BARD1 protein, as discussed elsewhere.

Breast cancer affects approximately 1 in 8 women in the UK, and about 5-10% of breast cancers are thought to be hereditary, caused by abnormal genes passed from parent to child. This phenomenon is known as ‘familial breast cancer’. Families that have familial breast cancer may include men with breast cancer, and are sometimes afflicted by other cancers as well, such as ovarian cancer or prostate cancer. In these families, cancers may develop at a younger age than usual. Most inherited cases of breast cancer are associated with specific changes in BRCA1 and BRCA2, genes that are present in females and males. 

The function of the BRCA genes is to repair cell damage and keep breast cells growing normally. When these genes contain abnormalities or mutations, they don’t function normally, and breast cancer risk increases. However, not all familial breast cancer is due to mutations in BRCA. Changes in other genes have also been found to play a role in causing familial breast cancer. By taking part in the 100,000 Genomes Project, the Lloyd sisters are contributing to research. Their de-identified data, together with data from other participants, is available to researchers through a secure database. There is great interest in understanding the genomic factors connected to breast cancer risk, which could include changes in the way known risk genes (like BRCA1 and BRCA2) are regulated, or switched on and off.

What Is Next for the 100,000 Genomes Project?

On March 4, 2019, Genomics England’s Chief Scientist and interim Chief Executive Professor Mark Caulfield reported on what is next for 100,000 Genomes Project participants:

  • The first priority is to get reports back to those who have not yet received a result.
  • The second priority is to revisit the genomes for people who have not yet gotten an answer, to see if new knowledge or new ways to analyze a genome will be able to find answers.
  • Panels of genes based on worldwide literature will be expanded to include recently added information, and other parts of the genome. The virtue of having a person’s whole genome is the ability to reanalyze it with new knowledge to get new answers for the participants.
  • The current goal is to analyze the genome of all participants for the first time, and return the results by July 2019. Scientists and doctors will then study the reports and consider whether there is adequate reliability to give the information back to the participants, as the clinicians on the receiving end are those caring for patients.

Other forward-looking information that I found on the Genomics England website and thought worth mentioning relates to various partnerships, exemplified by the following:

  • Genomics England signed a strategic research and development agreement with the Qatar Genome Program. This includes standardization of genomic strategies for healthcare implementation; the evaluation of new technologies for WGS; the cross-analysis of both national datasets; and the exchange of expertise related to educational programs.
  • American pharmaceutical companies Alexion and BioMarin, both members of Genomics England’s Discovery Forum, have identified previously undiagnosed patients with rare life-threatening kidney and neurological diseases, respectively. Nephronophthisis (NPHP) is responsible for 15% of cases of childhood end-stage renal failure. Neuronal ceroid lipofuscinoses 2 (CLN2) (aka Batten Disease) symptoms typically emerge in children aged 2-4, who have a life expectancy of around 10 years. 
  • Genomics England announced the successful completion of the first phase of its collaborations with Inivata in the UK and Thermo Fisher Scientific in the US to investigate the use of liquid biopsies in cancer, which I have previously blogged about. This is part of a pilot project to assess the suitability of circulating tumor DNA (ctDNA) samples. The results of the study showed that 200 plasma samples from the 100,000 Genomes Project across all cancer types were of a high quality and produced reliable results.

1,000,000+ Genomes Initiative by the European Union 

If you think that the 100,000 Genomes Project by the UK is impressive, then the European Union’s (EU) plan to obtain 1,000,000+ human genome sequences by 2022 will strike you as amazing. Here’s a brief synopsis of this initiative, according to statements on the European Commission policy website. 

Since its launch on Digital Day 2018, the “1+ Million Genomes” initiative has grown into a cooperative effort involving all 20 signatory Member States and Norway. These countries meet on a regular basis in order to ensure that the aim of the declaration—having at least 1 million sequenced genomes available in the EU by 2022—is achieved. This includes linking access to existing and future genomic databases across the EU, as well as providing a sufficient scale for new clinically impactful associations in research. The expected benefits to EU citizens are summarized as follows:

“Genomics has the potential to revolutionize healthcare in many ways. It could lead to the development of more targeted personalized medicines, therapies and interventions. It could also enable better diagnostics, boost prevention and make more efficient use of scarce resources. From cancer, to rare diseases, brain related diseases or prevention—Genomics can greatly improve various health conditions of EU citizens. Equally important, Genomics has also the potential to improve the effectiveness, accessibility, sustainability and resilience of health systems in the EU.”

Closing Comments

It is important to note that all participants in the 100,000 Genomes Project are patients of the UK National Health Service (NHS), which is the publicly funded national healthcare system for England, and one of the four National Health Services of the UK. It is the largest single-payer healthcare system in the world. Primarily funded through the government and overseen by the Department of Health and Social Care, NHS England provides healthcare to all legal English residents, with most services free at the point of use. 

As usual, your comments are welcomed.


After this blog was written, there was a report that California state legislators have introduced a bill to increase access to WGS for pediatric illnesses. The bill is referred to as the "Ending the Diagnostic Odyssey Act." The aim of the bill is to provide access to WGS for "certain undiagnosed children under the Medicaid program, and for other purposes."

The bill is supported by Rady Children's Hospital in San Diego and is also sponsored by Rep. Juan Vargas (D-CA). The belief is that rapid WGS should be available at hospitals nationwide for critically ill infants and children. Rady Children’s Hospital-San Diego has pioneered use of rapid sequencing in the care of pediatric patients.
