
Sequencing Singapore: Project SG10K and a vision for nationwide genomic analysis

Jianjun Liu, PhD, plans to sequence the DNA of every Singaporean. This project is vital because it aims to create personalized medical treatments for the diverse Asian population, which has been underrepresented in genetic research. But decoding the information is only one part of the job. Crunching and storing the information is the other. Here, the use of Big Data is key.

The genesis of a groundbreaking genomic project can be traced back to late 2016, during a retreat at the Genome Institute of Singapore (GIS), where scientists contemplated “the big picture, big science, and the future.”

Jianjun Liu, PhD, part of this visionary team, recollects their daring idea: What if they sequenced the genomes of not just a few but many Singaporeans? “Last year, 10,000 people still sounded really big,” laughs Liu. This ambitious thought marked the beginning of a remarkable journey in genomics.

Fast forward seven months, and Liu’s team, comprising 13 research associates and post-doctoral students, had already decoded 3,000 genomes, with the aim of completing 10,000 by mid-2018. This initiative, dubbed SG10K, focused on whole-genome sequencing of Singaporeans.  

“The aim is to characterize genetic variations in the Singaporean population, create a whole genome sequencing (WGS) reference panel for accurate genotype imputation and generate a large control dataset for WGS-based genetic association study of diseases,” says Liu.

This rapid advancement illustrates the tremendous strides made in genomics since the completion of the Human Genome Project in 2003, which took 13 years and $3 billion to decode the first human genome. By contrast, just under 15 years later, such decoding has become routine, thanks to advances in technology.

The GIS basement, equipped with cutting-edge machinery, processes approximately 300 genomes weekly, each costing around $1,000 – a figure expected to drop to $100 soon. Dr. Liu envisages an even grander scale for the future: sequencing the genome of every Singaporean, 3.5 million individuals in total.
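
To put those figures in perspective, a back-of-envelope calculation helps. The Python sketch below is illustrative only: it takes the article's round numbers at face value, assumes weekly throughput stays constant, and counts sequencing cost alone (no storage, analysis or staff). The function names are made up for this example and do not come from GIS.

```python
# Back-of-envelope figures; every input is a round number quoted in the article.
GENOMES_PER_WEEK = 300          # stated GIS throughput
COST_NOW_USD = 1_000            # stated current cost per genome
COST_FUTURE_USD = 100           # stated expected future cost per genome
SG10K_GENOMES = 10_000          # project target
NATIONWIDE_GENOMES = 3_500_000  # "every Singaporean" as stated


def weeks_needed(genomes: int, per_week: int = GENOMES_PER_WEEK) -> float:
    """Weeks of sequencing, assuming a constant weekly throughput."""
    return genomes / per_week


def total_cost_usd(genomes: int, cost_per_genome: int) -> int:
    """Sequencing cost only; storage, analysis and staff are excluded."""
    return genomes * cost_per_genome


if __name__ == "__main__":
    print(f"SG10K: about {weeks_needed(SG10K_GENOMES):.1f} weeks, "
          f"${total_cost_usd(SG10K_GENOMES, COST_NOW_USD):,} at $1,000 each")
    print(f"Nationwide: about {weeks_needed(NATIONWIDE_GENOMES) / 52:.0f} years "
          f"at 300 per week, "
          f"${total_cost_usd(NATIONWIDE_GENOMES, COST_FUTURE_USD):,} at $100 each")
```

Under these assumptions, the 10,000-genome target is a matter of months, while a nationwide effort at the same weekly rate would stretch over centuries, which suggests the larger vision depends on much higher throughput as well as lower cost.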


From marine biology to genetics 

Liu wears tailored suits, but in the style of most Singaporeans, skips a tie. He switches easily from intense concentration to easygoing banter. When he talks about his favorite topic – genetics – it can be hard to get a word in. “Thirty years ago I wanted to be a marine biologist, then I tried my hand at quantitative genetics,” says Liu in a voice which, despite all his years abroad, still echoes his native China. But studying fruit flies was not all that satisfying. “That turned out to be a little too… indirect,” Liu jokes. 

Liu’s dream stands a good chance of becoming reality. Over the past decades, Singapore, which is about the same size as Lake Geneva, Switzerland, has made science and technology a major pillar of its national growth strategy. The GIS, of which Dr. Liu is deputy executive director, is the national flagship program for genomic sciences in the city-state, and is equipped with state-of-the-art research infrastructure.  

Over 300 scientists, trainees and staff work in its headquarters in Singapore’s Biopolis, a hypermodern science park designed by Iraqi-born architect Zaha Hadid. The park consists of 13 research centers, shops, restaurants, a pub, a childcare center and even a typical Singaporean food court in which food stalls dish out cheap but delicious street food.

A multi-institutional effort to enable big data analytics and integrative genomics in Singapore, named the Centre for Big Data and Integrative Genomics (c-BIG), is at the core of GIS. It enables high-throughput sequencing, molecular cytogenetics, bioinformatics, single-cell genomics, high-throughput and high-content screening, and genome engineering.

SG10K is one of its most prominent projects. Its heart beats in the basement of a sleek beige building aptly called Genome. 

It seems like a cheerful place to work: laughter can be heard in the background as staff call out to “JJ,” as their boss is affectionately known. “Can I join you?” jokes a researcher as he pretends to photobomb Liu.


Working with a diverse genetic pool 

Asia is a booming market for pharmaceuticals and personalized consumer products. But to tailor products and medicines to the Asian market, the actual genetic makeup of the region’s population needs to be better understood. “Globally a lot of genetic information has been collected, but mostly from Caucasians. Information on Asian populations has been lacking,” says Liu.  

Genetic variation plays an important role in a variety of human diseases and quantitative traits. Many genetic findings have shown population-specific characteristics, highlighting the importance of population diversity in human genetic studies. 

Liu has no doubt that genomics will revolutionize medicine, and Big Pharma companies appear to agree, judging by their large-scale investment in pharmacogenomics.
 
The aim is to understand which variants play a role in diseases and to build a personalized treatment plan through genetic profiling. Another goal is to establish who will benefit from a drug, or, on the contrary, who may have an adverse reaction. 

“Singapore has an extremely diverse genetic pool,” says Liu. The city-state has long sat at the center of one of the world’s busiest trading routes, and immigrants from all over Asia have made it their home. Today, the population consists of three major ethnic groups, Chinese, Malay and Indian, which together represent over 80 percent of the genetic diversity of Asian populations.

The tropical island nation’s colorful history, combined with a business-oriented, forward-thinking government, a first-class health-care system and the capacity to quickly implement policy, makes it an ideal place to study Asia’s genetic diversity. 

“Ten thousand is just the beginning. After that, we hope to sequence the genomes of 250,000 Singaporeans, then every single citizen, 3.5 million people,” says Liu.

Investments in culture and supercomputing 

Singapore has poured hundreds of millions into supercomputing. Listening to Liu, one gets the feeling that it was a smart move. SG10K will generate about 2 to 3 petabytes, or 2,000 to 3,000 terabytes, of data.

That extremely large data set will then be analyzed to reveal patterns, trends and associations, with a special focus on detection of genetic variants – a herculean task, considering that a typical genome of a healthy individual harbors some three to five million variants, and previous studies identified about 88 million sites in the human genome that vary among people. 
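
A rough sense of scale follows from simple division. The sketch below is an illustration, not a GIS estimate: it uses decimal units (1 petabyte = 1,000 terabytes, matching the figures quoted above), spreads the project total evenly across 10,000 genomes, and then assumes, purely hypothetically, that the same per-genome footprint would hold for 3.5 million people. The helper names are invented for this example.

```python
# Decimal storage units, matching the article's "2-3 petabytes = 2,000-3,000 terabytes".
TB_PER_PB = 1_000
GB_PER_TB = 1_000


def gb_per_genome(total_pb: float, genomes: int) -> float:
    """Average gigabytes per genome if the total is spread evenly."""
    return total_pb * TB_PER_PB * GB_PER_TB / genomes


def total_pb_for(genomes: int, gb_each: float) -> float:
    """Petabytes needed for a cohort at a fixed per-genome footprint."""
    return genomes * gb_each / (TB_PER_PB * GB_PER_TB)


if __name__ == "__main__":
    low = gb_per_genome(2, 10_000)   # lower bound of the quoted project total
    high = gb_per_genome(3, 10_000)  # upper bound of the quoted project total
    print(f"Per genome: {low:.0f} to {high:.0f} GB")
    # Hypothetical extrapolation to the nationwide vision of 3.5 million genomes.
    print(f"3.5 million genomes: {total_pb_for(3_500_000, low):,.0f} to "
          f"{total_pb_for(3_500_000, high):,.0f} PB")
```

On these assumptions each genome accounts for a few hundred gigabytes, and a nationwide cohort would run to hundreds of petabytes or more, which is why storage and processing loom as large as the sequencing itself.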
 
Currently, advanced bioinformatics solutions including QIAGEN’s CLC Genomics Workbench, Ingenuity Pathway Analysis and the Human Gene Mutation Database help Dr. Liu’s team and other groups at the GIS to cope with this challenge.

“Big Data is the driver of progress,” says Liu. Data generation will eventually become of secondary interest, as data processing, analysis and interpretation increasingly prove to be a major bottleneck for genomic research.  

“The future lies in computational genomics. The field is moving forward fast, and is multidimensional. Machine learning and Artificial Intelligence will be the next big thing,” Liu says. 

Big Data projects like SG10K will facilitate genome-wide association studies and provide hypothesis-free, genome-wide control samples, says Liu, who is relentlessly optimistic about the coming years in his field.

“Moving forward, the question will be ‘How can I manipulate the genes that make us sick?’” Mankind will have powerful tools to edit and rearrange genes, Liu predicts.


Ideas about the future

“Can we understand all genetic diseases? Yes. Will we figure out a genetic therapy for each disease? Probably. Will we be able to live out a lifespan of 120 years if we eliminate all diseases? I don’t really know, but I think in 20 to 30 years we will be able to edit genomes,” he says. 
 
He envisions that in five to 10 years the genome of every newborn baby will be sequenced at birth. The information will be stored in secure cloud environments and selectively shared with healthcare professionals with the patient’s consent, much as other healthcare records are shared today.

Eventually, this will be done on an international level, once questions about data security and privacy have been resolved. “Of course this is very powerful data. We need to deeply discuss the ethical implications and come up with a plan for data control and security,” Liu says. 
 
In the meantime, Liu and his team are trying to tackle more prosaic problems. Experts at GIS are developing new algorithms and testing new tools for sharing data. The idea is to keep the data centralized. “For obvious reasons medical information is very sensitive. We are nervous to give access,” says Liu.  

A larger team of scientists is working on a system that would allow the health sector to upload its patient data anonymously into a matrix and then play around with it “like in a sandbox.” Because Big Data is one of the key drivers of future progress, a way to store the massive amounts of data it produces has to be found. To this end, Singapore is now exploring various cloud-storage solutions.

The path to genomic medicine, and to its implementation in routine medical care via personalized therapies, is being fortified by results from population-based sequencing initiatives.

These include not only SG10K, but also a number of other initiatives such as the Chinese Million Genomes endeavor, which aims to sequence the genomes of one million people; the US-based Precision Medicine Initiative, which is targeting the same number of participants; whole genome population studies in the Netherlands, Qatar, Turkey and Japan; and projects such as the International Cancer Genome Consortium, which coordinates large-scale cancer genome studies.

Professor Jianjun Liu is currently the Deputy Executive Director at the Genome Institute of Singapore (GIS) and Professor at the Yong Loo Lin School of Medicine, National University of Singapore. Liu completed his undergraduate studies at the University of Science and Technology of China, earned his Master's degree from the Institute of Oceanology at the Chinese Academy of Sciences, and was awarded a PhD in quantitative genetics at Duke University. After finishing his postdoctoral training on the genetics of psychiatric disorders at Columbia University, Liu joined the GIS faculty in 2002.

January 2018 (Updated 2024)