5 January 2018
Dr. Liu Jianjun oversees genome sequencing of 10,000 Singaporeans. He believes, as do many other researchers, that studying the genomes of a population will revolutionize medical science and challenge our view of the world. But decoding is only one part of the job. Crunching and storing the information is the other. Here, the use of Big Data is key.
The undertaking was dreamed up in late 2016 when scientists working at the Genome Institute of Singapore (GIS) met for a retreat, where they pondered “the big picture, big science and the future,” as Dr. Liu Jianjun puts it. The team came up with a seemingly daring idea: What if they sequenced the genomes of not just a few but of many Singaporeans? “Last year, 10,000 people still sounded really big,” laughs Liu.
Seven months later, the team has deciphered 3,000 genomes. They expect to reach their goal of 10,000 sequences in mid-2018. The project is known as SG10K, and entails performing whole-genome sequencing of 10,000 Singaporeans.
Dr. Liu’s team is made up of 13 research associates and post-doctoral students. “The aim is to characterize genetic variations in the Singaporean population, create a whole genome sequencing (WGS) reference panel for accurate genotype imputation and generate a large control dataset for WGS-based genetic association study of diseases,” says Liu.
The project aims to provide genetic information that makes clinical and pharmaceutical research within Singapore’s population easier, and it promises to add to the study of Asian-centric genetic diseases. The sheer speed at which the project’s scientists have worked showcases the exponential progress genomics has made since the Human Genome Project. The first complete human genome sequence, published in 2003, took scientists from six participating countries 13 years to decipher, and the project cost about $3 billion.
Not even 15 years later, that kind of decoding is routinely carried out on an industrial scale. The high-tech equipment in the GIS basement currently reads out about 300 genomes per week, each costing about $1,000. “And in the near future, it will be down to $100 and again much faster,” says Liu. This progress allows the scientist to dream big dreams: “Ten thousand is just the beginning. After that, we hope to sequence the genomes of 250,000 Singaporeans, then every single citizen, 3.5 million people.”
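To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above (300 genomes per week, $1,000 per genome today, $100 projected) and assumes, purely for illustration, that throughput and price stay constant, which Liu himself expects not to be the case. The function names are hypothetical helpers, not part of any GIS tooling.

```python
import math

# Figures quoted in the article
GENOMES_PER_WEEK = 300        # current GIS throughput
COST_PER_GENOME_USD = 1_000   # current cost per genome
FUTURE_COST_USD = 100         # Liu's projected near-future cost


def weeks_needed(n_genomes, per_week=GENOMES_PER_WEEK):
    """Whole weeks required at a constant weekly throughput (an assumption)."""
    return math.ceil(n_genomes / per_week)


def total_cost(n_genomes, cost_each=COST_PER_GENOME_USD):
    """Total sequencing cost in USD at a flat per-genome price (an assumption)."""
    return n_genomes * cost_each


targets = [
    ("SG10K", 10_000),
    ("250,000 Singaporeans", 250_000),
    ("every citizen", 3_500_000),
]

for label, n in targets:
    print(
        f"{label}: {weeks_needed(n):,} weeks, "
        f"${total_cost(n):,} today, "
        f"${total_cost(n, FUTURE_COST_USD):,} at the projected price"
    )
```

At today's rate the 3.5 million target would take more than two centuries, which is why the expected gains in speed and cost matter as much as the ambition itself.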
Dr. Liu wears tailored suits, but in the style of most Singaporeans, skips a tie. He switches easily from intense concentration to easygoing banter. When he talks about his favorite topic – genetics – it can be hard to get a word in. “Thirty years ago I wanted to be a marine biologist, then I tried my hand at quantitative genetics,” says Liu in a voice which, despite all his years abroad, still echoes his native China. But studying fruit flies was not all that satisfying. “That turned out to be a little too… indirect,” Liu jokes, giving the impression that in fact he found it tiresome in the extreme.
Dr. Liu has no doubt that genomics will revolutionize medicine, and Big Pharma companies appear to agree, judging by their large-scale investment in pharmacological genomics.
The aim is to understand which variants play a role in diseases and to build a personalized treatment plan through genetic profiling. Another goal is to establish who will benefit from a drug, or, on the contrary, who may have an adverse reaction.
Big Data projects like SG10K will facilitate genome-wide association studies and provide hypothesis-free, genome-wide control samples, says Liu, who is relentlessly optimistic about the coming years in his field. “Moving forward, the question will be ‘How can I manipulate the genes that make us sick?’” Mankind will have powerful tools to edit and rearrange genes, Liu predicts. “Can we understand all genetic diseases? Yes. Will we figure out a genetic therapy for each disease? Probably. Will we be able to live out a lifespan of 120 years if we eliminate all diseases? I don’t really know, but I think in 20 to 30 years we will be able to edit genomes,” he says.
He envisions that in five to 10 years the genome of every newborn baby will be sequenced at birth. The information will be stored in secure cloud environments and selectively shared with healthcare professionals with the patient’s consent, much as other healthcare records are shared today. Eventually, this will be done on an international level, once questions about data security and privacy have been resolved. “Of course this is very powerful data. We need to deeply discuss the ethical implications and come up with a plan for data control and security,” Liu says.
In the meantime, Dr. Liu and his team are trying to tackle more prosaic problems. Experts at GIS are developing new algorithms and testing new tools for sharing data. The idea is to keep the data centralized. “For obvious reasons medical information is very sensitive. We are nervous to give access,” says Liu. A larger team of scientists is working on a system which would allow the health sector to upload patient data anonymously into a matrix and then play around with it “like in a sandbox.” Storage is one of the key drivers of future progress, so a way to store the massive amounts of data these projects produce has to be found. To this end, Singapore is now exploring various cloud-storage solutions.
The path to genomic medicine, and to its use in routine medical care through personalized therapies, is being fortified by results from population-based sequencing initiatives. These include not only SG10K but also the Chinese Million Genomes endeavor, which aims to sequence the genomes of one million people; the U.S.-based Precision Medicine Initiative, which is targeting the same number of patients; whole genome population studies in the Netherlands, Qatar, Turkey and Japan; and projects such as the International Cancer Genome Consortium, which coordinates large-scale cancer genome studies.