Young Investigator Felix Bittner on developing an MPSplex assay for missing persons identification
November 11, 2019
Our Young Investigator series allows us to acknowledge rising scientists in the forensic field. We hope that they inspire you with their story, as they are our future!
Continuing our interview series, we have Felix Bittner, DNA Analyst at the International Commission on Missing Persons (ICMP), The Hague, Netherlands. His work focuses on developing a large massively parallel sequencing SNP panel (MPSplex assay) for missing persons identification.
Tell us about your background and how you became interested in forensic science.
I suppose I was the typical 'scientist' kid, always interested in the natural world, especially how things are put together and how they work. Later, I quite stereotypically got a microscope as a present and so on. One day in high school, my mother gave me a magazine with an article about the 'body farm' (talk about stereotypes), and I got hooked on the idea of using science to assist the justice system. Really, from that point on, I became fixated on forensic science; to my teachers' dismay, not much outside of that goal interested me. Long story short, I eventually found a program for Biotechnology with a Forensic Science specialization in the Netherlands and immediately applied there. I did not even realize I would have to learn Dutch. Anyway, this led me to several internships in Germany and the UK and a keen interest in forensic DNA analysis. Coming full circle, the supervisor of my first ever internship recommended I apply for the Master's degree in Forensic Science at the University of Amsterdam, which for me culminated in an internship at the International Commission on Missing Persons. It's been some time since my graduation now, but somehow, I am still here at ICMP, thanks to the people I met along the way who put a lot of trust in me.
Can you provide a summary of the project you are working on?
The project I work on is a collaboration between QIAGEN and the ICMP. It's the development of a massively parallel sequencing panel with a large number of SNPs and micro-haplotypes designed explicitly for the identification of missing persons. We try to overcome two main challenges with this panel: working with highly degraded bone samples and identification through reference samples from distant relatives, e.g., a single first cousin. The former is achieved by designing very short amplicons, which the use of SNPs makes possible, while the latter is achieved by using tri-allelic SNPs, micro-haplotypes, and a large number of loci (~1300). Specifically, I focus on a lot of the bioinformatic aspects of the development. Since I do not come from a specifically bioinformatics background, that meant teaching myself R to begin with, and now dipping my toes into Python. It is quite amazing how far Google and a healthy thirst for problem-solving can get you with programming. That being said, everyone on the project does a fair bit of everything, so I also spend a lot of time in the lab. I think the main thing I bring to the project, though, is a good basis in kinship statistics and a curiosity that fills in where that base has holes. On the one hand, that means I have calculated the statistics for the cases we have applied this to so far. On the other hand, it means coming up with smart, or not so smart, ways of incorporating SNPs into our existing matching algorithm while accounting for the linkage between them.
Please describe your typical day in the lab.
My typical day in the lab is mostly spent outside of the lab; I am sure some people can relate to this. Most of my time is spent sifting through the tons of data coming off an NGS instrument, mostly using R, but I am also teaching myself a bit of Python. A lot of the time, I attempt to answer burning questions my colleagues have about the most recent set of experiments. "Felix, how many loci can we call at 31 pg input?", "How many false genotype calls did we see in this set of 30 samples?" - things like that. Besides that, I spend a lot of time reading through papers describing algorithms for kinship statistics and trying to come up with an approach we may implement in our matching software. Like many others in forensic science, I also have to dedicate some time to standard STR casework. It's an exciting juggle sometimes.
What do you find most interesting about your project? Have you seen any surprising results?
I think the most exciting part about the project is the advantage that the unique molecular indices (UMIs) of the QIAseq chemistry bring to the table. Before any PCR step, unique sequences are ligated to the original molecules, so that later you know how many original molecules you observed and how many PCR copies of each of them you sequenced. Of course, they help to minimize PCR/sequencing error via a consensus between PCR copies. But in a forensic context, they are especially fascinating when it comes to the interpretation of data. You can approach the calling thresholds for homozygote loci, for example, from a completely different point of view. If you are observing unique molecules instead of relative signals like RFUs or read numbers, which contain PCR duplicates, you can set thresholds independently of PCR/sequencing parameters, a challenge other NGS systems are currently struggling with. We are also looking at the possibility of pooling FASTQ data from different replicates bioinformatically with the help of the UMIs to significantly increase the amount of data we get from replicates. As far as surprises go, two things. First, the robustness of the protocol: no matter what we throw at it, it delivers. And second, how many problems can be solved by very straightforward bioinformatics. I think our field has neglected education in this area for far too long because CE-STRs did not require it that much. Now all these 'big' problems can be addressed reasonably quickly with a fundamental understanding of computers. I suppose the good news is that there is a lot of growth to be expected from that!
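To make the UMI idea concrete, here is a minimal Python sketch of collapsing reads into unique molecules. It is only an illustration under stated assumptions, not the QIAseq or CLC Genomics Workbench implementation: the function name `collapse_umis`, the `(locus, umi, base)` read format, the marker name, and the simple majority-vote consensus are all hypothetical.

```python
from collections import Counter, defaultdict


def collapse_umis(reads):
    """Count unique molecules per allele at each locus.

    `reads` is an iterable of (locus, umi, base) tuples. This input
    format is an assumption made for illustration only.
    """
    # Group reads that share a locus and a UMI: these are treated as
    # PCR copies of the same original molecule.
    groups = defaultdict(Counter)
    for locus, umi, base in reads:
        groups[(locus, umi)][base] += 1

    # Take a majority-vote consensus base per molecule, then count
    # molecules (not reads) per allele. Ties fall to the base seen first.
    molecules = defaultdict(Counter)
    for (locus, _umi), bases in groups.items():
        consensus, _count = bases.most_common(1)[0]
        molecules[locus][consensus] += 1

    return {locus: dict(counts) for locus, counts in molecules.items()}


# Hypothetical example: six reads that derive from three molecules.
reads = [
    ("rs0001", "AAGT", "C"),
    ("rs0001", "AAGT", "C"),
    ("rs0001", "AAGT", "T"),  # likely PCR/sequencing error, outvoted
    ("rs0001", "CCTA", "C"),
    ("rs0001", "GGAC", "T"),
    ("rs0001", "GGAC", "T"),
]

print(collapse_umis(reads))  # {'rs0001': {'C': 2, 'T': 1}}
```

In this toy example, read counts would suggest an even 3:3 split between C and T, whereas counting molecules gives 2:1. A calling threshold defined on molecule counts does not change with the number of PCR copies sequenced per molecule.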
What are the benefits of your project?
We have already been able to give answers to several families of the missing who have been waiting for years and years. None of their cases would have been resolved with traditional methods. It's hard to put into words, but I am sure everyone can relate to how impactful this is for them and also for society in general. That aside, I think our project is also helping to push sequencing more into the mainstream of forensics. There is already a fair amount of knowledge exchange with, and interest from, several other laboratories, and I hope that the more laboratories get comfortable with NGS, the quicker the rest of the field will adopt it. I think one of the specifics this project brings to this exchange of knowledge is the use of UMIs. We believe they could be a game-changer in the long run.
What are the significant challenges faced while working on your project, and how do you overcome them?
I think it is fair to say that the biggest challenge is the sheer amount of data generated by NGS. Forget the processing of the data; even coming up with the exact question that we want answered can be a real challenge if you can look at everything in a million different ways. One of the ways to cope, I think, is to pick your battles. If you have ~1300 markers and 50 do not perform quite optimally, do you try to troubleshoot those markers for hours on end, or do you throw them out? In the end, you have to be practical, and it is easy to forget that sometimes. Another approach that I like is to keep the end goal in mind. Ultimately, the assay has to generate data that we can use in kinship statistics. For example, we are currently thinking about building the whole system on a probabilistic matching algorithm. This would spare us from having to figure out exactly where we should draw the line to call a homozygote. Of course, that would also open up a whole host of other challenges.
Which QIAGEN products do you use, and what do you like about the products?
This project is, of course, based on the QIAseq chemistry. We also use the GeneReader for sequencing, and finally, we use the CLC Genomics Workbench to process most of the sequencing data. I think I have already made clear how much I personally, and we as a team, like the UMIs. But the other point that I touched on is how robust the QIAseq workflow seems. I think everyone will admit that during research, there are always points where you think an experiment cannot possibly have worked out. But so far, we keep being positively surprised.
Outside of forensic science, what are your hobbies?
That's not an easy question. I suppose I have no specific hobby at the moment. But I am very interested in classical music, not least because my (soon to be) wife is a classical pianist. Unfortunately, or maybe thankfully, I have no talent whatsoever, so I am only allowed to listen. Aside from that, I am very interested in tailoring and classic menswear, and when I get the chance, I enjoy a good book.
Find out more about NGS in human identification and forensics here.
Want to be featured in the next Investigator Blog?
Fill out and submit your application to Young.Investigator@qiagen.com today!