For most commercially available next-generation sequencing platforms, each DNA fragment in the library must be clonally amplified, by methods such as bridge amplification or emulsion PCR, to generate sufficient copies of sequencing template. Fragment libraries are obtained by ligating platform-specific adapters to fragments generated from a DNA source of interest, such as genomic DNA, double-stranded cDNA, or PCR amplicons. The adapter sequences enable selective clonal amplification of the library molecules, so no bacterial cloning step is required to amplify the genomic fragments in a bacterial intermediate, as is done in traditional sequencing approaches. The adapter sequence also contains a docking site for the platform-specific sequencing primers.
Typically, a conventional DNA library construction protocol consists of 4 steps:
- Fragmentation of DNA
- End repair of fragmented DNA
- Ligation of adapter sequences (not for single-molecule sequencing applications)
- Optional library amplification
Currently, 4 different methods are commonly used to generate fragmented genomic DNA: enzymatic digestion, sonication, nebulization, and hydrodynamic shearing. All have been used in library construction, but each has specific advantages and limitations. Endonucleolytic digestion is easy and fast, but it is often difficult to control the fragment length distribution accurately, and the method tends to introduce bias in the representation of genomic DNA. The other three techniques use physical force to introduce double-strand breaks into DNA. These breaks are believed to occur randomly, resulting in an unbiased representation of the DNA in the library. The resulting fragment size distribution can be checked by agarose gel electrophoresis or automated DNA analysis.
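The contrast between site-specific digestion and random physical shearing can be illustrated with a small in silico sketch. This is a toy model, not a simulation of any real instrument or enzyme: the function names `digest` and `shear` are hypothetical, the cut is placed at the start of the recognition site for simplicity, and the example sequence is made up.

```python
import random
import re


def digest(sequence, site):
    """Cut at every occurrence of a recognition site.

    Simplification: the cut is placed at the first base of the site,
    which is not where a real endonuclease necessarily cuts.
    """
    cuts = [m.start() for m in re.finditer(re.escape(site), sequence)]
    bounds = [0] + [c for c in cuts if c > 0] + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def shear(sequence, n_breaks, rng):
    """Introduce n_breaks double-strand breaks at uniformly random positions."""
    cuts = sorted(rng.sample(range(1, len(sequence)), n_breaks))
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Made-up example sequence containing two GAATTC sites.
genome = "AAGAATTCTTGAATTCGG"

# Digestion always yields the same fragments for a given sequence.
print([len(f) for f in digest(genome, "GAATTC")])

# Shearing yields different fragments for each molecule, here for two
# independent copies of the same sequence.
rng = random.Random(0)
print([len(f) for f in shear(genome, 2, rng)])
print([len(f) for f in shear(genome, 2, rng)])
```

Because digestion cuts only where the recognition site occurs, every copy of the genome gives identical fragment ends, which is one way to picture the representation bias described above. Random breaks differ from copy to copy, so fragment ends are spread across the whole sequence.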
Following fragmentation, the DNA sections must be repaired to generate blunt-ended, 5'-phosphorylated DNA ends compatible with the sequencing platform-specific adapter ligation strategy. The library generation efficiency is directly dependent on the efficiency and accuracy of these DNA end-repair steps.
The end-repair mix converts 5'- and 3'-protruding ends to 5'-phosphorylated blunt-ended DNA. In most cases, end repair is accomplished by exploiting the 5'–3' polymerase and 3'–5' exonuclease activities of T4 DNA polymerase, while T4 Polynucleotide Kinase ensures the 5'-phosphorylation of the blunt-ended DNA fragments, preparing them for subsequent adapter ligation.
Depending on the sequencing platform used, the blunt-ended DNA fragments can either be used directly for adapter ligation or require the addition of a single A overhang at their 3' ends to facilitate subsequent ligation of platform-specific adapters with compatible single T overhangs. Typically, this A-addition step is catalyzed by Klenow Fragment (minus 3' to 5' exonuclease) or other polymerases with terminal transferase activity.
T4 DNA ligase then joins the double-stranded adapters to the end-repaired library fragments, followed by reaction cleanup and DNA size selection to remove free adapters and adapter dimers. Methods for size selection include agarose gel isolation, magnetic beads, and advanced column-based purification. Adapter dimers can form during ligation and are subsequently co-amplified with the adapter-ligated library fragments. They must be depleted from the library prior to sequencing, because they take up sequencing capacity that would otherwise go to real library fragments and reduce sequencing quality. Some sequencing platforms require a narrow size distribution of library fragments for optimal results, which in many cases can only be achieved by excising the respective size range from the gel. This also serves to deplete adapter dimers.
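The logic of size selection can be sketched as a simple filter over fragment lengths. The function names, the 60 bp adapter length, the tolerance, and the 250 to 450 bp window below are all illustrative assumptions, not values recommended for any particular platform or kit.

```python
def is_adapter_dimer(length_bp, adapter_bp, tolerance_bp=10):
    """Flag molecules whose length is close to two adapters joined end to end.

    Assumption: an adapter dimer carries no insert, so its length is
    roughly twice the adapter length.
    """
    return abs(length_bp - 2 * adapter_bp) <= tolerance_bp


def size_select(lengths_bp, min_bp, max_bp):
    """Keep only molecules inside the selected size window (inclusive)."""
    return [n for n in lengths_bp if min_bp <= n <= max_bp]


# Hypothetical post-ligation molecule lengths (adapters included), in bp.
molecules = [120, 128, 250, 310, 420, 800]
adapter_bp = 60  # assumed adapter length

dimers = [n for n in molecules if is_adapter_dimer(n, adapter_bp)]
selected = size_select(molecules, 250, 450)

print("adapter dimers:", dimers)
print("kept after size selection:", selected)
```

A window whose lower bound sits well above twice the adapter length removes adapter dimers and narrows the fragment distribution in the same step, which mirrors what gel excision achieves.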
After this step, DNA fragment libraries should be qualified and quantified. Depending on the concentration and adapter design of the sequencing library, it can either be diluted and used directly for sequencing or subjected to optional library amplification. Library amplification serves two purposes, alone or in combination: completing the full adapter sequence needed for subsequent clonal amplification and binding of sequencing primers, by using overlapping PCR primers, and producing higher yields of the DNA library. Optimal library amplification requires a DNA polymerase with high fidelity and minimal sequence bias.
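Quantification and dilution usually come down to converting a mass concentration into a molar one. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA. The function names and the example numbers are assumptions for illustration, and the target loading concentration for a real run comes from the platform's documentation, not from this example.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate average mass of one dsDNA base pair


def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration in ng/uL to nM.

    mean_fragment_bp should be the mean library molecule length,
    adapters included.
    """
    return conc_ng_per_ul / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp) * 1e6


def dilution_volumes(stock_nm, target_nm, final_volume_ul):
    """Return (library volume, diluent volume) in uL, using C1*V1 = C2*V2."""
    if target_nm > stock_nm:
        raise ValueError("target concentration exceeds stock concentration")
    library_ul = target_nm * final_volume_ul / stock_nm
    return library_ul, final_volume_ul - library_ul


# Hypothetical library: 2 ng/uL with a mean molecule length of 400 bp.
stock = library_molarity_nm(2.0, 400)
library_ul, diluent_ul = dilution_volumes(stock, target_nm=4.0, final_volume_ul=20.0)
print(f"stock: {stock:.2f} nM")
print(f"mix {library_ul:.2f} uL library with {diluent_ul:.2f} uL diluent")
```

If the computed stock concentration falls below the required target, `dilution_volumes` raises an error, which corresponds to the case where optional library amplification is needed to increase yield.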
For assessment of the library quality, see Library QC for NGS.
To enable efficient use of the sequencing capacity, sequencing libraries generated from different samples can be pooled and sequenced in the same sequencing run. This is enabled by ligating DNA fragments to adapters carrying characteristic barcodes, i.e., short nucleotide sequences that are distinct for each sample.
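After a pooled run, reads are assigned back to their samples by barcode. The following is a minimal sketch of that idea, assuming the barcode sits at the very start of each read and allowing one mismatch. Both are simplifications, since many platforms read the barcode in a separate index read. The sample names, barcodes, and reads are made up.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_barcode, barcodes, max_mismatches=1):
    """Return the single sample whose barcode matches, or None if ambiguous."""
    hits = [
        sample
        for sample, barcode in barcodes.items()
        if len(barcode) == len(read_barcode)
        and hamming(read_barcode, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, barcodes, max_mismatches=1):
    """Sort reads into per-sample bins and trim the barcode from each read."""
    barcode_len = len(next(iter(barcodes.values())))
    bins = {sample: [] for sample in barcodes}
    bins["undetermined"] = []
    for read in reads:
        sample = assign_sample(read[:barcode_len], barcodes, max_mismatches)
        bins[sample or "undetermined"].append(read[barcode_len:])
    return bins


# Hypothetical barcodes and reads.
barcodes = {"sample_A": "ACGT", "sample_B": "TGCA"}
reads = ["ACGTGGGG", "TGCATTTT", "ACGAGGCC", "GGGGAAAA"]

for sample, sample_reads in demultiplex(reads, barcodes).items():
    print(sample, sample_reads)
```

Tolerating a mismatch only works if the barcodes themselves differ at enough positions. With a one-mismatch tolerance, any two barcodes in the set should differ at three or more positions so that a single sequencing error cannot make a read match two samples.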
Other methods are available to streamline library construction. One such novel method uses in vitro transposition by a transposase/DNA complex to simultaneously fragment and tag DNA in a single-tube reaction. A complete sequencing library can subsequently be constructed by a limited number of PCR cycles on the tagged DNA fragments, which reduces handling steps and saves time. However, libraries generated using in vitro transposition may show higher sequence bias than those generated using conventional methods.