Next-Generation Sequencing (NGS)
Protocols & Applications

This section describes the processes involved in next-generation sequencing and its applications.
Following the completion of the Human Genome Project, the high demand for low-cost sequencing has given rise to a number of high-throughput, next-generation sequencing (NGS) technologies. These new sequencing platforms allow high-throughput sequencing for a wide range of applications such as:
  • Whole-genome sequencing (de novo sequencing or resequencing)
  • Targeted resequencing
  • Transcriptome profiling
  • Microbiome research
  • Gene regulation studies
NGS sequencers
DNA sequencing
DNA sample QC for NGS
Library preparation
Library QC for NGS
Metagenomics
RNA sequencing
References

NGS sequencers

Next-generation sequencing instruments are a heterogeneous group of machines with regard to throughput, read-length, accuracy, cost per run, cost per megabase, initial costs, size, and technology.

In terms of size and initial costs, instruments can easily be grouped into smaller instruments, so-called “bench-top sequencers”, and high-throughput instruments.

Bench-top sequencers enable any laboratory to perform its own sequencing applications, in much the same way as with real-time PCR. These instruments are also used for more clinically oriented applications in combination with target enrichment, where selected target genes are analyzed in great depth, enabling the detection of rare mutants, or detection of mutants in a heterogeneous sample, such as cancer samples. Currently, the throughput of these instruments is in the range of 10 Mb to 7.5 Gb, but is increasing steadily with continuous improvements in hardware, software, and reagents.

High-throughput sequencers are well suited for large, genome-wide studies, with capacities of up to 600 Gb per run. Some platforms with high throughput and accuracy are associated with relatively short read lengths, which may be an issue with highly repetitive sequence elements or de novo sequencing of unknown genomes. Conversely, there are instruments with longer read lengths (up to 2500 bp), but significantly lower accuracy and capacity (90 Mb), and instruments in between (~800 bp, 700 Mb).

Therefore, the application determines the instrument that is best suited.
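
As a rough illustration of how throughput relates to the application, the sketch below estimates the mean depth of coverage as run output divided by genome size. The run outputs are the figures quoted above; the genome sizes are approximate values assumed for the example and are not taken from this guide. The calculation also assumes that every sequenced base maps uniformly to the genome, which real runs do not achieve.

```python
def expected_coverage(run_output_bases: float, genome_size_bases: float) -> float:
    """Mean depth of coverage, assuming all sequenced bases map uniformly."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return run_output_bases / genome_size_bases


# Approximate genome sizes (assumptions for illustration only).
HUMAN_GENOME_BASES = 3.1e9
E_COLI_GENOME_BASES = 4.6e6

# Run outputs quoted in the text: 600 Gb (high-throughput), 7.5 Gb (bench-top).
print(f"600 Gb run, human genome: {expected_coverage(600e9, HUMAN_GENOME_BASES):.0f}x")
print(f"7.5 Gb run, human genome: {expected_coverage(7.5e9, HUMAN_GENOME_BASES):.1f}x")
print(f"7.5 Gb run, E. coli genome: {expected_coverage(7.5e9, E_COLI_GENOME_BASES):.0f}x")
```

Under these assumptions, a bench-top run is ample for a bacterial genome but gives only shallow coverage of a human genome, which is why such instruments are typically paired with target enrichment.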

A new approach is so-called “nanopore sequencing”. Here, a DNA strand is passed through a synthetic or protein nanopore, and changes in the electric current allow identification of the base passing through the pore. This will theoretically allow sequencing of a complete chromosome in one step, without the need to generate a new DNA strand.


DNA sequencing

The workflow for next-generation DNA sequencing is as follows:

  • DNA sample preparation
  • Library construction and validation
  • Massive parallel clonal amplification of library molecules
  • Sequencing


DNA sample QC for NGS

First, it is necessary to evaluate the quality of the genomic DNA (integrity and purity).

Gel electrophoresis

The integrity and size of genomic DNA can be checked by regular or pulsed-field gel electrophoresis (PFGE) using an agarose gel. Regular gel electrophoresis is not highly accurate, since large DNA molecules migrating through a gel will essentially move together in a size-independent manner. However, it provides sufficient information in terms of integrity (size range) and purity (RNA contamination runs as a diffuse smear at the bottom of the gel), so it is still one of the most popular methods for assessing genomic DNA quality.

Note: RNA contamination can lead to overestimation of DNA concentration and may inhibit some downstream steps. When RNA contamination is evident, treat the sample with DNase-free RNase I.

Spectrophotometry

The ratio of the readings at 260 nm and 280 nm (A260/A280) on a spectrophotometer provides an estimate of DNA purity with respect to contaminants that absorb UV light, such as protein. Pure DNA has an A260/A280 ratio of 1.7–1.9.

Note: For accurate A260/A280 values, measure absorbance in slightly alkaline buffer (e.g., 10 mM TrisCl, pH 7.5).

The second step is to measure the concentration of genomic DNA.

Spectrophotometry 

DNA concentration can be determined by measuring the absorbance at 260 nm in a spectrophotometer. The NanoDrop instrument is becoming widely used, owing to its low sample volume (1 µl) and convenience (no cuvettes required). To ensure reliability, A260 readings should be between 0.1 and 1.0.

Note: Absorbance measurements cannot discriminate between DNA and RNA, so RNA contamination can lead to overestimation of DNA concentration. However, the A260/A280 ratios differ: pure RNA gives a ratio of ~2.0 and pure DNA a ratio of ~1.8. Therefore, a ratio of, for example, 1.95 can suggest RNA contamination.

Note: Phenol has an absorbance maximum of 270–275 nm, which is close to that of DNA. Phenol contamination mimics both higher yields and higher purity, due to an upward shift in the A260 value.
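
The spectrophotometric checks above can be sketched in a few lines. The thresholds (A260 between 0.1 and 1.0, A260/A280 of 1.7–1.9) come from the text; the conversion factor of 50 µg/ml per A260 unit for double-stranded DNA at a 1 cm path length is a widely used approximation that this section does not state, so it is marked as an assumption. The function names are hypothetical.

```python
# Assumption: 1 A260 unit corresponds to ~50 ug/ml dsDNA (1 cm path length).
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate the dsDNA concentration of the undiluted sample from A260."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def assess_reading(a260: float, a280: float):
    """Return the A260/A280 ratio and a list of warnings based on the text."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    notes = []
    if not 0.1 <= a260 <= 1.0:
        notes.append("A260 outside the reliable range of 0.1-1.0")
    ratio = a260 / a280
    if ratio < 1.7:
        notes.append("ratio below 1.7: possible protein or other UV-absorbing contaminant")
    elif ratio > 1.9:
        notes.append("ratio above 1.9: possible RNA contamination")
    return ratio, notes


ratio, notes = assess_reading(a260=0.39, a280=0.20)
print(round(ratio, 2), notes)
print(dna_concentration_ug_per_ml(0.39, dilution_factor=10))
```

Note that, as stated above, phenol contamination can push both values in a misleadingly favorable direction, so a ratio inside the expected range does not prove purity.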

Fluorometry 

Fluorometry allows specific and sensitive measurement of DNA concentration using fluorochromes. In addition to Hoechst 33258, which shows increased emission at 458 nm when bound to DNA, more sensitive fluorochromes, such as PicoGreen dye, are now used. PicoGreen dye-based assays are up to 10,000-fold more sensitive than UV absorbance detection and at least 400-fold more sensitive than assays that use the Hoechst 33258 dye. Unlike UV absorbance, PicoGreen assays are highly selective for dsDNA over RNA and ssDNA.

DNA standards and samples are mixed with fluorochrome and measured on a fluorometry instrument. The sample measurements are then compared to the standards to determine DNA concentration.
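
The comparison of samples to standards can be illustrated with a simple linear standard curve. This is a minimal sketch that assumes the fluorescence signal is linear in DNA concentration over the range of the standards; the standard concentrations and signal values are invented for the example, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_signal(signal, slope, intercept):
    """Read a sample concentration off the fitted standard curve."""
    return (signal - intercept) / slope


# Hypothetical DNA standards (ng/ml) and their fluorescence readings.
standard_conc = [0.0, 10.0, 50.0, 100.0, 500.0]
standard_signal = [5.0, 105.0, 505.0, 1005.0, 5005.0]

slope, intercept = fit_line(standard_conc, standard_signal)
print(concentration_from_signal(2505.0, slope, intercept))  # sample in ng/ml
```

Samples whose signal falls outside the range of the standards should be diluted and remeasured rather than extrapolated.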

Real-time PCR

Real-time PCR assays can be used to assess the quantity and quality of DNA samples. Multiplex PCR assays that use primer sets amplifying fragments of different sizes at multiple loci can provide effective quality control for identifying damage or fragmentation. These assays specifically measure PCR-amplifiable DNA molecules, which are those suitable for next-generation sequencing reactions. Therefore, real-time PCR is better suited for predicting the utility of a DNA sample for NGS than the conventional methods mentioned above, which often lack sensitivity or overestimate the amount of amplifiable DNA present in compromised samples.


Library preparation

For most commercially available next-generation sequencing platforms, clonal amplification of each DNA fragment in the library by methods such as bridge amplification or emulsion PCR is necessary in order to generate sufficient copies of sequencing template. The fragment libraries are obtained by ligating platform-specific adapters to fragments generated from a DNA source of interest, such as genomic DNA, double-stranded cDNA, or PCR amplicons. The presence of adapter sequences enables selective clonal amplification of the library molecules. Therefore, no bacterial cloning step is required to amplify the genomic fragment in a bacterial intermediate, as is performed in traditional sequencing approaches. Furthermore, the adapter sequence also contains a docking site for the platform-specific sequencing primers.

Typically, a conventional DNA library construction protocol consists of 4 steps:

  • Fragmentation of DNA
  • End repair of fragmented DNA
  • Ligation of adapter sequences (not for single-molecule sequencing applications)
  • Optional library amplification

Currently, 4 different methods are commonly used to generate fragmented genomic DNA: enzymatic digestion, sonication, nebulization, and hydrodynamic shearing. All methods have been used in library construction, but each has specific advantages and limitations. Endonucleolytic digestion is easy and fast, but it is often difficult to accurately control the fragment length distribution. Furthermore, this method tends to introduce biases regarding the representation of genomic DNA. The other three techniques employ physical methods to introduce double-strand breaks into DNA, which are believed to occur randomly, resulting in an unbiased representation of the DNA in the library. The resulting DNA fragment size distribution can be checked by agarose gel electrophoresis or automated DNA analysis.

Following fragmentation, the DNA sections must be repaired to generate blunt-ended, 5'-phosphorylated DNA ends compatible with the sequencing platform-specific adapter ligation strategy. The library generation efficiency is directly dependent on the efficiency and accuracy of these DNA end-repair steps.

The end-repair mix converts 5'- and 3'-protruding ends to 5'-phosphorylated blunt-ended DNA. In most cases, end repair is accomplished by exploiting the 5'–3' polymerase and the 3'–5' exonuclease activities of T4 DNA polymerase, while T4 Polynucleotide Kinase ensures the 5'-phosphorylation of the blunt-ended DNA fragments, preparing these fragments for subsequent adapter ligation.

Depending on the sequencing platform used, the blunt-ended DNA fragments can either directly be used for adapter-ligation, or need the addition of a single A overhang at the 3' ends of the DNA fragments to facilitate subsequent ligation of platform-specific adapters with compatible single T overhangs. Typically, this A-addition step is catalyzed by Klenow Fragment (minus 3' to 5' exonuclease) or other polymerases with terminal transferase activity.

T4 DNA ligase then adds the double-stranded adapters to the end-repaired library fragments, followed by reaction cleanup and DNA size selection to remove free library adapters and adapter dimers. The methods for size selection include agarose gel isolation, the use of magnetic beads, or advanced column-based purification methods. Adapter dimers that form during ligation are co-amplified with the adapter-ligated library fragments and must therefore be depleted from the libraries prior to sequencing, as they reduce the capacity of the sequencing platform for real library fragments and reduce sequencing quality. Some sequencing platforms require a narrow size distribution of library fragments for optimal results, which in many cases can only be achieved by excising the respective fragment section from the gel. This can also serve to deplete adapter dimers.

After this step, DNA fragment libraries should be qualified and quantified. Depending on the concentration and adapter design of the sequencing library, it can either be directly diluted and used for sequencing, or subjected to optional library amplification. In the library amplification step, high-fidelity DNA polymerases are employed to either generate the entire adapter sequence needed for subsequent clonal amplification and binding of sequencing primers, with overlapping PCR primers, and/or to produce higher yields of the DNA libraries. Optimal library amplification requires DNA polymerase with high fidelity and minimal sequence bias.

For assessment of the library quality, see Library QC for NGS.

To enable efficient use of the sequencing capacity, sequencing libraries generated from different samples can be pooled and sequenced in the same sequencing run. This is enabled by ligating DNA fragments to adapters with characteristic barcodes, i.e., short stretches of nucleotide sequence that are distinct for each sample.
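
After a pooled run, reads are assigned back to their samples by barcode. The following is a minimal sketch of that idea, assuming the barcode is the first few bases of each read, that all barcodes have the same length, and that only exact matches are accepted. Real platforms may read the barcode separately and tolerate mismatches; the barcode sequences and sample names here are invented.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using an exact match on the leading barcode."""
    if not barcode_to_sample:
        raise ValueError("at least one barcode is required")
    lengths = {len(barcode) for barcode in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes barcodes of equal length")
    barcode_len = lengths.pop()

    bins = defaultdict(list)
    for read in reads:
        sample = barcode_to_sample.get(read[:barcode_len], "undetermined")
        # Store the read without its barcode.
        bins[sample].append(read[barcode_len:])
    return dict(bins)


# Hypothetical barcodes and reads.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTTTGA", "TGCAGGCC", "NNNNAAAA"]
print(demultiplex(reads, barcodes))
```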

Other methods are available to streamline library construction. One such method uses in vitro transposition by a transposase/DNA complex to simultaneously fragment and tag DNA in a single-tube reaction. A complete sequencing library can subsequently be constructed from the tagged DNA fragments with a limited number of PCR cycles, reducing handling steps and saving time. However, libraries generated using in vitro transposition may show higher sequence bias compared to those generated using conventional methods.


Library QC for NGS

A high-quality library is the key to successful NGS. Library construction includes complex steps, such as fragmenting the sample, repairing ends, adenylation of ends, ligation of adapters, and amplifying the library. These steps may vary depending on different platforms and library types. Monitoring of each step is highly recommended, including checking sizes after sample fragmentation, and a size and concentration check after ligation of adapters. Library validation serves as the final library quality control step, which analyzes the library size and quantity.

Assessment of library size

Agarose and PAGE gel electrophoresis are traditional methods of assessing size, but can be time-consuming.

In recent years, microfluidics-based electrophoresis or capillary electrophoresis have become more popular for ascertaining size and concentration. Ready-to-use chip or gel cartridges are user friendly, omitting the need for gel pouring. They have much higher throughput and require far less hands-on time. In addition, they are more sensitive (low detection limit) and are fully automated for data acquisition and digital data output. These machines detect size and concentration simultaneously.

Determination of library quantity
Spectrophotometry and fluorometry

See Spectrophotometry and Fluorometry.

Electrophoresis instrument

As mentioned above, microfluidics-based electrophoresis or capillary electrophoresis provides quantification data in addition to size information. However, one limitation to electrophoresis, spectrophotometry, and fluorometry is that all 3 methods measure the total nucleic acid concentrations, not just molecules with adapters added.
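
Mass concentrations from these methods are often converted to molar concentrations before loading a library, using the mean fragment size from the electrophoresis trace. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA, which is an assumption not stated in this section; like the measurements themselves, the result counts all DNA, not only adapter-ligated molecules.

```python
# Assumption: average molar mass of one base pair of dsDNA is ~660 g/mol.
GRAMS_PER_MOL_PER_BP = 660.0


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/ul) to molarity (nM)."""
    if mean_fragment_bp <= 0:
        raise ValueError("fragment length must be positive")
    # 1 ng/ul = 1e-3 g/l; dividing by g/mol gives mol/l; 1e9 converts to nM.
    return conc_ng_per_ul * 1e6 / (GRAMS_PER_MOL_PER_BP * mean_fragment_bp)


# Example: a 2 ng/ul library with a mean fragment size of 400 bp.
print(f"{library_molarity_nm(2.0, 400):.2f} nM")
```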

Real-time PCR

The presence of adapter sequences at both ends of the library molecules enables the amplification of millions of individual DNA molecules in a parallel PCR amplification step (emulsion PCR or bridge PCR). On some instruments, emulsion PCR is performed to amplify a single DNA molecule to millions of copies of the same sequence, all attached to a single bead. With another platform, bridge PCR amplification converts a single DNA molecule into a cluster with many copies of the same sequence. Therefore, the amplifiable molecules, which carry adapter sequences at both ends, are the ones that determine the template-to-bead ratio in emulsion PCR or the optimal cluster density generated by bridge PCR.

Accurate quantification of amplifiable library molecules is essential for ensuring quality reads and efficient data generation. Underestimation of amplifiable library molecules leads to mixed signals and non-resolvable data; conversely, overestimation results in poor yield of template-carrying beads or clusters and reduced usage of sequencing capacity.

Real-time PCR can specifically quantify DNA molecules with adapters at both ends, and therefore provides highly accurate quantification of amplifiable library molecules. The high sensitivity of real-time PCR allows quantification of libraries with very low concentration, even below the detection threshold of conventional methods. Thus, the need for further amplification of the library is minimized, which reduces potential bias.
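
Real-time PCR quantification of a library is usually read off a standard curve of threshold cycle (CT) against the logarithm of concentration. The sketch below fits such a curve and interpolates an unknown; the standard concentrations and CT values are invented for illustration, and the efficiency formula is the standard relationship between slope and amplification efficiency rather than anything specified in this guide.

```python
import math


def fit_standard_curve(concentrations, ct_values):
    """Least-squares fit of CT = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    if n < 2 or n != len(ct_values):
        raise ValueError("need at least two paired standards")
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(ct, slope, intercept):
    """Concentration of an unknown, in the same units as the standards."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Fraction of templates copied per cycle (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical 10-fold dilution series of a library standard (pM) and CT values.
slope, intercept = fit_standard_curve([10.0, 1.0, 0.1, 0.01], [10.0, 13.5, 17.0, 20.5])
print(f"slope={slope:.2f}, efficiency={amplification_efficiency(slope):.1%}")
print(f"unknown at CT 15.25: {quantify(15.25, slope, intercept):.3f} pM")
```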

Digital PCR for absolute quantification

Digital PCR provides absolute quantification of NGS libraries with no need for a standard curve. A limiting dilution of library is made across a large number of separate PCR reactions; therefore, most of the reactions have no templates and yield negative results. A single, positive PCR reaction at the endpoint is counted as one individual template molecule. In counting all the positive PCR reactions, the total absolute number of library molecules can be derived. The major advantages of digital PCR include:

  • Single molecule sensitivity 
  • Independence from variations in PCR amplification efficiency, since successful amplification is counted as one molecule, independent of the endpoint amount of product.

However, digital PCR requires specialized and relatively expensive equipment, so this technique is not yet widely used for library quantification.
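
The counting step can be sketched as follows. The text counts each positive reaction as one molecule, which holds at limiting dilution; a common refinement, added here as an assumption and not described in this guide, is a Poisson correction for reactions that happened to receive more than one template. The reaction counts in the example are invented.

```python
import math


def estimate_template_molecules(positive_reactions: int, total_reactions: int) -> float:
    """Estimate total template molecules across all reactions.

    Assumes templates are distributed randomly (Poisson) over the reactions,
    so the mean number per reaction is -ln(fraction of negative reactions).
    """
    if not 0 <= positive_reactions < total_reactions:
        raise ValueError("need at least one negative reaction for this estimate")
    negative_fraction = 1.0 - positive_reactions / total_reactions
    mean_per_reaction = -math.log(negative_fraction)
    return mean_per_reaction * total_reactions


# Example: 2,000 positive reactions out of 20,000.
# The naive count is 2,000; the corrected estimate is somewhat higher.
print(round(estimate_template_molecules(2000, 20000), 1))
```

At high dilution the corrected estimate and the simple count of positives converge, which is why the one-positive-equals-one-molecule rule in the text works for limiting dilutions.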


Metagenomics

One application of DNA sequencing is the field of metagenomics, the culture-independent study of genetic material recovered directly from environmental samples. Metagenomics describes the functional and sequence analysis of the collective microbial genomes contained in an environmental sample. The term is derived from ‘meta’ (in this case meaning an overarching understanding of genetic diversity) and ‘genomics’ (the comprehensive analysis of an organism's genetic material).

While not a new discipline, metagenomic applications have experienced a huge boost due to the new possibilities that exist with next-generation sequencing technologies.

Estimates suggest that only 1% of all microorganisms are cultivable; metagenomic research may therefore dramatically broaden our knowledge of environments.

For many years, the term “metagenomics” was only connected with the analysis of environmental samples, for example the analysis of DNA isolated from extreme habitats to identify new biocatalysts for industrial applications. However, the dramatic increase in throughput, together with the decreases in cost and time, has considerably broadened this field to new applications.

Metagenomics can be divided into several areas, including:

  • Pathogenomics/infection genomics
  • Microbiome analysis
  • Environmental metagenomics

Note: This is not an exhaustive list and researchers may choose other criteria.

Pathogenomics/infection genomics is related to diagnostics and is the identification of unknown pathogens from a symptomatic patient. This is often a challenging process since the number of microbes may be very low (~1–10 cells/ml blood).

Conversely, in microbiome analysis, there is a large number of microorganisms, e.g., from oral or fecal swabs. Here, the aim is to analyze the composition of the community. Given that microbial cells in and on the human body are estimated to be at least as numerous as human cells, microbiome analysis has significant potential for future diagnostic applications. See the Human Microbiome Project (www.hmpdacc.org) for more detail.

In environmental metagenomics, the focus, in addition to the classical search for new biocatalysts, is the investigation and characterization of habitats.

Principally, there are two different approaches:

  • Whole genome analysis: all DNA present is sequenced
  • 16S profiling: only the 16S rRNA gene is sequenced

The first approach simply sequences all DNA present in the sample. This gives the most complete picture of the microorganisms present and may, for example, identify new enzymes/enzyme classes, as well as antibiotic resistances. On the other hand, it requires higher sequencing capacities, resulting in lower throughput and higher costs than the second approach. Further methods are in development.

In metagenomic applications, typically sequencing instruments giving higher read-lengths are necessary because there is usually no reference sequence available. Also for 16S rRNA profiling, a sequencing system is required whose read-length spans the whole region (see also NGS sequencers).


RNA sequencing

RNA sequencing (RNA-seq) is a method of investigating the transcriptome of an organism using deep-sequencing techniques. The RNA content of a sample is directly sequenced after appropriate library construction, providing a rich data set for analysis. The high level of sensitivity and resolution provided by this technique makes it a valuable tool for investigating the entire transcriptional landscape. The quantitative nature of the data and the high dynamic range of the sequencing technology enable gene expression analysis with high sensitivity. The single-base resolution of the data provides information on single nucleotide polymorphisms (SNPs), alternative splicing, exon/intron boundaries, untranslated regions, and other elements. Additionally, prior knowledge of the reference sequence is not required to perform RNA-seq, allowing for de novo transcriptome analysis and detection of novel variants and mutations. RNA-seq is an extremely powerful way to investigate transcriptomes, but requires care in order to achieve the highest quality of data.

Factors to consider in RNA-seq

The first factor to consider is enrichment of the sample. Total RNA generally contains only a very small percentage of coding or functional RNA; ribosomal RNA (rRNA: up to 80–90% of the total RNA), and to a lesser degree transfer RNA (tRNA), make up the majority of the RNA in a sample. In order not to use 80–90% of one’s sequencing capacity on repetitive rRNA sequences, rRNA is generally removed from the sample prior to sequencing. This is most often achieved either by specifically depleting rRNA or by selectively enriching for polyadenylated RNA by use of oligo-dT enrichment. Depletion of rRNA preserves information on both coding and noncoding RNA (an important research topic), while enrichment of the poly A fraction preserves only coding mRNA. Poly A enrichment may miss RNAs that lack a poly A tail and RNAs with high turnover rates.
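
The cost of leaving rRNA in the sample is simple arithmetic, sketched below. The 80–90% rRNA fractions are from the text; the run size of 30 million reads is an example value, and the calculation assumes that reads are drawn in proportion to RNA abundance.

```python
def informative_reads(total_reads: int, rrna_fraction: float) -> int:
    """Reads left for non-rRNA transcripts if rRNA is sequenced proportionally."""
    if not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("rRNA fraction must be between 0 and 1")
    return round(total_reads * (1.0 - rrna_fraction))


# Without depletion or enrichment: 80-90% of reads are spent on rRNA.
for fraction in (0.8, 0.9):
    print(fraction, informative_reads(30_000_000, fraction))
```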

Some other methods of avoiding rRNA also exist, such as selective degradation of abundant transcripts or amplification techniques that are biased away from rRNA. However, these are not as common as rRNA depletion or poly A enrichment, and may have the side effect of skewing the transcript representation away from normal.

Another issue to consider is the size of the RNA to be investigated. RNA transcripts span a wide range of sizes; experiments focusing on small RNA (e.g., microRNA or RNAs in the 15–35 nt size range) generally require specialized purification and library construction protocols compared with general RNA analysis. Most other size fractions of RNA can be sequenced together (one of the common steps in RNA-seq is fragmentation of the RNA population down to a common size, such as 200–300 nt).

RNA-seq procedure

Once the method of ribosomal removal and the size fraction to be investigated have been chosen, the RNA is made into a library. For most sequencing machines, this involves first fragmenting the RNA, then creating double-stranded cDNA through reverse transcription. This double-stranded cDNA may then be handled as normal genomic DNA throughout the remaining library construction process. If directional information (strandedness) of the RNA is to be preserved, modified library construction protocols must be used, such as ligating adapters directly to mRNA or marking one of the cDNA strands such that it can be removed prior to sequencing.

When planning the sequencing run itself, the three major issues to consider are read depth, read length, and whether or not to use paired-end data. Read depth provides information on the abundance of RNA transcripts, and greater read depth allows more sensitive detection of rare transcripts. Read length is important in that longer reads are more sensitive for detecting splicing events (intron–exon boundaries, exon–exon boundaries). Paired-end data provide more information on transcript structure, particularly with widely spaced exons. Generally speaking, de novo analysis or searches for novel structural variation will require both high read depth and length, and will benefit from sequencing paired ends. A typical experiment of this kind may have 100–200 M reads, 2 x 50–100 bp. In contrast, expression analysis or profiling will benefit from high read depth, but read length and paired-end data provide little extra advantage. A typical experiment for this application may have 10–30 M reads, 1 x 35–100 bp.
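
These read-depth targets translate directly into how many barcoded samples can share one run. The sketch below uses the per-sample targets quoted above; the run capacity of 400 million reads is a hypothetical figure chosen for the example, not a specification of any instrument.

```python
def samples_per_run(run_capacity_reads: int, reads_per_sample: int) -> int:
    """Number of samples that fit in one run at the requested depth."""
    if reads_per_sample <= 0:
        raise ValueError("reads per sample must be positive")
    return run_capacity_reads // reads_per_sample


RUN_CAPACITY = 400_000_000  # hypothetical run output in reads

# Expression profiling (10-30 M reads per sample) versus de novo analysis
# (100-200 M reads per sample), using the upper end of each range.
print(samples_per_run(RUN_CAPACITY, 30_000_000))
print(samples_per_run(RUN_CAPACITY, 200_000_000))
```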

Gene-regulation studies

Next-generation sequencing has also proved to be a powerful tool for studies of gene regulation networks. For example, ChIP-Seq (chromatin immunoprecipitation sequencing) can be used to analyze protein–DNA interactions. NGS can also be used to determine global methylation patterns in a genome.

ChIP-Seq

Chromatin immunoprecipitation (ChIP) is a powerful and versatile method for understanding the mechanisms of gene regulation by transcription factors and modified histones. It is used to identify chromatin regions which are bound by transcription factors, co-regulators, modified histones, chromatin remodeling proteins, or other nuclear factors from live cells.

The procedure is time-consuming and involves many steps and variables, each of which must be optimized by the investigator in their model system. After cross-linking cells with formaldehyde, chromatin containing covalent complexes between genomic DNA and all nuclear factors is isolated and sheared by sonication into manageable sizes. Immunoprecipitation with an antibody specific for the target nuclear protein of interest also pulls down any specific genomic DNA sequences bound by this factor. Reversal of the chemical cross-linking and nucleic acid purification prepare the DNA for detection by sequencing, hybridization-based microarrays, or PCR. ChIP with subsequent next-generation sequencing (ChIP-Seq) is used to study the genome-wide distribution of loci bound by a protein of interest. Compared to microarray analysis (ChIP-Chip), ChIP-Seq offers higher spatial resolution, dynamic range, and genome coverage. This results in superior sensitivity and specificity for the detection of DNA binding sites. Furthermore, ChIP-Seq generally requires less input material and is more flexible: since no hybridization probes are required, any species with a sequenced genome can be studied.

Several critical parameters need to be optimized for efficient ChIP-Seq. First, the time and temperature at which the formaldehyde cross-linking is performed must be optimized. If cross-linking of proteins and DNA is too severe, it is more difficult to efficiently fragment the chromatin and to reverse the cross-links before sequencing. A good starting point for further optimization is a 10-minute incubation with 1% formaldehyde at 37°C.

There are several methods to subsequently fragment the chromatin, for example, sonication and enzymatic digestion (e.g., with micrococcal nuclease). If the immunoprecipitated DNA will be analyzed by next-generation sequencing, chromatin fragments of ~100–300 nucleotides are required, and the fragmentation parameters for each test system (cell type, tissue type) also require thorough optimization.

The success of any ChIP experiment depends on the quantity and specificity of the antibody used. It is good practice to use ChIP-validated antibodies, which are available from several antibody suppliers. In order to confirm that the antibody specifically precipitates only the protein of interest, western blots can be performed with nuclear extracts. If only one band of the size of the protein is visible on such a blot, the antibody would be considered to be specific. Performing immunoprecipitation with the antibody of choice together with western blotting additionally confirms that the antibody is able to specifically precipitate the protein of interest. If such experiments fail, it may also be useful to perform immunofluorescence staining of cultured cells with the antibody of choice. If only the nuclei are stained, this indicates that the antibody at least specifically recognizes a protein that resides in the nucleus and may therefore bind to DNA.

Depending on the abundance of the target protein, the amount of the validated antibody and of chromatin for immunoprecipitation must be optimized in order to obtain enough DNA for sequencing. For antibodies recognizing histones and histone modifications, chromatin from 100,000 to 1 million cells may be used per immunoprecipitation reaction. More chromatin may be necessary if transcription factors are precipitated, as they bind to DNA more dynamically and only at a limited number of genomic loci compared to histones. The amount of antibody used per immunoprecipitation reaction is also important, both to avoid nonspecific binding of other proteins and chromatin to excess antibody and to ensure sufficient precipitation of the protein of interest. Normally, 1–10 µg of antibody should deliver reasonable results.

A control reaction may be used to control for the specificity of the ChIP reaction. In a parallel ChIP reaction, with the same amount of chromatin, either an isotype control antibody or beads without any antibody may be used to control for nonspecific binding of proteins and chromatin to antibody and beads. By comparing the amount of precipitated DNA at a given locus between a negative control sample and the ChIP sample, fold enrichment factors at this locus can be calculated. However, the amount of precipitated material in such control immunoprecipitations is often too low to deliver sufficient reads by next-generation sequencing. Therefore, an input control sample is prepared for ChIP-Seq experiments. In this case, usually 1% of the cross-linked and sheared chromatin used per ChIP reaction is reverse cross-linked and purified side by side with the IP samples and subsequently analyzed by deep sequencing. This allows control for bias introduced by the fragmentation of chromatin depending on the local chromatin structure, DNA amplification, sequencing, copy number variation, and ability to map genomic regions.

After reversing the cross-links and purification of precipitated DNA fragments, it is recommended to confirm the recovery and enrichment of a genome locus which is known to be bound by the protein of interest by real-time PCR. By comparing the qPCR results of the input control with that of the ChIP sample, the percentage recovery of input material can be calculated (ΔCT method). If an isotype control antibody sample (or beads-only control sample) has been prepared this can be compared to the ChIP sample to further quantitate the enrichment of the qPCR detected locus (ΔΔCT method).
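
The two calculations mentioned above can be written out as follows. This is a minimal sketch that assumes perfect doubling per PCR cycle and that input, ChIP, and control samples are processed in equal volumes; the default input fraction of 1% matches the input control described earlier, and the CT values in the example are invented.

```python
import math


def percent_input(ct_input: float, ct_chip: float, input_fraction: float = 0.01) -> float:
    """Recovery of input material at a locus (delta-CT method)."""
    if not 0.0 < input_fraction <= 1.0:
        raise ValueError("input fraction must be in (0, 1]")
    # Shift the input CT to the value expected for 100% of the chromatin.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_chip)


def fold_enrichment(ct_chip: float, ct_control: float, ct_input: float) -> float:
    """Enrichment of ChIP over an isotype or beads-only control (delta-delta-CT)."""
    delta_chip = ct_chip - ct_input
    delta_control = ct_control - ct_input
    return 2.0 ** -(delta_chip - delta_control)


# Hypothetical CT values at a known target locus.
print(f"{percent_input(ct_input=25.0, ct_chip=24.0):.2f}% of input recovered")
print(f"{fold_enrichment(ct_chip=24.0, ct_control=29.0, ct_input=25.0):.0f}-fold over control")
```

Because both samples are normalized to the same input here, the input CT cancels out of the fold enrichment; it is kept in the signature to make the delta-delta-CT structure explicit.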

At least 10 ng of ChIPed DNA is required for robust preparation of next-generation sequencing libraries. Therefore, the total amount of precipitated DNA must be carefully quantified. However, the total amount of DNA per ChIP reaction is usually very low, and therefore methods more sensitive than photometric quantification must be used. Fluorometric assays (e.g., using PicoGreen) show sufficient sensitivity and dynamic range for the quantification of ChIPed DNA. If insufficient DNA is still obtained, material from several ChIP reactions may be pooled for the preparation of one sequencing library.

Since the amount of starting material for the preparation of such libraries is very low, it is necessary to amplify it after ligating sequencing adapters. For this enrichment, 16–18 PCR cycles are usually required. Too much amplification may result in low-complexity libraries, and therefore the minimum number of cycles required to obtain sufficient material should be established.

After confirming the quality (e.g., using a commercially available instrument, such as the QIAxcel or Bioanalyzer) and the concentration of the library (e.g., by qPCR), the library will be further amplified by emulsion PCR (Ion Torrent) or bridge PCR (Illumina) before final deep sequencing. Usually, short 25–36 nucleotide single reads are sufficient to specifically map the binding sites of the protein of interest to the genome. However, to allow mapping to more difficult regions (e.g., repeat regions), paired-end sequencing can also be applied (i.e., 2 x 25 nucleotides or 2 x 36 nucleotides). The total number of sequencing reads required depends on the number of genomic loci bound by the factor. For transcription factors that bind only a few narrow locations in the genome, 5–12 million reads may be sufficient, while for the analysis of modified histones that cover greater parts of the genome, more reads may be necessary. By increasing the number of reads, the sensitivity of the analysis can be enhanced to allow the identification of more weakly bound loci. Freely available software (e.g., MACS) may be used to identify genomic regions which show more reads than statistically expected (compared to local background or to an input control sample sequenced individually). From such data, the program is able to generate a list of significantly enriched binding sites of the protein of interest.

Optimal conditions for cell harvesting, sonication, and real-time PCR analysis of ChIP-enriched genomic DNA all need to be determined experimentally.
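
The idea of finding regions with "more reads than statistically expected" can be illustrated with a toy example. The sketch below is not the MACS algorithm; it is a simplified stand-in that assumes reads have already been counted in fixed windows, scales the input control to the ChIP library size, and applies a Poisson test per window. The window counts and the p-value cutoff are invented for the example.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), by direct summation.

    Very small tail probabilities round to 0.0 with this simple approach.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def enriched_windows(chip_counts, input_counts, p_cutoff=1e-5):
    """Return (window index, p-value) for windows enriched in the ChIP sample."""
    if len(chip_counts) != len(input_counts) or sum(input_counts) == 0:
        raise ValueError("need matching windows and a non-empty input control")
    scale = sum(chip_counts) / sum(input_counts)
    overall_background = sum(chip_counts) / len(chip_counts)
    hits = []
    for index, (chip, control) in enumerate(zip(chip_counts, input_counts)):
        # Expected count: the larger of scaled input and overall background.
        expected = max(control * scale, overall_background)
        p_value = poisson_upper_tail(chip, expected)
        if p_value < p_cutoff:
            hits.append((index, p_value))
    return hits


# Hypothetical read counts per window; window 3 carries a binding site.
chip = [5, 4, 6, 60, 5, 4]
control = [5, 5, 5, 5, 5, 5]
print([index for index, _ in enriched_windows(chip, control)])
```

Dedicated peak callers additionally model fragment size, local background at several scales, and multiple-testing correction, so this sketch should only be read as an explanation of the underlying statistical comparison.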

Sequencing-based methylation analysis

See the section of the Protocols and Applications Guide about epigenetics.


References

Mortazavi, A., Williams, B.A., McCue, K., et al. (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621.

Marioni, J.C., Mason, C.E., Mane, S.M., et al. (2008) RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18, 1509.

Pickrell, J.K., et al. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464, 768.

The ENCODE Consortium. (2011) Standards, Guidelines and Best Practices for RNA-Seq. V1.0 (June 2011).

Wang, Z., Gerstein, M., Snyder, M. (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57.

