Gene regulation studies

Next-generation sequencing (NGS) has also proved to be a powerful tool for studying gene regulatory networks. For example, ChIP-Seq (chromatin immunoprecipitation sequencing) can be used to analyze protein-DNA interactions. NGS can also be used to determine global methylation patterns in a genome.

ChIP-Seq

Chromatin immunoprecipitation (ChIP) is a powerful and versatile method for understanding the mechanisms of gene regulation by transcription factors and modified histones. It is used to identify chromatin regions which are bound by transcription factors, co-regulators, modified histones, chromatin remodeling proteins, or other nuclear factors from live cells.

The procedure is time-consuming and involves many steps and variables, each of which must be optimized by the investigator for their model system. After cells are cross-linked with formaldehyde, chromatin containing covalent complexes between genomic DNA and all nuclear factors is isolated and sheared by sonication into manageable sizes. Immunoprecipitation with an antibody specific for the nuclear protein of interest also pulls down any genomic DNA sequences bound by this factor. Reversal of the chemical cross-linking and nucleic acid purification prepare the DNA for detection by sequencing, hybridization-based microarrays, or PCR. ChIP with subsequent next-generation sequencing (ChIP-Seq) is used to study the genome-wide distribution of loci bound by a protein of interest. Compared to microarray analysis (ChIP-Chip), ChIP-Seq offers higher spatial resolution, dynamic range, and genome coverage. This results in superior sensitivity and specificity for the detection of DNA binding sites. Furthermore, ChIP-Seq generally requires less input material and is more flexible: since no hybridization probes are required, any species with a sequenced genome can be studied.

Several critical parameters need to be optimized for efficient ChIP-Seq. First, the time and temperature of the formaldehyde cross-linking must be optimized. If proteins and DNA are cross-linked too extensively, it becomes more difficult to fragment the chromatin efficiently and to reverse the cross-links before sequencing. A good starting point for further optimization is a 10-minute incubation with 1% formaldehyde at 37°C.

There are several methods to subsequently fragment the chromatin, for example, sonication and enzymatic digestion (e.g., with micrococcal nuclease). If the immunoprecipitated DNA will be analyzed by next-generation sequencing, chromatin fragments of ~100–300 nucleotides are required, and the fragmentation parameters for each test system (cell type, tissue type) also require thorough optimization.

The success of any ChIP experiment depends on the quality and specificity of the antibody used. It is good practice to use ChIP-validated antibodies, which are available from several antibody suppliers. To confirm that the antibody recognizes only the protein of interest, western blots can be performed with nuclear extracts. If only one band of the expected size of the protein is visible on such a blot, the antibody is considered specific. Performing an immunoprecipitation with the antibody of choice followed by western blotting additionally confirms that the antibody is able to specifically precipitate the protein of interest. If such experiments fail, it may also be useful to perform immunofluorescence staining of cultured cells with the antibody of choice. If only the nuclei are stained, this indicates that the antibody at least specifically recognizes a protein that resides in the nucleus and may therefore bind to DNA.

Depending on the abundance of the target protein, the amounts of validated antibody and of chromatin used for immunoprecipitation must be optimized in order to obtain enough DNA for sequencing. For antibodies recognizing histones and histone modifications, chromatin from 100,000 to 1 million cells may be used per immunoprecipitation reaction. More chromatin may be necessary if transcription factors are precipitated, as they bind to DNA more dynamically and at a more limited number of genomic loci than histones. The amount of antibody used per immunoprecipitation reaction is also important: it must be low enough to avoid nonspecific binding of other proteins and chromatin to excess antibody, yet high enough to ensure sufficient precipitation of the protein of interest. Normally, 1–10 µg of antibody should deliver reasonable results.

A control reaction may be used to assess the specificity of the ChIP reaction. In a parallel ChIP reaction with the same amount of chromatin, either an isotype control antibody or beads without any antibody may be used to control for nonspecific binding of proteins and chromatin to antibody and beads. By comparing the amount of precipitated DNA at a given locus between a negative control sample and the ChIP sample, fold enrichment factors at this locus can be calculated. However, the amount of material precipitated in such control immunoprecipitations is often too low to deliver sufficient reads by next-generation sequencing. Therefore, an input control sample is prepared for ChIP-Seq experiments. In this case, usually 1% of the cross-linked and sheared chromatin used per ChIP reaction is reverse cross-linked and purified side by side with the IP samples and subsequently analyzed by deep sequencing. This allows control for biases introduced by chromatin fragmentation (which depends on the local chromatin structure), DNA amplification, sequencing, copy number variation, and the mappability of genomic regions.

After reversing the cross-links and purifying the precipitated DNA fragments, it is recommended to confirm by real-time PCR the recovery and enrichment of a genomic locus known to be bound by the protein of interest. By comparing the qPCR results of the input control with those of the ChIP sample, the percentage recovery of input material can be calculated (ΔC_T method). If an isotype control antibody sample (or beads-only control sample) has been prepared, it can be compared to the ChIP sample to further quantify the enrichment of the locus detected by qPCR (ΔΔC_T method).
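
The two calculations mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a validated analysis routine: the function names and the example C_T values are hypothetical, the input is assumed to represent 1% of the chromatin used per ChIP reaction, and amplification efficiency is assumed to be 100% (one doubling per cycle).

```python
import math

def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percentage of input recovered in the IP (delta-C_T method).

    Assumes ~100% PCR efficiency and that the IP and input samples were
    eluted and assayed in equivalent volumes (both are assumptions).
    """
    # Shift the input C_T to the value expected for 100% of the chromatin.
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    delta_ct = ct_ip - ct_input_adjusted
    return 100.0 * 2.0 ** (-delta_ct)

def fold_enrichment(ct_ip, ct_control):
    """Enrichment of the IP over a negative control IP (delta-delta-C_T).

    Both samples are normalized to the same input, so the input term
    cancels and only the C_T difference between the two IPs remains.
    """
    delta_delta_ct = ct_ip - ct_control
    return 2.0 ** (-delta_delta_ct)

# Hypothetical C_T values for one known target locus.
recovery = percent_input(ct_ip=27.0, ct_input=28.0)
enrichment = fold_enrichment(ct_ip=27.0, ct_control=32.0)
print(f"recovery: {recovery:.2f}% of input, enrichment: {enrichment:.0f}-fold")
```

If the measured PCR efficiency differs noticeably from 100%, the base of 2 would need to be replaced by the measured amplification factor per cycle.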

At least 10 ng of ChIPed DNA is required for robust preparation of next-generation sequencing libraries. Therefore, the total amount of precipitated DNA must be carefully quantified. However, the total amount of DNA per ChIP reaction is usually very low, so methods more sensitive than photometric quantification must be used. Fluorimetric assays (e.g., using PicoGreen) show sufficient sensitivity and dynamic range for the quantification of ChIPed DNA. If the yield is still insufficient, material from several ChIP reactions may be pooled for the preparation of one sequencing library.

Since the amount of starting material for the preparation of such libraries is very low, it is necessary to amplify the library after ligating the sequencing adapters. For this enrichment, 16–18 PCR cycles are usually required. Too much amplification may result in low-complexity libraries, so the minimal number of cycles required to obtain sufficient material should be established.
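
The pooling and amplification arithmetic can be illustrated with a short Python sketch. The function names and the example yield of 3 ng per reaction are hypothetical. The amplification figure is an idealized upper bound that assumes every molecule is copied in every cycle, which real reactions do not achieve.

```python
import math

def reactions_to_pool(yield_ng_per_reaction, required_ng=10.0):
    """Smallest number of ChIP reactions whose pooled DNA reaches required_ng."""
    if yield_ng_per_reaction <= 0:
        raise ValueError("yield per reaction must be positive")
    return math.ceil(required_ng / yield_ng_per_reaction)

def ideal_fold_amplification(cycles, efficiency=1.0):
    """Theoretical upper bound on PCR amplification after a number of cycles.

    efficiency=1.0 means perfect doubling per cycle (an idealization).
    """
    return (1.0 + efficiency) ** cycles

print(reactions_to_pool(3.0))          # hypothetical 3 ng per reaction -> 4
print(ideal_fold_amplification(16))    # perfect doubling -> 65536.0
```

Because each additional cycle at most doubles the library, removing even one or two cycles noticeably reduces the risk of over-amplification.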

After the quality (e.g., using a commercially available instrument, such as the QIAxcel or Bioanalyzer) and the concentration (e.g., by qPCR) of the library have been confirmed, the library is further amplified by emulsion PCR (Ion Torrent) or bridge PCR (Illumina) before final deep sequencing. Usually, short single reads of 25–36 nucleotides are sufficient to specifically map the binding sites of the protein of interest to the genome. However, to allow mapping to more difficult regions (e.g., repeat regions), paired-end sequencing can also be applied (i.e., 2 x 25 nucleotides or 2 x 36 nucleotides). The total number of sequencing reads required depends on the number of genomic loci bound by the factor. For transcription factors that bind only a few narrow locations in the genome, 5–12 million reads may be sufficient, while for the analysis of modified histones, which cover larger parts of the genome, more reads may be necessary. Increasing the number of reads enhances the sensitivity of the analysis and allows the identification of more weakly bound loci. Freely available software (e.g., MACS) may be used to identify genomic regions that show more reads than statistically expected (compared to the local background or to an input control sample sequenced separately). From such data, the program is able to generate a list of significantly enriched binding sites of the protein of interest.
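
The idea of "more reads than statistically expected" can be made concrete with a simple Poisson test on one genomic window. The sketch below is a deliberately simplified illustration and does not reproduce the algorithm of MACS or any other peak caller. The function names, the pseudocount of 1 added to the input count, and the example read counts are all assumptions made for this example.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed term by term in log space."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    total = 0.0
    i = k
    while True:
        # log of the Poisson probability mass at i
        log_pmf = -lam + i * math.log(lam) - math.lgamma(i + 1)
        term = math.exp(log_pmf)
        total += term
        # Past the mode the terms only shrink; stop once they are negligible.
        if i > lam and term <= 1e-16 * total:
            break
        i += 1
    return min(total, 1.0)

def window_enrichment(chip_count, input_count, chip_total, input_total):
    """Expected ChIP reads in a window and the p-value of the observed count.

    The input count is scaled to the ChIP sequencing depth. A pseudocount
    of 1 (an assumption) keeps the expectation above zero.
    """
    scale = chip_total / input_total
    expected = (input_count + 1) * scale
    return expected, poisson_upper_tail(chip_count, expected)

# Hypothetical window: 50 ChIP reads vs. 9 input reads at equal depth.
expected, p_value = window_enrichment(50, 9, 10_000_000, 10_000_000)
print(f"expected {expected:.1f} reads, p = {p_value:.3g}")
```

A genome-wide analysis would apply such a test to many windows and would therefore also need a correction for multiple testing, which is omitted here.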

The optimal conditions for cell harvesting, sonication, and real-time PCR analysis of ChIP-enriched genomic DNA all need to be determined experimentally.