RNA sequencing for beginners

What is RNA sequencing?

RNA sequencing (RNA-seq) is a method of investigating the transcriptome of an organism using deep-sequencing techniques. The RNA content of a sample is directly sequenced after appropriate library construction, providing a rich data set for analysis. RNA-seq is an extremely powerful and revolutionary way to investigate transcriptomes, but it requires care to achieve the highest quality of data.

What is RNA sequencing used for?

The high sensitivity and resolution of this technique make it a valuable tool for investigating the entire transcriptional landscape. The quantitative nature of the data and the high dynamic range of the sequencing technology enable highly sensitive gene expression analysis. The single-base resolution of the data provides information on single nucleotide polymorphisms (SNPs), alternative splicing, exon/intron boundaries, untranslated regions, and other elements. Additionally, prior knowledge of a reference sequence is not required to perform RNA-seq, allowing for de novo transcriptome analysis and detection of novel variants and mutations.

What factors need to be considered in RNA-seq?

The first factor to consider is the enrichment of the sample. Total RNA generally contains only a very small percentage of coding or functional RNA; ribosomal RNA (rRNA: up to 80–90% of the total RNA) and, to a lesser degree, transfer RNA (tRNA) make up the majority of the RNA in a sample. Generally, rRNA is removed from the sample before sequencing to avoid spending 80–90% of one's sequencing capacity on repetitive rRNA sequences. This is usually achieved either by specifically depleting rRNA or by selectively enriching polyadenylated RNA using oligo-dT capture. Depletion of rRNA preserves information on both coding and noncoding RNA (an important research topic), while poly(A) enrichment preserves only polyadenylated transcripts, chiefly coding mRNA. Poly(A) enrichment may therefore miss non-polyadenylated RNAs and RNAs with high turnover rates.
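The cost of leaving rRNA in the library can be illustrated with simple arithmetic. The sketch below assumes, as a simplification, that reads are distributed in proportion to each RNA class's share of the library; the function name, the run size of 30 M reads, and the 5% residual rRNA after depletion are hypothetical values chosen for illustration.

```python
def informative_reads(total_reads: int, rrna_fraction: float) -> int:
    """Estimate reads left for non-rRNA transcripts.

    Assumes reads are spread in proportion to library composition,
    which is a simplification of real sequencing behavior.
    """
    if not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("rrna_fraction must be between 0 and 1")
    return round(total_reads * (1.0 - rrna_fraction))


total = 30_000_000  # hypothetical run size
# 0.90 and 0.80 come from the 80-90% range above; 0.05 is an assumed
# residual rRNA fraction after depletion, for comparison only.
for fraction in (0.90, 0.80, 0.05):
    kept = informative_reads(total, fraction)
    print(f"rRNA {fraction:.0%}: {kept:,} informative reads")
```

Under these assumptions, an undepleted library leaves only 3–6 M of the 30 M reads for everything other than rRNA.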

Other methods of avoiding rRNA exist, such as selective degradation of abundant transcripts or amplification techniques that are biased away from rRNA. However, these are less common than rRNA depletion or poly(A) enrichment and may have the side effect of skewing transcript representation relative to the original sample.

Another issue to consider is the size of the RNA to be investigated. RNA transcripts span a wide range of sizes; experiments focusing on small RNA (for example, microRNA or other RNAs in the 15–35 nt size range) generally require specialized purification and library construction protocols that differ from those used for general RNA analysis. Most other size fractions of RNA can be sequenced together (one of the common steps in RNA-seq is the fragmentation of the RNA population down to a standard size, such as 200–300 nt).
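The size window for small RNA can be pictured as a simple length filter. The following is only an in silico illustration of the 15–35 nt range mentioned above, not a substitute for the specialized purification protocols; the function name and the toy sequences are hypothetical.

```python
def split_by_length(sequences, min_nt=15, max_nt=35):
    """Separate sequences inside the small-RNA size window from the rest."""
    small, other = [], []
    for seq in sequences:
        if min_nt <= len(seq) <= max_nt:
            small.append(seq)
        else:
            other.append(seq)
    return small, other


# Toy sequences (made up): 20 nt, 240 nt, and 3 nt long.
toy = ["ACGU" * 5, "ACGU" * 60, "ACG"]
small, other = split_by_length(toy)
print(len(small), "sequence(s) in the small-RNA window;", len(other), "outside it")
```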

How does RNA sequencing work?

Once the method of rRNA removal and the size fraction to be investigated have been chosen, the RNA is made into a library. For most sequencing machines, this involves fragmenting the RNA and creating double-stranded cDNA through reverse transcription. This double-stranded cDNA may be handled as normal genomic DNA throughout the remaining library construction process. If directional information (strandedness) of the RNA is to be preserved, modified library construction protocols must be used, such as ligating adapters directly to mRNA or marking one of the cDNA strands such that it can be removed before sequencing.

When planning the sequencing run itself, the three major issues to consider are read depth, read length, and whether or not to use paired-end data. Read depth provides information on the abundance of RNA transcripts, and greater read depth allows more sensitive detection of rare transcripts. Read length is important because longer reads are better at detecting splicing events (intron-exon and exon-exon boundaries). Paired-end data provide more information on transcript structure, particularly for widely spaced exons. Generally speaking, de novo analysis or searches for novel structural variation require both high read depth and long reads and benefit from paired-end sequencing. A typical experiment of this kind may have 100–200 M reads, 2 x 50–100 bp. In contrast, expression analysis or profiling benefits from high read depth, but read length and paired-end data provide little extra advantage. A typical experiment for this application may have 10–30 M reads, 1 x 35–100 bp.
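To compare the two typical designs, it can help to convert them into total sequenced bases. The sketch below assumes that the quoted read counts refer to sequenced fragments, so a paired-end run yields two reads per fragment; the text does not state this, and the function name is hypothetical.

```python
def total_bases(fragments: int, read_length: int, paired: bool) -> int:
    """Total bases sequenced, assuming `fragments` counts library fragments
    and a paired-end run reads each fragment twice (an assumption)."""
    reads_per_fragment = 2 if paired else 1
    return fragments * reads_per_fragment * read_length


designs = {
    # name: (low end, high end) as (fragments, read length, paired)
    "de novo / structural": ((100_000_000, 50, True), (200_000_000, 100, True)),
    "expression profiling": ((10_000_000, 35, False), (30_000_000, 100, False)),
}

for name, (low, high) in designs.items():
    low_gb = total_bases(*low) / 1e9
    high_gb = total_bases(*high) / 1e9
    print(f"{name}: {low_gb:g} to {high_gb:g} Gb")
```

Under that assumption, the de novo design spans roughly 10 to 40 Gb per sample, while expression profiling needs about 0.35 to 3 Gb.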

What are some key RNA sequencing methods?

Standard RNA-seq (bulk RNA-seq)

Involves extracting total RNA from a sample, converting it into cDNA, and sequencing it using high-throughput sequencing technologies. Provides a snapshot of the average gene expression across a population of cells, useful for identifying differentially expressed genes between conditions.
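As a rough illustration of comparing expression between conditions, the sketch below scales raw counts to counts per million (CPM) and computes a log2 fold change per gene. It is a minimal sketch with made-up gene names and counts; it is not a statistical test, and real differential expression analysis relies on replicates and dedicated statistical methods. The pseudocount of 1 is an assumption used to avoid division by zero.

```python
import math


def cpm(counts: dict) -> dict:
    """Scale raw read counts to counts per million for one sample."""
    total = sum(counts.values())
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


def log2_fold_change(control: dict, treated: dict, pseudocount: float = 1.0) -> dict:
    """Per-gene log2(treated / control) on CPM values.

    The pseudocount (an assumed value) keeps zero counts from breaking the ratio.
    """
    c, t = cpm(control), cpm(treated)
    return {
        gene: math.log2((t[gene] + pseudocount) / (c[gene] + pseudocount))
        for gene in control
    }


# Hypothetical counts for three genes in two conditions.
control = {"geneA": 100, "geneB": 400, "geneC": 500}
treated = {"geneA": 400, "geneB": 400, "geneC": 200}

for gene, lfc in log2_fold_change(control, treated).items():
    print(f"{gene}: log2 fold change = {lfc:+.2f}")
```

A positive value indicates higher relative expression in the treated sample, and a negative value indicates lower.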

mRNA-seq

Selects messenger RNA (mRNA) by capturing poly(A) tails, effectively enriching for polyadenylated [poly(A)+], protein-coding transcripts. Focuses on gene expression analysis of coding regions, reducing the complexity of the transcriptome and increasing the depth of coverage for mRNAs.

Targeted RNA-seq

Focuses on sequencing specific subsets of RNA transcripts rather than the entire transcriptome, and can be achieved via either enrichment or amplicon-based approaches. Allows for the detection of low-abundance transcripts, and is particularly useful in biomarker validation and in studies where specific genes or pathways are of interest.

Exome-capture RNA-seq

Utilizes exome capture probes to enrich for coding regions before sequencing. Enhances the detection of low-abundance transcripts and improves coverage of coding exons.

Single-cell RNA-seq (scRNA-seq)

Sequences RNA from individual cells to analyze gene expression at single-cell resolution. Reveals cellular heterogeneity, identifies rare cell populations, and tracks cellular development and differentiation processes.

Total RNA-seq

Depletes ribosomal RNA (rRNA), which constitutes the majority of total RNA, to enrich for both coding and non-coding RNAs. Enables the study of non-polyadenylated RNAs, such as long non-coding RNAs (lncRNAs), small RNAs, and degraded RNA fragments.

Small RNA-seq (including miRNA-seq)

Focuses on sequencing small RNA molecules, such as microRNAs (miRNAs) and small interfering RNAs (siRNAs). Investigates the role of small RNAs in gene regulation, development, and disease.