A comparison of RNA-seq and exon arrays for whole genome transcription profiling of the L5 spinal nerve transection model of neuropathic pain in the rat
Molecular Pain volume 10, Article number: 7 (2014)
The past decade has seen an abundance of transcriptional profiling studies of preclinical models of persistent pain, predominantly employing microarray technology. In this study we directly compare exon microarrays to RNA-seq and investigate the ability of both platforms to detect differentially expressed genes following nerve injury using the L5 spinal nerve transection model of neuropathic pain. We also investigate the effects of increasing RNA-seq sequencing depth. Finally we take advantage of the “agnostic” approach of RNA-seq to discover areas of expression outside of annotated exons that show marked changes in expression following nerve injury.
RNA-seq and microarrays largely agree in terms of the genes called as differentially expressed. However, RNA-seq is able to interrogate a much larger proportion of the genome. It can also detect a greater number of differentially expressed genes than microarrays, across a wider range of fold changes and is able to assign a larger range of expression values to the genes it measures. The number of differentially expressed genes detected increases with sequencing depth. RNA-seq also allows the discovery of a number of genes displaying unusual and interesting patterns of non-exonic expression following nerve injury, an effect that cannot be detected using microarrays.
We recommend the use of RNA-seq for future high-throughput transcriptomic experiments in pain studies. RNA-seq allowed the identification of a larger number of putative candidate pain genes than microarrays and can also detect a wider range of expression values in a neuropathic pain model. In addition, RNA-seq can interrogate the whole genome regardless of prior annotations, being able to detect transcription from areas of the genome not currently annotated as exons. Some of these areas are differentially expressed following nerve injury, and may represent novel genes or isoforms. We also recommend the use of a high sequencing depth in order to detect differential expression for genes with low levels of expression.
Gene expression studies can be used to provide insights into the molecular mechanisms underlying the onset and maintenance of pain [1–5]. Such approaches can be hypothesis-driven, assessing the expression of preselected candidate molecules, or hypothesis-independent, interrogating gene expression at the genome-wide level.
Microarrays have been used extensively to investigate transcriptional changes that occur in different parts of the central and peripheral nervous systems . Such studies have led to the discovery of novel pain-related genes, such as the Potassium voltage-gated channel subfamily S member 1, KCNS1 , GTP cyclohydrolase 1, GCH1  and the neuropeptide VGF nerve growth factor inducible . In recent years RNA-sequencing (RNA-seq) has emerged as an alternative platform for high-throughput transcriptional profiling . The platform has been used in a variety of studies, but so far only one report has described the use of RNA-seq to measure gene expression changes in the peripheral nervous system in an experimental model of pain . It has also been used to perform microRNA profiling following sciatic nerve injury  and to study gene expression changes in the pre-frontal cortex following spared nerve injury .
Microarray chips measure the expression of thousands of genes in a sample by quantifying the hybridisation of fragmented cDNA derived from gene-transcripts to a set of complementary probes specifically designed to detect a set of genes or transcripts. They have been used to study a variety of biological systems . However, the use of probes leads to a number of drawbacks, including non-specific binding and signal saturation , which can negatively affect the measurement of expression for both lowly and highly expressed genes. Furthermore, microarray design is based on prior knowledge of the transcriptome and therefore microarrays can only interrogate a subset of known (or predicted) transcripts.
RNA-seq represents an alternative to microarrays . It uses high throughput sequencing technology to investigate RNA expression  and allows the quantification of thousands of transcripts within a cell line or tissue without the need for a priori knowledge of the transcriptome. This “agnostic” approach represents a major advantage over microarrays, allowing the discovery of new transcript variants, novel genes and the annotation of less well characterized genomes [15, 16].
RNA-seq technology also suffers from drawbacks. The output of an RNA-seq experiment consists of millions of reads, short sequences of cDNA derived from RNA molecules. These reads must be mapped to a reference genome in order to identify the genomic location of the originating transcript and thus quantify expression . Thus, analysis can be computationally expensive and time consuming. A common challenge arises due to the presence of reads that cannot be mapped to the genome. This can be due to genomic differences (such as polymorphisms) between the sample and the reference genome or erroneous base calling by the sequencing technology . Conversely, RNA-seq reads may map to more than one genomic location. Such ambiguous reads can lead to imprecise gene quantification.
An important consideration when designing an RNA-seq experiment is sequencing depth, the number of reads generated per sample. In general it is expected that the higher the sequencing depth the more accurately the transcriptome of the tissue of interest is quantified [18, 19]. This is particularly important for the accurate detection of lowly expressed genes, where problems related to sampling error can lead to an over or under-estimation of transcript abundance. However, the cost of the experiment will also increase with sampling depth.
In addition to the technical pros and cons of both platforms, price is an important factor. Although RNA-seq costs are decreasing, microarrays remain more affordable, and data analysis is more standardised and easily implemented . RNA-seq, on the other hand, demands more computational power and bioinformatics expertise; therefore it is important to determine to what extent the additional knowledge generated by RNA-seq experiments outweighs the computational demands and economic costs.
A large number of comparisons between RNA-seq and gene expression microarrays have been reported, covering a wide variety of different experimental designs, platforms, organisms, tissues, cell lines and experimental interventions. Table 1 shows an overview of these previous studies. So far, no direct comparison has been made using exon arrays for non-human, heterogeneous tissue from different individuals.
In this study we investigate whether RNA-seq offers an advantage over microarrays for the study of differential gene expression within dorsal root ganglia (DRG) following nerve injury using the L5 spinal nerve transection (SNT) model of neuropathic pain in the rat . Expression changes induced by this injury have been well characterised by microarrays [1, 4, 34], proteomics  and reverse transcription quantitative PCR (RT-qPCR) . We have used technical replicates of the same biological samples and subjected them to exon expression array and RNA-seq analysis. This circumvents confounding effects brought about by comparing distinct technologies using historical data from previous/independent studies. We demonstrate the technical superiority of RNA-seq over microarrays in terms of sensitivity (ability to measure transcripts) and ability to detect differential gene expression. The latter is particularly important in the context of pain and nerve injury, as RNA-seq detects a large number of highly dysregulated genes, which may represent novel candidate pain genes. We also investigate the effects of increased sequencing depth on the results of an RNA-seq experiment, and the ability of RNA-seq to detect expression originating from unannotated genomic regions.
Spinal nerve transection (SNT) is a widely used experimental model of neuropathic pain, associated with profound changes in gene expression in dorsal root ganglia (DRG) . It therefore represents an excellent model for assessing the relative merits of RNA-seq in comparison with microarrays in the context of pain. DRG tissue was harvested at day 7 post transection, a time point at which altered gene expression and associated pain-related behaviour are well established .
We have performed RNA-seq and microarray analysis in parallel on technical replicates of the same biological samples. This design allowed us to avoid confounding effects due to biological variability and other differences in sample preparation. Poly(A) enriched RNA derived from the L5 DRG 7 days following L5-SNT and from naive L5 DRG tissue was subjected to microarray analysis and RNA-seq, as outlined in Figure 1. Therefore, in total there were three biological replicates per condition; each of these six replicates was subdivided to produce technical replicates, one of which was used for microarray analysis, the other for RNA-seq. The aims of our experiment were three-fold: Firstly, to compare the ability of both methods to measure gene expression and identify differentially expressed (DE) genes. Secondly, to compare three distinct read depths for RNA-seq in order to investigate its impact on the detection of transcriptional changes across different levels of gene expression. Thirdly, to investigate changes in gene expression influenced by expression from areas outside of annotated exons (areas within annotated 5’ and 3’ ends of genes that are not annotated as exons in the reference genome) within the DRG following SNT.
Mapping sequenced reads to the genome
We performed RNA-seq using the Illumina GAIIx platform. The protocol employed yielded 34 base-pair long reads, at three different sequencing depths per sample (~17, ~ 25 and ~50 M). Reads failing Illumina quality control due to ambiguous base calling were filtered out and the remaining reads were subsequently mapped to the reference rat genome (Rn5), permitting up to one mismatch when aligning reads to the genome (Figure 2A). Filtered reads that did not map to the genome or that could be mapped to more than one genomic location (ambiguous reads) were removed from the analysis.
Mapped reads were categorised as exonic, intronic or intergenic as described in the Methods section (Figure 2B). An example of the read mapping procedure is giving in Figure 3, which shows the reads mapping to the genomic location of the gene Calcium Channel, Voltage-Dependent, Alpha 2/Delta Subunit 1 (Cacna2d1) for a given sample (Figure 3A-D) and for several samples (Figure 3E). The stacked bar charts (Figure 2C) illustrate the proportion of reads belonging to each category for naive and SNT samples, sequenced to a depth of 50 M reads. A high proportion of reads (~35%) map to intergenic regions, reflecting transcription from previously unannotated areas of the rat genome. In addition, when comparing the proportion of exonic and intronic reads across experimental groups, we observe a small but consistent and significant increase in the proportion of reads aligning to intronic regions following SNT (p < 0.001; see Methods section). These finding are suggestive of the increasingly recognised phenomenon of non-exonic, “dark matter” transcription [37–42]. At the level of individual genes, this observation may reflect an effect of SNT on alternative splicing that leads to inclusion of novel exons that have not yet been annotated, or the expression of unannotated overlapping/nested genes in the same genomic location. These putative nested genes may be protein coding or other polyadenylated RNA species. Detection of such an increase in non-exonic gene expression would not be possible using exon arrays, since they do not thoroughly profile the intronic regions of the genome (although for some genes there are probes that map to intronic regions). This increase in reads aligning to regions of genes that are not annotated as exonic was also found at read depths of 17 M and 25 M. In terms of intergenic aligning reads, there was a slight decrease following SNT, however this was not significant following Bonferroni correction for multiple testing.
Proportions of the genome that can be measured by exon arrays and RNA-seq
RNA-seq data was aligned to the latest rat genome assembly, Rn5, whilst the microarray annotation available from NetAffx was designed for the previous version, Rn4. Rn5 contains annotation for 26405 Ensembl rat genes including protein-coding genes, miRNAs, ribosomal RNAs and pseudo-genes . Rn4 contains annotation for 29516 rat genes. The intersection between these genome builds in terms of genes comprises 20914 Ensembl ids.
In order to compare gene expression between exon arrays and RNA-seq we first established the maximum number of genes that can theoretically be measured by each of the platforms, and the overlap between platforms. Microarrays contain a set of probes specifically designed to detect the expression of transcripts that have been previously characterised or computationally predicted. Affymetrix Rat Exon 1.0 ST arrays contain 17818 transcript-cluster mapping probesets, which have been assigned a range of confidence levels: core (genes obtained from RefSeq/Genbank records – representing 7947 Ensembl genes), extended (supported by EST or partial mRNA evidence – an additional 6221 genes) and full (bioinformatically predicted – another 3650 genes). For this study we used the probes assigned with core and extended confidence levels, since this led to the detection of the largest number of DE genes between SNT and naive samples after correcting for multiple testing.
RNA-seq can, in theory, interrogate the transcription of any of the 26405 Ensembl genes through the mapping of the sequenced reads to their genomic location. It should be mentioned that these annotated Ensembl genes are not necessarily protein-coding genes; they include miRNAs, ribosomal RNAs and pseudo-genes, and will not necessarily be expressed in the DRG, nor selected for by poly(A) enrichment. In addition, a number of genes cannot be interrogated in practice because all of their exons overlap exons from other genes in terms of genomic location (either on the same strand, or opposite strands). Excluding genes for these reasons reduces the total number of genes detectable by our sequencing protocol to 26172 genes. Table 2 shows the overlap between the different platforms using the core, extended and full confidence Affymetrix probe annotations for the intersection of Ensembl genes annotated in both Rn4 and Rn5 genome builds.
To conclude, regardless of probe confidence levels, the RNA-seq protocol is potentially able to quantify a much larger number of genes than microarrays.
Comparison of exon arrays and RNA-seq for the measurement of absolute gene expression
In order to compare the ability of the different platforms to detect gene expression we considered the genes that could be measured by both platforms (using core and extended confidence probesets). This amounts to a total of 13042, consisting of all the genes measurable by exon arrays excluding only those that contain overlapping exons and therefore cannot be measured by RNA-seq, and that are annotated in both the Rn4 and Rn5 genomes.
Figure 4A depicts the correlation between the log2 normalised probe intensities for each transcript cluster and its respective expression level as determined by RNA-seq measured in reads per kilobase per million mapped reads (RPKM) as proposed by Mortazavi et al. . There is a positive correlation between the hybridization intensities and RPKM for genes detected by RNA-seq (red points). However there are a number of genes not detected by RNA-seq (0 reads aligned to the exons for that gene; blue points). The two platforms show a large agreement for genes with high levels of expression, however there is less agreement for genes with low levels of expression. For example, for genes with a log normalised hybridization intensity value below ~6 in the exon arrays, RNA-seq is able to assign a much wider range of expression values.
Possible causes for the reduced correlation for more lowly expressed genes may be related to non-specific binding to probes for the microarrays or sampling error for RNA-seq, which would affect lowly expressed genes to a greater extent. In support of the non-specific binding for microarrays cause, we notice that there is very little correlation for the genes below the horizontal lines – these lines represent the median (solid line) and median plus one median absolute deviation (dashed line) values for the antigenomic control probes of the exon array. These probesets comprise probes that have been chosen to estimate background hybridisation; they do not match any sequence in the rat, mouse or human genome. Interestingly, there is a higher Spearman’s correlation coefficient between platforms for gene expression measurements of naive tissue (Figure 4A) than SNT (Figure 4B). The reasons for this are unclear, but could be related to the abilities of the different platforms to measure RNA expression in injured tissue. We have depicted the correlation plot for the 50 M read depth, as the results are qualitatively equivalent at lower read depths (Additional file 1).
Comparison of exon arrays and RNA-seq for the detection of differential gene expression
We compared the ability of both platforms to detect differential gene expression. We compared log2 fold change (FC SNT/naive) values as determined by RNA-seq to exon arrays. We observed a general agreement in the direction of FC for significantly DE genes detected by both platforms (Figure 5A, red points), with some DE genes being exclusively detected by RNA-seq (green points). Interestingly, a number of genes not detected by RNA-seq in one of the experimental groups – giving rise to infinite FCs (blue points) – are deemed significantly DE by exon arrays. However the direction of FCs does not necessarily agree between the two platforms and these genes usually show a small fold-change value. This suggests that for some genes, the apparently significant FC detected by exon arrays may be due to the effects of non-specific binding, or some general technical variability, although in some cases this might be due to the RNA-seq protocol not being able to align enough reads to classify the gene as significantly DE. Further experimental validation of these genes is necessary in order to confirm that they are truly changing between samples, but this change cannot be detected using RNA-seq.
It is also notable that a far wider range of log2 FC is detectable using RNA-seq. This is shown in Figure 5B, which shows the distribution of absolute log2 FCs found by either technology. In addition, we can see that the genes called DE only using RNA-seq show a similar distribution of FCs to the genes called DE by both platforms. Conversely the genes called DE only using microarrays show a much narrower distribution of FCs than the genes called DE by both platforms. Additional file 2 shows the distributions of FCs for significantly DE genes alongside those genes that are not called DE by either platform. The significantly DE genes exclusively detected by microarrays show a similar distribution to the non-DE microarray genes. Conversely, the separation between the distributions of FCs for the DE RNA-seq and the non-DE RNA-seq genes is much clearer. This supports the idea that some of the genes detected as DE by microarray only are likely to be called as such because of noise related to non-specific binding. This leads to a small but consistent FC between samples and not true transcriptional changes.
Figure 5C shows the overlap between the DE genes found by microarrays and RNA-seq at distinct read depths. We see a large overlap between the different platforms; this overlap increases with RNA-seq depth. We also see that a higher sequencing depth leads to a sharp increase in the number of genes that are called DE using RNA-seq only.
Comparison of exon arrays and RNA-seq at the exon level
There are 233498 exons annotated in the rat genome according to Ensembl (version 69). Removing overlapping exons leaves 209219 that can be probed uniquely by RNA-seq. Microarrays can probe 146765 exons (using core and extended confidence probesets). In order to compare the two platforms directly, in terms of expression of individual exons, RNA-seq reads were aligned to the Rn4 genome assembly for this analysis (Additional file 3). As with Figure 5A, we see a good agreement between platforms. However the correlation is less strong for exons than for genes. Reasons for the weaker correlation include the increased sampling error inherent when counting reads mapping to exons, since exons represent a much shorter genomic area than genes. Estimating microarray expression at the exon level is also likely to be less accurate than estimating gene expression, due to the reduced number of probes across which to summarise the probe intensity value. A weaker correlation between platforms is also observed for FCs between SNT and naive samples (Additional file 3: Figure B). The Venn diagram in Additional file 3: Figure C shows that at a 50 M read depth microarrays and RNA-seq detect a similar amount of differentially expressed exons. This is in contrast with the gene level comparison, which shows that RNA-seq can detect a much higher number of DE genes. Both of these observations are likely to be due to sampling error, leading to difficulties in separating RNA-seq expression from “shot noise” using the DESeq algorithm (as described in ). Sequencing to a higher depth could reduce this sampling error. Because the exons are much shorter, shot noise is more problematic at the exon level than the gene level. We also notice that for lower sequencing depths the number of exons called DE using RNA-seq is much lower, suggesting 50 M is the minimum read depth that should be used when investigating DE of exons in heterogeneous tissue (Additional file 4).
Sequencing depth and the detection of differentially expressed genes
We investigated the effect of sequencing depth on the ability of RNA-seq to find differentially expressed genes. In order to do this we sequenced replicates of each sample at three depths: 17 M, 25 M and 50 M reads. The Venn diagram in Figure 6A shows the total number of DE genes detected at each depth. It is clear that, whilst there is a very large overlap between the results at all three sequencing depths, increasing read depth leads to the detection of a higher number of DE genes. Generally, genes detected as DE at a lower sequencing depth will also be detected as DE when sequencing to higher depth. However it should be noted that a small number of genes are detected as DE at lower sequencing depths only.
We considered the differential expression of genes that were detected as DE at a depth of either 17 M reads only, or 50 M reads only. Figure 6B shows the distribution of absolute log2 FCs for these genes. The genes found DE exclusively at a 50 M read depth show a wider range of FCs, with fewer genes showing a log 2 fold change less than one. This observation is likely to be related to sampling errors for genes with a low read count, which have a higher impact on measurements at low read depths: through sampling error, these genes may have obtained inflated (higher) values for SNT samples and deflated (lower) values for naive samples, or vice-versa, when in reality there is no change between samples. This is often a problem for RNA-seq experiments, since limitations related to the cost of the technology often mean that expression can only be measured for a limited number of samples.
To further investigate the hypothesis that the DE genes found at a depth of 17 M may be erroneously labelled as DE due to sampling error, we have plotted the mean number of aligned reads against FC for all genes (Figure 6C). The 295 genes deemed as DE at 17 M only are highlighted in orange. We see that the 17 M only genes are on the cusp of significance at a depth of 17 M, and that many have low read counts, i.e. few reads align to these genes. In addition, many of these genes show much lower FCs at a depth of 25 M and 50 M reads.
The opposite is true for the genes deemed significant at a depth of 50 M only (highlighted in green in Figure 6D) – we see that, at all sequencing depths, most of these genes still maintain large FCs, and are situated close to the pink coloured points, which represent the genes significant at both 17 M and 50 M read depths.
In summary, using the higher sequencing depth of 50 M reads leads to the detection of a larger number of DE genes, particularly for genes with low read counts.
Intronic expression and its effect on fold change calculation
We also examined expression from intronic regions, a feature of expression that can be quantified much more precisely and comprehensively when using RNA-seq than exon arrays, which by definition are designed to probe known (and predicted) exons.
In Figure 7A we compare the log2 FCs in gene expression calculated when considering exonic reads only (x axis) with log2 FCs calculated considering exonic + intronic reads (y-axis). If intronic expression is absent or is proportional to exonic expression in all samples then the point that represents a given gene will fall close to x = y on the graph. This is the case for the majority of genes (Figure 7B). However there are also a number of genes where intronic expression is proportionally much higher for SNT samples than naive samples, and vice versa. This change in fold change as a result of including reads that align to intronic regions may reflect the inclusion of novel exons in these transcripts, may be due to the expression of unannotated genes that occur at the same loci, or possibly due to non-coding RNAs. We measured the significance of these relative changes in intronic expression between naive and SNT samples using DEXSeq  (Additional files 5 and 6). Using an FDR cutoff of 0.1, we found 2030 genes showing a significant change in ratios of intronic to exonic expression following SNT (1914 genes show a relative increase in expression from non-exonic regions following SNT, whilst 116 of these genes show a decrease), these results were similar at more conservative FDR cutoffs. In Figure 7C we show an example of one of the top genes found using this method (in terms of a low p-value): ST3 beta-galactoside alpha-2,3-sialyltransferase 6 (St3gal6). We show genome graphs for the genomic coordinates of the gene, showing expression in the naive samples (left, 7 Ci) and SNT samples (right, 7Cii). We see that for SNT samples more reads map to intronic regions, and less to exonic regions. This results in contrasting patterns of differential expression for this gene after SNT: downregulation when only exonic reads are considered, and upregulation if exonic + intronic reads are considered. These reads may represent novel exons of the annotated St3gal6 gene that are expressed after SNT; alternatively, they may originate from a yet unannotated gene with overlapping genomic coordinates.
In any case, the pattern of expression of the transcript (or transcripts) arising from this genomic location after SNT, demonstrates an advantage of RNA-seq: finding areas of expression occurring outside of annotated exons. Using microarrays, we would not be able to find such unusual intronic expression, as shown by the positions of exon array probes for the genes in Figures 7C: although exon arrays do sometimes probe intronic regions, we see the coverage is not as comprehensive as that offered by RNA-seq. The biological significance of non-exonic expression is still very much an open question, however it is clear that over the last few years more and more such regions of non-exonic expression have being detected and several functions have being ascribed to them [42, 46].
Biological function analysis
We compared two genome-wide expression technologies in their ability to detect differential gene expression in L5 DRGs in response to SNT. Having established that RNA-seq outperforms exon arrays from a technical point of view, we investigated how the biological insights provided by the two datasets compare. Firstly, we interrogated the datasets for differential expression of classic “pain markers” as well as novel candidate genes, and secondly we performed functional analysis of entire datasets to investigate physiological, cellular and molecular events that are disturbed by SNT and may underlie pain conditions.
A number of published microarray studies have addressed gene expression changes in models of pain [1, 4, 34]. Genes typically dysregulated in pain conditions include injury markers (e.g. Atf3), ion channel subunits (e.g. Cacna2d1, Kcnc2), neuropeptides (Gal, Npy), inflammatory mediators such as cytokines and chemokines (Ccl2, Cxcl10, Cxcl13), and growth factors (Vgf). LaCroix-Fralish et al.  performed a systematic review of microarray studies in rodent neuropathic and inflammatory pain models, identifying a list of genes that are commonly dysregulated. In our exon array dataset, the great majority of these genes appeared dysregulated in the direction expected (Table 3). Reassuringly, this was in agreement with RNA-seq data, with the added advantage that the magnitude of FC estimated for the same genes was higher in RNA-seq, a reflection of its higher dynamic range.
In high throughput transcriptomic studies, the prioritization of candidates for further validation is generally dictated by the magnitude (and significance) of FC. In order to determine how the choice of candidates is influenced by technological platform, we compiled lists of the top 50 significant fold changes for each method (Table 4). Twenty-five genes are simultaneously ranked amongst the top 50 by both techniques. For the remaining genes in the RNA-seq top 50, thirteen are also deemed as DE by exon arrays, although with a lower FC, two genes are not deemed significantly dysregulated by exon arrays and ten genes cannot be measured/detected by exon arrays due to lack of probes at the core or extended confidence levels. Amongst the top 50 genes found by exon arrays that do not coincide with the RNA-seq top 50, twenty-two are still deemed as DE by RNA-seq (with fold changes higher than 4 fold). The remaining three genes cannot be detected by our RNA-seq alignment protocol, due to the existence of paralogous genes with high sequence conservation, which leads to reads from these transcripts being classified as ambiguous and discarded from the analysis.
The knowledge generated by our high throughput studies is not restricted to the identification of individual candidate genes in pain. Gene ontology and literature information on the roles of dysregulated genes provide insights into the biological phenomena compromised after SNT. As depicted in Figure 8A, the distribution of dysregulated genes based on corresponding “protein classes” (as categorized by PANTHER) is similar in both datasets.
Similarly, functional analysis using Ingenuity Pathway Analysis (IPA) revealed similar significant enrichment for the same biological functions/“diseases and disorders” for both datasets (Figure 8B). As expected, and in agreement with previously published studies [47–49], categories related to “neuronal/neurological function” and “immune/inflammatory” are significantly enriched.
Despite the overall similarity in top biological functions, a more detailed analysis using “canonical pathways”, annotated in IPA revealed very specific features of each of the datasets. While the most significant physiological/cellular/molecular pathways in the exon array dataset reflect the heavy contribution of immune system-related genes to the overall profile of the tissue after injury (Figure 8 Ci), in the RNA-Seq dataset there is a clear representation of neuronal pathways (Figure 8 Cii). These include “axonal guidance signalling”, “ephrin receptor signalling”, “Nerve Growth Factor (Ngf) signalling”, “reelin signalling in neurons”, and “CREB-signalling in neurons”. The overrepresentation of neuronal pathways in the RNA-seq dataset is due to the number of genes assigned to these pathways that are deemed as DE by RNA-seq but not by exon arrays. For example, a total of 483 genes are ascribed to the “axon guidance” pathway according to the IPA “canonical pathways” database. In our datasets, 292 “axon guidance genes” are simultaneously deemed as DE by microarrays and RNA-seq. However an additional 73 genes are exclusively detected by RNA-seq. Furthermore, the list of DE genes detected by RNA-seq only features molecules with established functions in nociception and/or pain such as neuronal nitric oxide synthase 1 (Nos1), and the transient receptor potential cation channel, subfamily V, member 4 (Trpv4, ), as well as a number of molecules belonging to protein categories such as “receptors”, “transporters”, “G-protein coupled receptors”, “ion channels” and “signalling molecules”, which may be important in neuronal function.
RNA-seq has several advantages over microarrays
RNA-seq technology presents a novel tool for comprehensive, high throughput whole genome transcriptional profiling. In this study we profiled injured L5 DRG tissue following spinal nerve transection, using RNA-seq and microarrays. The same RNA samples were used for both platforms to enable a direct comparison of these technologies in an experimental pain model for the first time. We assessed both technologies for their ability to interrogate the transcriptome, detect gene expression, and to identify dysregulated genes that represent putative novel pain mediators.
Hammer and colleagues , have reported the use of RNA-seq to profile expression in the uninjured L4 DRG following L5 spinal nerve transection and reported a higher number of transcriptional changes in the SNT model than previously estimated in published microarray studies.
In the more direct comparison presented here, we also found a larger number of DE genes using RNA-Seq than using exon arrays, consistent with the literature [20, 24, 25]. This is partly due to the wider dynamic range of RNA-seq, as microarrays suffer from non-specific binding to probes, and signal saturation [13, 20]. Non-specific binding leads to background signals which affect detection/quantification of lowly expressed genes, while highly expressed mRNA species may saturate the fluorescent signal, which can compromise the detection of differential expression. However it should be noted that estimation of DE for short length/lowly expressed genes can be also inaccurate for RNA-seq . This is especially important with regards to exons – our study found a similar number of DE exons using either platform when sequencing to a depth of 50 M, and microarrays outperformed RNA-seq at lower sequencing depths (Additional file 3).
Discrepancy in the number of DE genes detected by either method may also be caused by the different analysis methods employed (limma for microarray analysis  and DESeq for RNA-seq ). However we believe that these do not have a significant impact on our findings, since both methods apply variance-shrinking steps to reduce false positives. In fact, DESeq has been shown to be one of the most conservative methods for RNA-seq differential expression detection . The variance-shrinking steps are implemented to deal with small fold changes occurring by chance due to extremely low values of variance between samples, a common problem when testing a large number of genes across a small number of samples. Such shrinkage methods have been shown to outperform methods based on simple t-tests and fold change cut-offs . For this reason we did not impose an “effect size filter” when determining significantly DE genes. Finally, we do not believe that the FDR cutoff (0.1) favoured either technology, as we obtained qualitatively similar results with other FDRs (Additional file 7).
Despite the previously mentioned advantages of RNA-seq over microarrays, we observed a number of genes detected as DE using microarrays but not classed as such by RNA-seq (Figure 5). We have shown in Figure 4B and Additional file 2 that most of these genes have low FC values. Some of these genes may be false positives: the apparent significant change in expression might be due to non-specific binding. However it is also likely that some of these genes are truly DE. There are several reasons why microarrays might detect DE genes that are missed when using RNA-seq. For example, some genes will share high homology with other genes or pseudo genes, thus making it difficult to map reads to the genes unambiguously, as observed for three genes listed in Table 4 that appear DE in exon arrays and are classified as non-detected by RNA-seq. This should not be a problem for highly expressed genes, as long as there are enough regions of unique sequence along the length of the gene to make a robust signal, but can be a problem for lowly expressed genes, or genes with repetitive regions. Another reason is that lowly expressed genes will be strongly affected by random sampling – the expression of a gene for which only 10 copies are present in a sample is much less likely to be estimated accurately than a gene for which 1000 copies are present. The obvious solution to this issue with RNA-seq is to increase the sequencing depth, enabling a more accurate estimation of lowly expressed genes in conjunction with the increased detection of differential expression.
Functional analysis of DE genes reveals consistent results across datasets
Although RNA-seq detected a larger number of DE genes, functional enrichment analysis of the microarray and RNA-seq datasets individually using Ingenuity Pathway Analysis revealed qualitatively similar results. In agreement with previous studies in animal models of pain, functional categories/“diseases and disorders” related to immune function/inflammatory response, as well as neurological disease were statistically overrepresented in the respective lists of DE genes [47–49]. This suggests that the genes found exclusively with RNA-seq are likely to be true positives – as they fall into the categories already enriched amongst the overlapping genes. Had these genes arisen as an artefact of RNA-seq, we would expect less coherence in terms of enriched categories in the RNA-seq results, and reduced significance. This is consistent with our canonical pathway analysis where a number of neuron-specific pathways rank highly in the RNA-seq dataset (Figure 8 Cii), because ~10-20% of genes assigned to these pathways are deemed as DE by RNA-seq but not by exon arrays. In terms of discovery of putative pain mediators, RNA-seq has clear advantages over exon arrays as it unravels candidate genes that exon arrays (at the core and extended probeset confidence levels) would fail to identify (e.g. Nos1, Trpv4 and other molecules with established or putative functions in neurotransmission). In contrast, the great majority of putative candidates identified by exon arrays can be confidently identified by RNA-Seq.
In summary, our gene expression datasets accurately reflect the biological mechanisms triggered by peripheral nerve injury as demonstrated in prior pain and injury-related studies. However, it is worthwhile pointing out that our functional analysis is shaped by current literature information used to build the Ingenuity database and may also be incomplete due to lack of gene ontology annotations for a large proportion of genes in both datasets. We expect that the functional information provided by our dataset will become more refined as gene ontology and pathway annotations evolve.
A higher sequencing depth leads to the detection of a greater number of DE genes
We compared three distinct sequencing depths (17 M, 25 M, 50 M reads/sample) in terms of the number of DE genes detected (Figure 5). Increasing RNA-seq read depth leads to the detection of a higher number of DE genes (Figure 5C). A recent study has shown that a high sequencing depth could lead to the false detection of genes that are not expressed in the sample . However, it is unlikely that we have reached such a point; moreover, it is unclear whether reaching such a point would affect DE calculation in an experimental design that takes into account biological variation between replicate samples. In conclusion, for our biological question, we recommend a 50 M read depth in order to obtain a truly comprehensive measure of differential gene expression. However it is important to use biological replicates in order to increase power and ensure results are generalizable to the population level [19, 55].
RNA-seq allows us to profile non-exonic expression
In addition to higher sensitivity and dynamic range, RNA-seq differs from microarrays in its ability to detect expression from areas of the genome regardless of prior annotation, allowing the detection of novel areas of transcription. Such areas, initially dubbed as transcribed “dark matter” , are becoming the focus of much attention and debate, fuelled in part by the recent ENCODE project, which showed that a large proportion of the non-exonic region of the human genome was transcribed in at least one cell line .
Our read mapping statistics suggest active transcription from non-exonic areas in rat DRGs: only 56-58% of the mappable reads aligned with previously annotated exons, 7-9% of the reads aligning to intronic regions (Figure 2C) and 33-35% aligning outside of the boundaries of annotated genes, i.e. intergenic areas. It would be interesting to perform this experiment in another organism used extensively in pain research, such as mouse, in order to investigate how the proportion of non-exonic reads compare. A possible cause for the high proportion of non-exonic reads may be due to yet unknown tissue-specific gene expression in DRG.
Remarkably, we observed a consistent increase in the proportion of reads mapping to regions of the genome annotated as intronic in SNT samples compared to naive samples, from 7 to 9% of the total number of mapped reads. On a gene-by-gene basis, ten times as many genes showed a significant increase in the proportion of reads mapping to intronic regions in SNT samples compared to naive (Additional file 5). Taken together, these data suggest that the SNT procedure is associated with an increase in expression from intronic regions. Such regions may represent novel exons whose inclusion into mature transcripts is induced upon peripheral nerve injury: the transcriptome of injured DRGs may contain novel exons that have not previously been observed in rat tissues, and are therefore not included in the rat transcriptome annotation.
In our alignment procedure, reads aligning within the genomic coordinates of an annotated gene are ascribed to this gene, however it is also possible they belong to novel genes with overlapping genomic coordinates either in the same or opposite strands (something we cannot determine due to lack of strand information in our protocol). Such novel and nested exons may be protein coding, but also correspond to other RNA species such as microRNAs (miRNAs) or long non-coding RNAs. For example, dysregulation of a number of miRNAs had been demonstrated in experimental models of pain, and a growing body of evidence suggests links between miRNA, spinal nerve injury and pain [10, 56–58]. In recent years the study and discovery of long non-coding RNAs has exploded, and at least one such molecule has been shown to have an effect on neuropathic pain .
Clearly, these data need further study in order to prioritise potential genes showing SNT-associated non-exonic expression and to confirm a potential role in pain. Prioritisation might include computational methods, such as enrichment studies of the genes showing intronic expression, or sequence based analysis of the intronic RNA, for example looking for potential complementarity to other genomic regions. Such methods would be greatly aided by further, more focused sequencing experiments that use longer, paired end, stranded reads, as well as other techniques to determine the precise start and end site of these RNA species.
Related to the “agnostic” nature of RNA-seq, allowing it to probe unannotated areas of the genome, another useful aspect of RNA-seq data is that it can constantly be reanalysed in light of new genome builds and updated annotations. This would be far harder to achieve for microarray data, because the proportion of the transcribed genome that can be interrogated is constrained by genome annotation at the time of array design: clearly, any genomic area that is not probed in the array chip cannot be measured. Although it would require some modifications to the protocol used in this study, RNA-seq could also be used to compare the transcriptomes of different strains of rat or other organisms, complementing the work of Sorge et al. , who compared strains of mice at the genome level and correlated differences in the genome with differences in pain sensitivity.
Summary of results
We compared RNA-seq and Affymetrix exon array technologies for the purpose of transcriptional profiling of rat DRG tissue after L5 Spinal Nerve Transection. Our key findings were as follows:
RNA-seq technology is suitable for the transcriptional profiling of experimental models of pain, as it is able to replicate prior microarray studies.
RNA-seq identifies a larger number of DE genes than microarrays, due to its increased sensitivity and higher dynamic range. The number of DE genes identified increases with higher sequencing depth.
RNA-seq detects novel areas of transcription mapping to regions not currently annotated as exons (introns and intergenic regions). Some of these regions are differentially expressed in SNT relative to exonic expression and may represent novel candidate pain mediators.
We have demonstrated that RNA-seq offers major advantages over microarrays for the purpose of whole genome transcriptional profiling of DRG tissue after peripheral nerve injury. Firstly, RNA-seq is more comprehensive as it can interrogate previously unrecognized areas of transcription, while microarray design is constrained to known or predicted transcripts. Secondly, RNA-seq has a much wider dynamic range which favours detection and estimation of differential expression for highly expressed genes and, provided read depth is adequate, lowly expressed genes can also be measured more precisely. Most importantly, RNA-seq provided novel insights into putative novel pain mediators that were not detected using microarrays. We therefore highly recommend the use of RNA-seq for high throughput transcriptional profiling of pain models, and we expect that this technology will supersede microarrays in the near future.
Surgery and tissue collection
Spinal nerve transection (SNT) of the L5 spinal nerve was performed on male Wistar rats (n = 12) as described in . L5 dorsal root ganglia (DRG) were harvested 7 days after surgery by fresh dissection, immediately frozen in liquid nitrogen and stored at -80°C. L5 DRG tissue from naive animals (n = 12) was used as control. Tissue from 4 animals was pooled to create three independent biological replicates per group (SNT or naive) and total RNA was extracted using the miRNEasy kit (QIAGEN, Redwood City, CA, USA) according to manufacturer’s instructions. RNA concentration was measured using the NanoDrop 1000 Spectrophotometer (Thermo Scientific, Wilmington, DE, USA). RNA integrity was assessed using RNA Nano chips in an Agilent 2100 Bionalyzer (Agilent technologies, Santa Clara, CA, USA); RNA integrity numbers (RIN) were between 8.5 and 9.3. Each RNA sample was separated into two technical replicates; one was further processed for microarray analysis and the other for RNA-seq library preparation. Each RNA-seq library was further subdivided into 3 technical replicates, which were sequenced to three distinct read depths as described below.
Microarray analysis and data processing
Microarray analysis was performed using the Affymetrix GeneChip Rat Exon 1.0 ST Array (Affymetrix, Santa Clara, CA, USA). Sample preparation and hybridization were performed by UCL genomics, following Affymetrix instructions. The resultant CEL files were processed in R using the oligo Bioconductor package . Background correction, normalisation and summarisation were performed using robust multi-array average (RMA), quantile normalisation and median polish respectively . Summarisation was performed at the exon level (where each probeset corresponds to an exon, with some exons being probed by more than one probeset) or the transcript cluster/gene level, where all probesets from different exons belonging to the same gene were summarised to produce a single transcript cluster measurement. Throughout this manuscript “normalised probe intensity” is used to refer to expression at the transcript cluster or exon level, following all three pre-processing steps. Transcript clusters were summarised and annotated using the official Affymetrix .ps and .mps data-files, obtained from the NetAffx Analysis Center (http://www.affymetrix.com/analysis/index.affx) through the getNetAffx() function of the oligo package. Presence/Absence calls were determined at the exon level using the detection above background (DABG) method. A default threshold of p = 0.05 for expression above background was used to assign “present” (p < 0.05) and “absent” calls (p > 0.05).
Detection of differentially expressed genes/exons was performed using the limma Bioconductor package . In order to adjust for multiple testing, the false discovery rate (FDR) was derived from the p-values using the method described in . An FDR of 0.1 is accepted when defining significantly differentially expressed genes. Differential expression analysis results for exon arrays, containing fold changes and FDR can be found in Additional file 8. All array data can be found in GEO using accession numbers GSE53764 and GSE53860 for transcript and probe-set level data, respectively. Differential expression data is also available from PainNetworks .
Probeset confidence levels
Rat exon array probesets are annotated with different evidence levels, which indicate the confidence that the probeset truly represents a transcribed genomic sequence, based on the quality of evidence supporting that claim. Probes labelled “core” refer to probesets probing gene transcripts taken from RefSeq and full-length mRNA GenBank records, “extended” probesets are supported by ESTs or partial mRNAs from databases, and “full” probesets are supported by computational predictions. These probeset confidence scores were assigned at the time of array design. Separate gene and exon-level expression matrices were produced, depending on the probe set confidence levels considered, for: 1) core probes only, 2) core and extended probes, and 3) all probes in the chip (core, extended and full, respectively). Core and extended level confidence probes were used when comparing microarrays with RNA-seq. This level of confidence was chosen because it led to the largest number of differentially expressed genes that could be detected with an FDR of 0.1.
cDNA libraries were prepared using the TruSeq™ RNA Sample Preparation Kit (Illumina, San Diego, CA, USA), low throughput protocol: 200 ng of total RNA were subjected to poly(A) enrichment using poly(T)-attached magnetic beads. Poly(A)-enriched RNA was subsequently used for reverse transcription and library preparation according to Illumina’s instructions. Sequencing was performed using the Illumina GAIIx sequencer (Illumina). Each library was sequenced at three distinct read depths (M = million reads/sample): ~17 M (average: 16.6 M; range: 14.7 M-17.8 M), ~25 M (average: 25.5 M reads; range: 23.3 M-27.3 M), and ~50 M (average: 50.7 M; range: 42.8 M-53.7 M). All reads were 34 base pairs in length.
Reads were aligned to the rat genome as summarised in Figure 2. Low quality reads were discarded using the Illumina quality filter, leaving an average 11.9 M (range: 10.7 M- 13.0 M), 19.2 M (17.7 M-20.4 M) and 36.7 M (32.4 M-38.5 M) million reads/sample. Reads were aligned to the reference genome UCSC Rattus norvegicus Rn5 (March 2012) using Bowtie . Up to one mismatch was allowed between the reads and the reference genome. Ambiguously mapping reads (i.e. reads that could be mapped to more than one position in the genome) were discarded. One mismatch was chosen because allowing either 0 or 2 or more mismatches reduced the average percentage of uniquely mapping reads per sample.
Gene expression quantification
Bowtie output files were imported into R using RSamtools . Reads were classified as exonic if they mapped to an annotated exon, intronic if they mapped within the 5’ and 3’ boundaries of a given gene, but outside annotated exons, or intergenic if they aligned outside known 5’ and 3’ boundaries of annotated genes (Figure 2). Gene expression was estimated using the GenomicFeatures package. Gene expression was quantified in one of two ways: considering exonic reads only, or considering intronic and exonic reads (i.e. all reads mapping within the 5’ and 3’ ends of a gene). For genes whose transcripts had alternative start/stop sites, the combination of 5’ and 3’ coordinates that gave maximal coverage, i.e. included all exons of the gene, was selected.
Comparing relative frequencies of read counts between naive and SNT samples
In order to compare the numbers of reads mapping to the exonic, intronic and intergenic regions, the overdispersed logistic regression model of Williams (1982) was used , due to within-group variability being too high to satisfy the assumptions of a simple binomial test.
Normalisation and differential gene expression in RNA-seq
DESeq  was used for normalization of the RNA-seq counts and calculation of differential gene expression. Count data was normalized by estimating effective library size for each sample. As with the microarray analysis, we estimated FDR using the method of Benjamini and Hochberg. An FDR threshold of 0.1 was used to control for false discoveries. Full results for the differential expression analysis for RNA-seq data, including fold changes, p-values and FDR can be found in Additional file 9. RNA-seq alignments (BAM files) can be found in GEO (accession number: GSE53762). Differential expression data is also available from PainNetworks .
Comparison of gene-level expression between platforms
In order to compare microarray gene expression levels to expression measured by RNA-seq, Ensembl (release 69, Rn4) gene annotation was obtained for each microarray transcript-cluster, thus an expression value was obtained for each Ensembl gene probeable using the Affymetrix array. This was plotted against the RNA-seq Reads Per Kilobase per Million mapped reads (RPKM) value for the same Ensembl gene (Ensembl release 74, Rn5). Because the RNA-seq reads were mapped to Rn5, and the microarray annotation was for Rn4, only genes found in both genome builds were plotted. The RPKM is obtained by counting the number of sequenced reads mapping to the exons of a given gene and normalising by the total length of all exons for that gene and the library size . The use of an FDR of 0.1 is somewhat arbitrary (although commonly used). Therefore, in order to justify the use of and FDR of 0.1, we have repeated the comparison of DE genes detected using various FDR thresholds. Additional file 7 shows the overlap in DE genes found by the different platforms at FDRs of 0.05, 0.1, 0.15 and 0.2. We see a similar pattern for all three FDR values – there is a large overlap between platforms, with comparatively few extra genes found by microarrays only, and a larger number of genes found by RNA-seq only.
Comparison of exon-level expression between platforms
When comparing exon expression between platforms, the normalised probe intensity for each microarray probeset that mapped to an Ensembl exon was compared to the number of reads mapping to that exon, normalised using the RPKM procedure described in . In the case of more than one probeset mapping to the same Ensembl gene, the probeset exhibiting the largest variance in expression across all samples was used for the comparison, as used in . In order to compare expression at the exon level directly between technologies the RNA-seq alignments considered for this particular analysis were made using the Rn4 version of the genome.
Changes in non-exonic expression following SNT
In order to identify genes for which non-exonic expression changed significantly following SNT, gene expression calculated using exonic expression was compared to gene expression calculated using exonic and intronic expression as described above. Nested genes were not used for this process – any gene that overlapped with any other gene (on either strand) was excluded. In order to look for genes showing a significant increase or decrease in the proportion of reads aligning to intronic gene regions, we used the DEXSeq package . We considered each gene to consist of two units: exons and introns. The package is then employed to look for genes that show a difference in the relative ratios of exonically and intronically aligned reads between SNT and naive samples. Genes were normalised and dispersion estimated using the standard parameters employed by DEXseq. A count based filter was applied before analysis: any gene with less than 200 reads aligning to its intronic regions for more than 3 samples was excluded from the analysis.
For functional enrichment analysis the exon array and the RNA-seq (read depth 50 M) datasets were considered separately. Lists of dysregulated genes (FDR p < 0.1) were subjected to gene ontology analysis. Analysis by “Protein Class” was performed with “PANTHER classification systems” (http://www.pantherdb.org/) . Functional analysis was performed with Ingenuity Pathway Analysis (IPA, QIAGEN). Lists of dysregulated genes were subjected to “core analysis” using IPA default settings, and top biological functions/ “diseases and disorders” as well as “canonical pathways”; p-values were adjusted using the Benjamini-Hochberg multiple testing correction.
Raw and processed data are available from the Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/), series accession number: GSE53861. Tables of gene expression and the lists of the DE genes are available from http://www.PainNetworks.org  from the experiments tab.
Detection above background
Dorsal root ganglia
False discovery rate
Reads per kilobase per million mapped reads
L5 spinal nerve transection.
Maratou K, Wallace VCJ, Hasnie FS, Okuse K, Hosseini R, Jina N, Blackbeard J, Pheby T, Orengo C, Dickenson AH, McMahon SB, Rice ASC: Comparison of dorsal root ganglion gene expression in rat models of traumatic and HIV-associated neuropathic pain. Eur J Pain 2009, 13: 387–398. 10.1016/j.ejpain.2008.05.011
Costigan M, Belfer I, Griffin RS, Dai F, Barrett LB, Coppola G, Wu T, Kiselycznyk C, Poddar M, Lu Y, Diatchenko L, Smith S, Cobos EJ, Zaykin D, Allchorne A, Gershon E, Livneh J, Shen P-H, Nikolajsen L, Karppinen J, Männikkö M, Kelempisioti A, Goldman D, Maixner W, Geschwind DH, Max MB, Seltzer Z, Woolf CJ: Multiple chronic pain states are associated with a common amino acid-changing allele in KCNS1. Brain 2010, 133: 2519–2527. 10.1093/brain/awq195
Tsantoulas C, Zhu L, Shaifta Y, Grist J, Ward JPT, Raouf R, Michael GJ, McMahon SB: Sensory neuron downregulation of the Kv9.1 potassium channel subunit mediates neuropathic pain following nerve injury. J Neurosci 2012, 32: 17502–17513. 10.1523/JNEUROSCI.3561-12.2012
LaCroix-Fralish ML, Austin J-S, Zheng FY, Levitin DJ, Mogil JS: Patterns of pain: meta-analysis of microarray studies of pain. Pain 2011, 152: 1888–1898. 10.1016/j.pain.2011.04.014
Antunes-Martins A, Perkins JR, Lees J, Hildebrandt T, Orengo C, Bennett DLH: Systems biology approaches to finding novel pain mediators. Wiley Interdiscip Rev Syst Biol Med 2013, 5: 11–35. 10.1002/wsbm.1192
Tegeder I, Costigan M, Griffin RS, Abele A, Belfer I, Schmidt H, Ehnert C, Nejim J, Marian C, Scholz J, Wu T, Allchorne A, Diatchenko L, Binshtok AM, Goldman D, Adolph J, Sama S, Atlas SJ, Carlezon WA, Parsegian A, Lötsch J, Fillingim RB, Maixner W, Geisslinger G, Max MB, Woolf CJ: GTP cyclohydrolase and tetrahydrobiopterin regulate pain sensitivity and persistence. Nat Med 2006, 12: 1269–1277. 10.1038/nm1490
Moss A, Ingram R, Koch S, Theodorou A, Low L, Baccei M, Hathway GJ, Costigan M, Salton SR, Fitzgerald M: Origins, actions and dynamic expression patterns of the neuropeptide VGF in rat peripheral and central sensory neurones following peripheral nerve injury. Mol Pain 2008, 4: 62. 10.1186/1744-8069-4-62
Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009, 10: 57–63. 10.1038/nrg2484
Hammer P, Banck MS, Amberg R, Wang C, Petznick G, Luo S, Khrebtukova I, Schroth GP, Beyerlein P, Beutler AS: mRNA-seq with agnostic splice site discovery for nervous system transcriptomics tested in chronic pain. Genome Res 2010, 20: 847–860. 10.1101/gr.101204.109
Yu B, Zhou S, Wang Y, Ding G, Ding F, Gu X: Profile of microRNAs following rat sciatic nerve injury by deep sequencing: implication for mechanisms of nerve regeneration. PLoS ONE 2011, 6: e24612. 10.1371/journal.pone.0024612
Alvarado S, Tajerian M, Millecamps M, Suderman M, Stone LS, Szyf M: Peripheral nerve injury is accompanied by chronic transcriptome-wide changes in the mouse prefrontal cortex. Mol Pain 2013, 9: 21. 10.1186/1744-8069-9-21
Butte A: The use and analysis of microarray data. Nat Rev Drug Discov 2002, 1: 951–960. 10.1038/nrd961
Binder H, Preibisch S: Specific and nonspecific hybridization of oligonucleotide probes on microarrays. Biophys J 2005, 89: 337–352. 10.1529/biophysj.104.055343
Mardis ER: Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet 2008, 9: 387–402. 10.1146/annurev.genom.9.081307.164359
Birzele F, Schaub J, Rust W, Clemens C, Baum P, Kaufmann H, Weith A, Schulz TW, Hildebrandt T: Into the unknown: expression profiling without genome sequence information in CHO by next generation sequencing. Nucleic Acids Res 2010, 38: 3999–4010. 10.1093/nar/gkq116
Costa V, Aprile M, Esposito R, Ciccodicola A: RNA-Seq and human complex diseases: recent accomplishments and future perspectives. Eur J Hum Genet 2013, 21: 134–142. 10.1038/ejhg.2012.129
Oshlack A, Robinson MD, Young MD: From RNA-seq reads to differential expression results. Genome Biol 2010, 11: 220. 10.1186/gb-2010-11-12-220
Cai G, Li H, Lu Y, Huang X, Lee J, Müller P, Ji Y, Liang S: Accuracy of RNA-Seq and its dependence on sequencing depth. BMC Bioinforma 2012,13(Suppl 13):S5. 10.1186/1471-2105-13-S13-S5
Anders S, Huber W: Differential expression analysis for sequence count data. Genome Biol 2010, 11: R106. 10.1186/gb-2010-11-10-r106
Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 2008, 18: 1509–1517. 10.1101/gr.079558.108
Bradford JR, Hey Y, Yates T, Li Y, Pepper SD, Miller CJ: A comparison of massively parallel nucleotide sequencing with oligonucleotide microarrays for global transcription profiling. BMC Genomics 2010, 11: 282. 10.1186/1471-2164-11-282
Bottomly D, Walter NAR, Hunter JE, Darakjian P, Kawane S, Buck KJ, Searles RP, Mooney M, McWeeney SK, Hitzemann R: Evaluating gene expression in C57BL/6 J and DBA/2 J mouse striatum using RNA-Seq and microarrays. PLoS ONE 2011, 6: e17820. 10.1371/journal.pone.0017820
Toung JM, Morley M, Li M, Cheung VG: RNA-sequence analysis of human B-cells. Genome Res 2011, 21: 991–998. 10.1101/gr.116335.110
Su Z, Li Z, Chen T, Li Q-Z, Fang H, Ding D, Ge W, Ning B, Hong H, Perkins RG, Tong W, Shi L: Comparing next-generation sequencing and microarray technologies in a toxicological study of the effects of aristolochic acid on rat kidneys. Chem Res Toxicol 2011, 24: 1486–1493. 10.1021/tx200103b
Fu X, Fu N, Guo S, Yan Z, Xu Y, Hu H, Menzel C, Chen W, Li Y, Zeng R, Khaitovich P: Estimating accuracy of RNA-Seq and microarrays with proteomics. BMC Genomics 2009, 10: 161. 10.1186/1471-2164-10-161
Griffith M, Griffith OL, Mwenifumbo J, Goya R, Morrissy AS, Morin RD, Corbett R, Tang MJ, Hou Y-C, Pugh TJ, Robertson G, Chittaranjan S, Ally A, Asano JK, Chan SY, Li HI, McDonald H, Teague K, Zhao Y, Zeng T, Delaney A, Hirst M, Morin GB, Jones SJM, Tai IT, Marra MA: Alternative expression analysis by RNA sequencing. Nat Methods 2010, 7: 843–847. 10.1038/nmeth.1503
Bullard JH, Purdom E, Hansen KD, Dudoit S: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinforma 2010, 11: 94. 10.1186/1471-2105-11-94
Kogenaru S, Qing Y, Guo Y, Wang N: RNA-seq and microarray complement each other in transcriptome profiling. BMC Genomics 2012, 13: 629. 10.1186/1471-2164-13-629
Sîrbu A, Kerr G, Crane M, Ruskin HJ: RNA-Seq vs dual- and single-channel microarray data: sensitivity analysis for differential expression and clustering. PLoS ONE 2012, 7: e50986. 10.1371/journal.pone.0050986
Sekhon RS, Briskine R, Hirsch CN, Myers CL, Springer NM, Buell CR, de Leon N, Kaeppler SM: Maize gene atlas developed by RNA sequencing and comparative evaluation of transcriptomes based on RNA sequencing and microarrays. PLoS ONE 2013, 8: e61005.
Mooney M, Bond J, Monks N, Eugster E, Cherba D, Berlinski P, Kamerling S, Marotti K, Simpson H, Rusk T, Tembe W, Legendre C, Benson H, Liang W, Webb CP: Comparative RNA-Seq and microarray analysis of gene expression changes in B-cell lymphomas of Canis familiaris. PLoS ONE 2013, 8: e61088. 10.1371/journal.pone.0061088
Malone JH, Oliver B: Microarrays, deep sequencing and the true measure of the transcriptome. BMC Biol 2011, 9: 34. 10.1186/1741-7007-9-34
Calvo M, Zhu N, Grist J, Ma Z, Loeb JA, Bennett DLH: Following nerve injury neuregulin-1 drives microglial proliferation and neuropathic pain via the MEK/ERK pathway. Glia 2011, 59: 554–568. 10.1002/glia.21124
Costigan M, Befort K, Karchewski L, Griffin RS, D’Urso D, Allchorne A, Sitarski J, Mannion JW, Pratt RE, Woolf CJ: Replicate high-density rat genome oligonucleotide microarrays reveal hundreds of regulated genes in the dorsal root ganglion after peripheral nerve injury. BMC Neurosci 2002, 3: 16. 10.1186/1471-2202-3-16
Komori N, Takemori N, Kim HK, Singh A, Hwang S-H, Foreman RD, Chung K, Chung JM, Matsumoto H: Proteomics study of neuropathic and nonneuropathic dorsal root ganglia: altered protein regulation following segmental spinal nerve ligation injury. Physiol Genomics 2007, 29: 215–230.
Calvo M, Zhu N, Tsantoulas C, Ma Z, Grist J, Loeb JA, Bennett DLH: Neuregulin-ErbB signaling promotes microglial proliferation and chemotaxis contributing to microgliosis and pain after peripheral nerve injury. J Neurosci 2010, 30: 5437–5450. 10.1523/JNEUROSCI.5169-09.2010
Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, Khatun J, Lajoie BR, Landt SG, Lee B-K, Pauli F, Rosenbloom KR, Sabo P, Safi A, Sanyal A, Shoresh N, Simon JM, Song L, Trinklein ND, Altshuler RC, Birney E, Brown JB, Cheng C, Djebali S, Dong X, Dunham I, et al.: An integrated encyclopedia of DNA elements in the human genome. Nature 2012, 489: 57–74. 10.1038/nature11247
Belgard TG, Marques AC, Oliver PL, Abaan HO, Sirey TM, Hoerder-Suabedissen A, García-Moreno F, Molnár Z, Margulies EH, Ponting CP: A transcriptomic atlas of mouse neocortical layers. Neuron 2011, 71: 605–616. 10.1016/j.neuron.2011.06.039
Ponting CP, Belgard TG: Transcribed dark matter: meaning or myth? Hum Mol Genet 2010, 19: R162–168. 10.1093/hmg/ddq362
Ramos AD, Diaz A, Nellore A, Delgado RN, Park K-Y, Gonzales-Roybal G, Oldham MC, Song JS, Lim DA: Integration of Genome-wide Approaches Identifies lncRNAs of Adult Neural Stem Cells and Their Progeny In Vivo. Cell Stem Cell 2013,12(5):616–628. 10.1016/j.stem.2013.03.003
Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, Xue C, Marinov GK, Khatun J, Williams BA, Zaleski C, Rozowsky J, Röder M, Kokocinski F, Abdelhamid RF, Alioto T, Antoshechkin I, Baer MT, Bar NS, Batut P, Bell K, Bell I, Chakrabortty S, Chen X, Chrast J, Curado J, et al.: Landscape of transcription in human cells. Nature 2012, 489: 101–108. 10.1038/nature11233
Kapranov P, St Laurent G: Dark Matter RNA: Existence, Function, and Controversy. Front Genet 2012, 3: 60.
Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, Gordon L, Hendrix M, Hourlier T, Johnson N, Kähäri AK, Keefe D, Keenan S, Kinsella R, Komorowska M, Koscielny G, Kulesha E, Larsson P, Longden I, McLaren W, Muffato M, Overduin B, Pignatelli M, Pritchard B, Riat HS, et al.: Ensembl 2012. Nucleic Acids Res 2012,40(Database issue):D84–90.
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 5: 621–628. 10.1038/nmeth.1226
Anders S, Reyes A, Huber W: Detecting differential usage of exons from RNA-seq data. Genome Res 2012, 22: 2008–2017. 10.1101/gr.133744.111
Kung JTY, Colognori D, Lee JT: Long noncoding RNAs: past, present, and future. Genetics 2013, 193: 651–669. 10.1534/genetics.112.146704
Costigan M, Moss A, Latremoliere A, Johnston C, Verma-Gandhu M, Herbert TA, Barrett L, Brenner GJ, Vardeh D, Woolf CJ, Fitzgerald M: T-cell infiltration and signaling in the adult dorsal spinal cord is a major contributor to neuropathic pain-like hypersensitivity. J Neurosci 2009, 29: 14415–14422. 10.1523/JNEUROSCI.4569-09.2009
Kim D-S, Figueroa KW, Li K-W, Boroujerdi A, Yolo T, Luo ZD: Profiling of dynamically changed gene expression in dorsal root ganglia post peripheral nerve injury and a critical role of injury-induced glial fibrillary acidic protein in maintenance of pain behaviors [corrected]. Pain 2009, 143: 114–122. 10.1016/j.pain.2009.02.006
Griffin RS, Costigan M, Brenner GJ, Ma CHE, Scholz J, Moss A, Allchorne AJ, Stahl GL, Woolf CJ: Complement induction in spinal cord microglia results in anaphylatoxin C5a-mediated pain hypersensitivity. J Neurosci 2007, 27: 8699–8708. 10.1523/JNEUROSCI.2018-07.2007
Alessandri-Haber N, Dina OA, Yeh JJ, Parada CA, Reichling DB, Levine JD: Transient receptor potential vanilloid 4 is essential in chemotherapy-induced neuropathic pain in the rat. J Neurosci 2004, 24: 4444–4452. 10.1523/JNEUROSCI.0242-04.2004
Smyth GK: Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 2004., 3: Article 3. doi:10.2202/1544–6115.1027
Soneson C, Delorenzi M: A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinforma 2013, 14: 91. 10.1186/1471-2105-14-91
Zhang S, Cao J: A close examination of double filtering with fold change and T test in microarray analysis. BMC Bioinforma 2009, 10: 402. 10.1186/1471-2105-10-402
Tarazona S, García-Alcalde F, Dopazo J, Ferrer A, Conesa A: Differential expression in RNA-seq: a matter of depth. Genome Res 2011, 21: 2213–2223. 10.1101/gr.124321.111
Hansen KD, Wu Z, Irizarry RA, Leek JT: Sequencing technology does not eliminate biological variability. Nat Biotechnol 2011, 29: 572–573. 10.1038/nbt.1910
Orlova IA, Alexander GM, Qureshi RA, Sacan A, Graziano A, Barrett JE, Schwartzman RJ, Ajit SK: MicroRNA modulation in complex regional pain syndrome. J Transl Med 2011, 9: 195. 10.1186/1479-5876-9-195
Von Schack D, Agostino MJ, Murray BS, Li Y, Reddy PS, Chen J, Choe SE, Strassle BW, Li C, Bates B, Zhang L, Hu H, Kotnis S, Bingham B, Liu W, Whiteside GT, Samad TA, Kennedy JD, Ajit SK: Dynamic changes in the microRNA expression profile reveal multiple regulatory mechanisms in the spinal nerve ligation model of neuropathic pain. PLoS ONE 2011, 6: e17670. 10.1371/journal.pone.0017670
Zhao J, Lee M-C, Momin A, Cendan C-M, Shepherd ST, Baker MD, Asante C, Bee L, Bethry A, Perkins JR, Nassar MA, Abrahamsen B, Dickenson A, Cobb BS, Merkenschlager M, Wood JN: Small RNAs control sodium channel expression, nociceptor excitability, and pain thresholds. J Neurosci 2010, 30: 10860–10871. 10.1523/JNEUROSCI.1980-10.2010
Zhao X, Tang Z, Zhang H, Atianjoh FE, Zhao J-Y, Liang L, Wang W, Guan X, Kao S-C, Tiwari V, Gao Y-J, Hoffman PN, Cui H, Li M, Dong X, Tao Y-X: A long noncoding RNA contributes to neuropathic pain by silencing Kcna2 in primary afferent neurons. Nat Neurosci 2013, 16: 1024–1031. 10.1038/nn.3438
Sorge RE, Trang T, Dorfman R, Smith SB, Beggs S, Ritchie J, Austin J-S, Zaykin DV, Vander Meulen H, Costigan M, Herbert TA, Yarkoni-Abitbul M, Tichauer D, Livneh J, Gershon E, Zheng M, Tan K, John SL, Slade GD, Jordan J, Woolf CJ, Peltz G, Maixner W, Diatchenko L, Seltzer Z, Salter MW, Mogil JS: Genetically determined P2X7 receptor pore formation regulates variability in chronic pain sensitivity. Nat Med 2012, 18: 595–599. 10.1038/nm.2710
Carvalho BS, Irizarry RA: A framework for oligonucleotide microarray preprocessing. Bioinformatics 2010, 26: 2363–2367. 10.1093/bioinformatics/btq431
Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 2003, 31: e15. 10.1093/nar/gng015
Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol 1995, 57: 289–300.
Perkins JR, Lees J, Antunes-Martins A, Diboun I, McMahon SB, Bennett DLH, Orengo C: PainNetworks: A web-based resource for the visualisation of pain-related genes in the context of their network associations. Pain 2013, 154: 2586.e1–2586.e12. 10.1016/j.pain.2013.09.003
Langmead B, Trapnell C, Pop M, Salzberg S: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009, 10: R25–10. 10.1186/gb-2009-10-3-r25
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: 1000 Genome Project Data Processing Subgroup: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25: 2078–2079. 10.1093/bioinformatics/btp352
Williams DA: Extra-binomial variation in logistic linear models. Appl Stat 1982,31(2):144–148. 10.2307/2347977
Hahne F, Huber W, Gentleman R, Falcon S: Bioconductor Case Studies. New York, NY: Springer; 2008. [Use R!]
Mi H, Muruganujan A, Thomas PD: PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucl. Acids Res. 2012,41(D1):D377-D386. doi: 10.1093/nar/gks1118
The authors would like to acknowledge Simone Sharma and Priya Banerjee (UCL Genomics) for microarray sample processing and hybridization (as part of collaborative work). The research leading to these results is part of the Europain Collaboration, which has received support from the Innovative Medicines Initiative Joint Undertaking (http://www.imi.europa.eu), under grant agreement no 115007, resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007‒2013) and EFPIA companies’ in kind contribution. JRP, CO, DLHB and SBM are part of the Wellcome Trust-funded London Pain Consortium. DLHB is a senior Wellcome Clinical Scientist.
The authors declare that they have no competing interests.
JRP, MK and AAM performed data analysis. AAM performed RNA sample preparation. MC and JG performed SNT surgery. WR and RS performed RNA-seq. DLHB, SBM and CO conceived of and managed this study. All authors read and approved the final manuscript.
James R Perkins, Ana Antunes-Martins contributed equally to this work.
Electronic supplementary material
Additional file 1: Inter-platform correlation at the gene level. Each pdf file in this zipped folder contains plots of RNA-seq RPKM expression vs. microarray normalised probe intensity for all respective samples. Spearman’s correlation coefficient is indicated in the top left corner of each graph. (ZIP 5 MB)
Additional file 2: The distributions of absolute log 2 FCs for DE genes, shown alongside the absolute log 2 FCs for non-DE genes. Absolute log 2 FCs are shown for the genes that are called as DE by both platforms (red lines, dashed and solid lines show RNA-seq and microarray fold changes), by RNA-seq only (dashed green line) and by microarrays only (solid blue line). Non DE genes shown in grey (dashed line shows RNA-seq values, solid line represents microarray values). Distribution curve computed using the probability density function, implemented in R. (PDF 17 KB)
Additional file 3: Comparison of RNA-seq and microarrays for the measurement of exon expression and the detection of differentially expressed exons. A) Correlation between normalised hybridisation intensity and normalized read counts (RPKM) at a 50 M read depth for exons measureable using microarrays and RNA-seq. Where more than one probeset maps to a given exon, both values are plotted, as separate points, for the equivalent RNA-seq value for that exon. Ai) Average expression for all three SNT samples. Aii) Average expression for all three naive samples. The red points show exons expressed in both platforms, blue points show exons that are not detected by RNA-seq (i.e. 0 reads aligned to that exon). Green points show exons with microarray normalised probe intensity below that of the background probesets (calculated using the DABG measure described in the Methods section), but with an RNA-seq RPKM value above 0. Grey points show exons with microarray normalised probe intensity below that of background probesets, and with an RPKM of 0. Some noise has been added to the expression values of the exons for clearer visualization of the point density. B) Correlation between fold changes estimated by microarrays and RNA-seq (50 M read depth) for exons detectable by both technologies. Exons deemed as significantly DE by both platforms are shown as red points; exons detected as DE exclusively by RNA-Seq are shown as green points; exons detected as DE exclusively by microarrays are shown as blue points. C) Venn diagram showing the number of exons found to be differentially expressed by RNA-seq (shown for a read depth of 50 M) and the overlap with microarray data. (PDF 204 KB)
Additional file 4: Overlap between platforms at the exon level, for all sequencing depths. Number of exons called as DE for RNA-seq, microarrays and the overlap between them. For lower sequencing depths, microarrays call more exons as DE. (XLS 26 KB)
Additional file 5: Fold change at exonic and intronic levels and p-values. Table containing fold changes calculated by DEXseq at intronic and exonic level and DEXSeq p-values for all genes tested. (XLS 628 KB)
Additional file 6: Volcano and mean-fold change plot for the DEXSeq-based analysis of relative exonic vs. intronic expression. A) Volcano plot, which shows the logarithm of the change in exonic expression minus the change in intronic expression, following SNT (x-axis). This is plotted against the negative logarithm of the p-value (y-axis). We can see that the most significant genes (i.e. those with the highest value on the y-axis) are represented by points with a negative value on the x-axis; this suggests that the most affected genes in terms of intronic vs. exonic expression are showing an increased expression in intronic regionsfollowing SNT. B) Plot of mean intronic expression vs. the logarithm of the change in exonic expression minus the change in intronic expression, highlighting the genes that have been deemed significant (FDR < 0.1), showing that significance is not a function of expression. (PDF 29 KB)
Additional file 7: The effect of changing the permitted false discovery rate, on the total number of genes deemed as differentially expressed. Numbers of genes called as significantly DE for RNA-seq, microarrays and the overlap between them for varying FDRs. Ensembl gene ids and gene symbols are given. (XLS 32 KB)
Additional file 8: Exon array limma analysis, containing the Ensembl gene ids and gene symbols (obtained from NetAffx and Biomart). Results produced using the limma package. In the case of more than one transcript cluster id with the same Ensembl id, the transcript cluster showing the highest level of variation across samples was used for the limma analysis. Results are shown for extended level confidence probes. (CSV 2 MB)
Additional file 9: RNA-seq (50 M) DESeq results containing the Ensembl gene ids and gene symbols (obtained from NetAffx and Biomart). Results produced using the DESeq package, using the default normalization parameters. Genes to which no reads could be aligned for four or more samples were excluded from analysis. (CSV 2 MB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
About this article
Cite this article
Perkins, J.R., Antunes-Martins, A., Calvo, M. et al. A comparison of RNA-seq and exon arrays for whole genome transcription profiling of the L5 spinal nerve transection model of neuropathic pain in the rat. Mol Pain 10, 7 (2014). https://doi.org/10.1186/1744-8069-10-7
- Whole-genome transcription profiling
- Exon arrays
- Next generation sequencing
- Spinal nerve transection
- Nerve injury
- Neuropathic pain
- Differential gene expression