NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data.
NanoString nCounter is a messenger-RNA/micro-RNA (mRNA/miRNA) expression assay that works with fluorescent barcodes.
Each barcode is assigned to an mRNA/miRNA and can be counted after binding to its target.
As a result, each count of a specific barcode represents the presence of its target mRNA/miRNA.
NACHO can load, visualise and normalise exported NanoString nCounter data, and it helps the user perform quality control.
NACHO does this by visualising quality-control metrics, expression of control genes, principal components and sample-specific size factors in an interactive web application.
RCC files are summarised and visualised with two functions: load_rcc() and visualise().
The load_rcc() function preprocesses the data.
The visualise() function launches a Shiny-based dashboard that visualises all relevant QC plots.
NACHO also includes a function, normalise(), which (re)calculates sample-specific size factors and normalises the data.
The normalise() function creates a list in which your settings, the raw counts and the normalised counts are stored.
In addition (since v0.6.0), NACHO includes the following functions:
- render() renders a full quality-control report (HTML) based on the results of a call to load_rcc() or normalise() (using print() in an R Markdown chunk).
- autoplot() draws any of the quality-control metrics shown by visualise() and render().
For more details, see vignette("NACHO") and vignette("NACHO-analysis").
Canouil M, Bouland GA, Bonnefond A, Froguel P, Hart L, Slieker R (2019). “NACHO: an R package for quality control of NanoString nCounter data.” Bioinformatics. ISSN 1367-4803, doi: 10.1093/bioinformatics/btz647.
@Article{,
title = {{NACHO}: an {R} package for quality control of {NanoString} {nCounter} data},
author = {Mickaël Canouil and Gerard A. Bouland and Amélie Bonnefond and Philippe Froguel and Leen Hart and Roderick Slieker},
journal = {Bioinformatics},
address = {Oxford, England},
year = {2019},
month = {aug},
issn = {1367-4803},
doi = {10.1093/bioinformatics/btz647},
}
To illustrate the usage and utility of NACHO, we show three examples in which the above-mentioned functions are used and the results are briefly examined.
NACHO comes with presummarised data; in the first example, we use this dataset to launch the interactive web application with visualise().
In the second example, we show the process of going from raw RCC files to visualisations, using a dataset queried from GEO with GEOquery.
In the third example, we use the summarised dataset from the second example to calculate the sample-specific size factors with normalise() and its added functionality to predict housekeeping genes.
Besides creating interactive visualisations, NACHO also identifies poorly performing samples, which are listed under the Outlier Table tab of the interactive web application.
When calling normalise(), the user can remove these outliers before the size factors are calculated.
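Conceptually, outlier detection compares per-sample QC metrics against thresholds. The base-R sketch below shows that idea on a hypothetical table of metrics. The column names mirror the metric codes accepted by autoplot(), but the data, the threshold values and the flag_outliers() helper are illustrative assumptions, not NACHO's implementation.

```r
# Hypothetical per-sample QC metrics; the values are made up for illustration.
qc <- data.frame(
  sample = c("S1", "S2", "S3"),
  BD  = c(0.8, 2.6, 1.1),     # binding density
  FoV = c(95, 90, 60),        # imaging (percentage of fields of view)
  PCL = c(0.99, 0.98, 0.97),  # positive control linearity
  LoD = c(5, 4, 1.5),         # limit of detection
  stringsAsFactors = FALSE
)

# Assumed thresholds, modelled on commonly cited nCounter QC guidelines.
# Check the outliers_thresholds element of your NACHO object for the real ones.
thresholds <- list(BD = c(0.1, 2.25), FoV = 75, PCL = 0.95, LoD = 2)

# Return the samples that fail at least one threshold.
flag_outliers <- function(qc, thr) {
  fails <- qc$BD < thr$BD[1] | qc$BD > thr$BD[2] |
    qc$FoV < thr$FoV |
    qc$PCL < thr$PCL |
    qc$LoD < thr$LoD
  qc$sample[fails]
}

outliers <- flag_outliers(qc, thresholds)
print(outliers)

# Keep only the samples that passed every check.
qc_kept <- qc[!qc$sample %in% outliers, ]
```

Here S2 fails on binding density and S3 on imaging and limit of detection, so only S1 would be kept for the size factor calculation.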
This example shows how to use summarised data to launch the interactive web application.
The raw data used are from a study by Liu et al. (2016) and were acquired from the NCBI GEO public database (Barrett et al. 2013).
Numerous NanoString nCounter datasets are available from GEO (Barrett et al. 2013).
In this example, we use a miRNA dataset from the study of Bruce et al. (2015) with the GEO accession number GSE70970. The data are extracted and prepared using the following code.
library(GEOquery)
# Download data
gse <- getGEO("GSE70970")
# Get phenotypes
targets <- pData(phenoData(gse[[1]]))
# Download the supplementary files (raw RCC archive)
getGEOSuppFiles(GEO = "GSE70970", baseDir = tempdir())
# Unzip data
untar(
tarfile = file.path(tempdir(), "GSE70970", "GSE70970_RAW.tar"),
exdir = file.path(tempdir(), "GSE70970", "Data")
)
# Add IDs
targets$IDFILE <- list.files(file.path(tempdir(), "GSE70970", "Data"))
## # A tibble: 263 x 71
## IDFILE title geo_accession status submission_date last_update_date type
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 GSM18… NPC-… GSM1824143 Publi… Jul 15 2015 Jul 20 2015 RNA
## 2 GSM18… NPC-… GSM1824144 Publi… Jul 15 2015 Jul 20 2015 RNA
## 3 GSM18… NPC-… GSM1824145 Publi… Jul 15 2015 Jul 20 2015 RNA
## 4 GSM18… NPC-… GSM1824146 Publi… Jul 15 2015 Jul 20 2015 RNA
## 5 GSM18… NPC-… GSM1824147 Publi… Jul 15 2015 Jul 20 2015 RNA
## 6 GSM18… NPC-… GSM1824148 Publi… Jul 15 2015 Jul 20 2015 RNA
## 7 GSM18… NPC-… GSM1824149 Publi… Jul 15 2015 Jul 20 2015 RNA
## 8 GSM18… NPC-… GSM1824150 Publi… Jul 15 2015 Jul 20 2015 RNA
## 9 GSM18… NPC-… GSM1824151 Publi… Jul 15 2015 Jul 20 2015 RNA
## 10 GSM18… NPC-… GSM1824152 Publi… Jul 15 2015 Jul 20 2015 RNA
## # … with 253 more rows, and 64 more variables: channel_count <chr>,
## # source_name_ch1 <chr>, organism_ch1 <chr>, characteristics_ch1 <chr>,
## # characteristics_ch1.1 <chr>, characteristics_ch1.2 <chr>,
## # characteristics_ch1.3 <chr>, characteristics_ch1.4 <chr>,
## # characteristics_ch1.5 <chr>, characteristics_ch1.6 <chr>,
## # characteristics_ch1.7 <chr>, characteristics_ch1.8 <chr>,
## # characteristics_ch1.9 <chr>, characteristics_ch1.10 <chr>,
## # characteristics_ch1.11 <chr>, characteristics_ch1.12 <chr>,
## # characteristics_ch1.13 <chr>, characteristics_ch1.14 <chr>,
## # characteristics_ch1.15 <chr>, characteristics_ch1.16 <chr>,
## # characteristics_ch1.17 <chr>, characteristics_ch1.18 <chr>,
## # characteristics_ch1.19 <chr>, treatment_protocol_ch1 <chr>,
## # growth_protocol_ch1 <chr>, molecule_ch1 <chr>, extract_protocol_ch1 <chr>,
## # label_ch1 <chr>, label_protocol_ch1 <chr>, taxid_ch1 <chr>,
## # hyb_protocol <chr>, scan_protocol <chr>, data_processing <chr>,
## # platform_id <chr>, contact_name <chr>, contact_email <chr>,
## # contact_institute <chr>, contact_address <chr>, contact_city <chr>,
## # contact_state <chr>, `contact_zip/postal_code` <chr>,
## # contact_country <chr>, supplementary_file <chr>, data_row_count <chr>,
## # `age:ch1` <chr>, `bin.t:ch1` <chr>, `chemo:ch1` <chr>,
## # `disease.event:ch1` <chr>, `disease.spec.event:ch1` <chr>,
## # `disease.spec.time:ch1` <chr>, `disease.time:ch1` <chr>,
## # `distant.event:ch1` <chr>, `distant.time:ch1` <chr>, `gender:ch1` <chr>,
## # `local.event:ch1` <chr>, `local.regional.event:ch1` <chr>,
## # `local.regional.time:ch1` <chr>, `local.time:ch1` <chr>, `n:ch1` <chr>,
## # `nodal.event:ch1` <chr>, `nodal.time:ch1` <chr>,
## # `survival.event:ch1` <chr>, `survival.time:ch1` <chr>, `t:ch1` <chr>
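The assignment to targets$IDFILE above assumes that list.files() returns the RCC files in the same order as the rows of targets. A slightly more defensive sketch pairs them by GEO accession instead. It assumes, hypothetically, that each file name starts with its GSM accession followed by an underscore; match_rcc_files() is an illustrative helper, not part of GEOquery or NACHO.

```r
# Hypothetical helper: pair GEO accessions with RCC files by name, not position.
# Assumes file names look like "<GSM accession>_<anything>".
match_rcc_files <- function(accessions, files) {
  file_ids <- sub("_.*$", "", basename(files))
  # match() returns NA for accessions without a corresponding file.
  files[match(accessions, file_ids)]
}

# Toy example: files listed in a different order than the accessions.
files <- c("GSM1824144_b.RCC", "GSM1824143_a.RCC")
matched <- match_rcc_files(c("GSM1824143", "GSM1824144", "GSM0000000"), files)
print(matched)
```

With the real data, this would become targets$IDFILE <- match_rcc_files(targets$geo_accession, list.files(file.path(tempdir(), "GSE70970", "Data"))), ideally followed by a check that no entry is NA.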
After extracting the dataset to the directory file.path(tempdir(), "GSE70970", "Data"), a Samplesheet.csv containing a column with the exact file name of each sample can be written to disk, or the targets data.frame can be used as is.
load_rcc() function
The first argument, data_directory, is the path to the directory containing the RCC files. The second, ssheet_csv, is the samplesheet: either the location of a CSV file or, as in this example, a data.frame. The third, id_colname, is the name of the column containing the exact file names.
The housekeeping_genes and normalisation_method arguments indicate, respectively, which housekeeping genes and which normalisation method should be used; housekeeping_predict controls whether NACHO predicts housekeeping genes from the data.
library(NACHO)
GSE70970_sum <- load_rcc(
  data_directory = file.path(tempdir(), "GSE70970", "Data"), # Where the data is
  ssheet_csv = targets, # The samplesheet (here a data.frame)
  id_colname = "IDFILE", # Name of the column that contains the unique identifiers
  housekeeping_genes = NULL, # Custom list of housekeeping genes
  housekeeping_predict = TRUE, # Whether or not to predict the housekeeping genes
  normalisation_method = "GEO", # Geometric mean ("GEO") or GLM ("GLM")
  n_comp = 5 # Number of principal components to compute
)
## [NACHO] Importing RCC files.
## [NACHO] Performing QC and formatting data.
## [NACHO] Searching for the best housekeeping genes.
## [NACHO] Computing normalisation factors using "GEO" method for housekeeping genes prediction.
## [NACHO] The following predicted housekeeping genes will be used for normalisation:
## - hsa-miR-103
## - hsa-let-7e
## - hsa-miR-1260
## - hsa-miR-500+hsa-miR-501-5p
## - hsa-miR-1274b
## [NACHO] Computing normalisation factors using "GEO" method.
## [NACHO] Missing values have been replaced with zeros for PCA.
## [NACHO] Normalising data using "GEO" method with housekeeping genes.
## [NACHO] Returning a list.
## $ access : character
## $ housekeeping_genes : character
## $ housekeeping_predict: logical
## $ housekeeping_norm : logical
## $ normalisation_method: character
## $ remove_outliers : logical
## $ n_comp : numeric
## $ data_directory : character
## $ pc_sum : data.frame
## $ nacho : data.frame
## $ outliers_thresholds : list
visualise() function
When the summarisation is done, the summarised (or normalised) data can be visualised by passing the result to the visualise() function, e.g. visualise(GSE70970_sum).
The sidebar includes widgets to control quality-control thresholds; these widgets differ according to the selected tab. Each sample in the plots can be coloured either by technical specifications included in the RCC files or by specifications of your own choosing, provided these are included in the samplesheet.
normalise() function
NACHO can discover housekeeping genes within your own dataset. It returns the five most suitable candidates; however, one of these five genes may still be unsuitable, so in some cases a subset of the discovered housekeeping genes works better. For this example, we use the GSE70970 dataset from the previous example. The discovered housekeeping genes are saved in the housekeeping_genes element of the result object.
print(GSE70970_sum[["housekeeping_genes"]])
## [1] "hsa-miR-103" "hsa-let-7e"
## [3] "hsa-miR-1260" "hsa-miR-500+hsa-miR-501-5p"
## [5] "hsa-miR-1274b"
Suppose hsa-miR-103 and hsa-let-7e are not suitable and you want to exclude them from the normalisation process.
my_housekeeping <- GSE70970_sum[["housekeeping_genes"]][-c(1, 2)]
print(my_housekeeping)
## [1] "hsa-miR-1260" "hsa-miR-500+hsa-miR-501-5p"
## [3] "hsa-miR-1274b"
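Dropping genes by position, as with [-c(1, 2)], silently depends on the order in which the genes were returned. A small alternative is to exclude them by name with setdiff(). The character vector below just repeats the printed predictions so that the snippet runs on its own; in practice it would come from GSE70970_sum[["housekeeping_genes"]].

```r
# Predicted housekeeping genes, copied from the output above so the snippet is
# self-contained; normally: predicted <- GSE70970_sum[["housekeeping_genes"]]
predicted <- c("hsa-miR-103", "hsa-let-7e", "hsa-miR-1260",
               "hsa-miR-500+hsa-miR-501-5p", "hsa-miR-1274b")
to_drop <- c("hsa-miR-103", "hsa-let-7e")

# Fail early if a gene name to drop is misspelled or absent.
stopifnot(all(to_drop %in% predicted))

# setdiff() keeps the original order of the remaining genes.
my_housekeeping <- setdiff(predicted, to_drop)
print(my_housekeeping)
```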
The next step is the actual normalisation. The first argument, nacho_object, is the object created by load_rcc(). The second argument, housekeeping_genes, is a vector of gene names; here, it is the subset of discovered housekeeping genes we just made. Setting housekeeping_predict = FALSE keeps NACHO from predicting a new set of housekeeping genes, and housekeeping_norm = TRUE applies the housekeeping normalisation. With remove_outliers, the user can choose to remove the outliers. Finally, the normalisation method is chosen with normalisation_method.
Here, the user can choose between "GLM" and "GEO". The differences between the normalisation methods are nuanced, and a preference for either method is use-case specific.
In this example, "GEO" is used.
GSE70970_norm <- normalise(
nacho_object = GSE70970_sum,
housekeeping_genes = my_housekeeping,
housekeeping_predict = FALSE,
housekeeping_norm = TRUE,
normalisation_method = "GEO",
remove_outliers = TRUE
)
## [NACHO] Normalising "GSE70970_sum" with new value for parameters:
## - housekeeping_genes = TRUE
## - housekeeping_predict = TRUE
## - remove_outliers = TRUE
## [NACHO] Computing normalisation factors using "GEO" method.
## [NACHO] Missing values have been replaced with zeros for PCA.
## [NACHO] Returning a list.
## $ access : character
## $ housekeeping_genes : character
## $ housekeeping_predict: logical
## $ housekeeping_norm : logical
## $ normalisation_method: character
## $ remove_outliers : logical
## $ n_comp : numeric
## $ data_directory : character
## $ pc_sum : data.frame
## $ nacho : data.frame
## $ outliers_thresholds : list
normalise() returns a list object (the same as load_rcc()) with raw_counts and normalised_counts slots filled with the raw and normalised counts. Both counts are also in the nacho data.frame.
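To make the idea of a sample-specific size factor more concrete, here is a minimal base-R sketch of geometric-mean scaling on housekeeping genes, in the spirit of the "GEO" method. It is a simplified illustration under stated assumptions, not NACHO's internal implementation: NACHO also takes positive controls into account and follows its own conventions, and the "GLM" alternative is not shown. The helpers geo_mean() and size_factors() and the toy counts are hypothetical.

```r
# Hypothetical sketch: geometric-mean size factors from housekeeping genes.
# This is NOT NACHO's internal code, only an illustration of the idea.

# Geometric mean of the strictly positive values (zeros would give log(0)).
geo_mean <- function(x) {
  x <- x[x > 0]
  exp(mean(log(x)))
}

# counts: genes in rows, samples in columns; hk: housekeeping gene names.
size_factors <- function(counts, hk) {
  hk_geo <- apply(counts[hk, , drop = FALSE], 2, geo_mean)
  # Scale each sample towards the mean of the per-sample geometric means.
  mean(hk_geo) / hk_geo
}

# Toy data: sample S2 received twice as much input as S1.
counts <- matrix(
  c(100, 400, 50,   # S1
    200, 800, 100), # S2
  nrow = 3,
  dimnames = list(c("hk1", "hk2", "geneX"), c("S1", "S2"))
)

sf <- size_factors(counts, hk = c("hk1", "hk2"))
print(sf)

# Multiply every column by its sample's size factor.
normalised <- sweep(counts, MARGIN = 2, STATS = sf, FUN = "*")
print(normalised["geneX", ])
```

The size factors come out at about 1.5 for S1 and 0.75 for S2, so after scaling geneX has the same value in both samples: the difference in overall input is removed before the samples are compared.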
autoplot() function
The autoplot() function provides an easy way to plot any quality-control metric from the visualise() function.
The possible metrics (x) are:
- "BD" (Binding Density)
- "FoV" (Imaging)
- "PCL" (Positive Control Linearity)
- "LoD" (Limit of Detection)
- "Positive" (Positive Controls)
- "Negative" (Negative Controls)
- "Housekeeping" (Housekeeping Genes)
- "PN" (Positive Controls vs. Negative Controls)
- "ACBD" (Average Counts vs. Binding Density)
- "ACMC" (Average Counts vs. Median Counts)
- "PCA12" (Principal Component 1 vs. 2)
- "PCAi" (Principal Component scree plot)
- "PCA" (Principal Components planes)
- "PFNF" (Positive Factor vs. Negative Factor)
- "HF" (Housekeeping Factor)
- "NORM" (Normalisation Factor)
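When looping over several metrics, it can help to keep the codes and their descriptions together. The lookup vector below simply restates the list of metrics; qc_metrics and check_metric() are hypothetical helpers, not part of NACHO. The commented loop assumes, as the list suggests, that autoplot() takes the metric code through its x argument.

```r
# Hypothetical lookup of the autoplot() metric codes and their descriptions.
qc_metrics <- c(
  BD           = "Binding Density",
  FoV          = "Imaging",
  PCL          = "Positive Control Linearity",
  LoD          = "Limit of Detection",
  Positive     = "Positive Controls",
  Negative     = "Negative Controls",
  Housekeeping = "Housekeeping Genes",
  PN           = "Positive Controls vs. Negative Controls",
  ACBD         = "Average Counts vs. Binding Density",
  ACMC         = "Average Counts vs. Median Counts",
  PCA12        = "Principal Component 1 vs. 2",
  PCAi         = "Principal Component scree plot",
  PCA          = "Principal Components planes",
  PFNF         = "Positive Factor vs. Negative Factor",
  HF           = "Housekeeping Factor",
  NORM         = "Normalisation Factor"
)

# Return the description of a metric code, or stop on an unknown code.
check_metric <- function(x) {
  if (!x %in% names(qc_metrics)) {
    stop("Unknown metric: ", x, call. = FALSE)
  }
  qc_metrics[[x]]
}

print(check_metric("PN"))

# With NACHO loaded, each metric could then be drawn in turn, e.g.:
# for (m in names(qc_metrics)) print(autoplot(GSE70970_sum, x = m))
```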