GREP2: GEO RNA-seq Experiments Processing Pipeline

The Gene Expression Omnibus (GEO) is a public repository of gene expression data that hosts more than 6,000 RNA-seq datasets, and this number continues to grow. Most of these datasets are deposited in raw sequencing format, which must be downloaded and processed before it can be analyzed. To transform all of these datasets into an analysis-ready format, we have developed a comprehensive pipeline that downloads and processes RNA-seq datasets from GEO. This automated, R-based pipeline can process the publicly available human, mouse, and rat RNA-seq data in GEO. We recommend using this package in a Unix environment, as many of its features are not available on Windows.

You can run the individual functions above for each step, or run the whole pipeline with the process_geo_rnaseq function, which combines all of the above steps into a single call. We recommend using this function for processing GEO RNA-seq data.

library(GREP2)

# geo_series_acc: a GEO series accession, i.e. a character string beginning with "GSE"
process_geo_rnaseq(geo_series_acc = geo_series_acc, destdir = tempdir(),
                   download_method = "auto",
                   ascp = FALSE, prefetch_workspace = NULL,
                   ascp_path = NULL, use_sra_file = FALSE, trim_fastq = FALSE,
                   index_dir = tempdir(), species = "human",
                   countsFromAbundance = "lengthScaledTPM", n_thread = 1)
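When several GEO series need processing in one session, it can help to wrap process_geo_rnaseq so that one failing accession does not stop the rest. The sketch below is an illustration only, not part of GREP2: process_many_series and its run_fun argument are hypothetical names. It forwards any other arguments (destdir, index_dir, species, n_thread, and so on) unchanged through `...`, and it assumes nothing about what process_geo_rnaseq returns beyond storing that value per accession.

```r
# Hypothetical helper (not part of GREP2): run process_geo_rnaseq() on
# several GEO series accessions, logging and skipping any that fail.
# Extra arguments in `...` are passed to run_fun unchanged.
# `run_fun` is evaluated lazily, so GREP2 is only needed when the default
# is used; a stub function can be supplied for offline testing.
process_many_series <- function(accessions, ...,
                                run_fun = GREP2::process_geo_rnaseq) {
  results <- vector("list", length(accessions))
  names(results) <- accessions
  for (acc in accessions) {
    value <- tryCatch(
      run_fun(geo_series_acc = acc, ...),
      error = function(e) {
        # Report the failure and record NULL for this accession
        message("Failed to process ", acc, ": ", conditionMessage(e))
        NULL
      }
    )
    # list(value) keeps a NULL entry for a failed series instead of dropping it
    results[acc] <- list(value)
  }
  results
}
```

Called with a character vector of accessions plus the same named arguments you would give process_geo_rnaseq, it returns a named list with one entry per accession, NULL for any series that raised an error. Series are processed one after another; parallelism within each series is still governed by n_thread.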