Examples of Template Configuration Files

Jianfeng Li

2018-01-24

Template configuration files are central to BioInstaller. The source URLs, install scripts, and other information for all software and databases are stored in these configuration files.

Most of the configuration files are parsed by configr. One difference from the original configr package is the #R# R CMD #R# marker, which can be used to flag commands written in R.

github.toml and nongithub.toml

The built-in configuration files github.toml and nongithub.toml can be used to download and install a range of software and databases. install.bioinfo(show.all.names = TRUE) lists all of the software and databases available in github.toml and nongithub.toml.

GitHub Software

Some of the items in the GitHub configuration file control the behavior of BioInstaller:

[bwa]
github_url = "https://github.com/lh3/bwa"
after_failure = "echo 'fail!'"
after_success = "echo 'successful!'"
make_dir = ["./"]
bin_dir = ["./"]

[bwa.before_install]
linux = ""
mac = ""

[bwa.install]
linux = "make"
mac = "make"

Version control of GitHub software is handled by the git2r package and the GitHub tag API. The source URL of software or files hosted on GitHub is given by github_url in github.toml.

Non-GitHub Software or Databases

Configuration files for non-GitHub software and databases are similar to the GitHub ones, with these differences:

- github_url is replaced by source_url.
- url_all_download should be set to true if you want to download multiple files.
- The rvest and RCurl packages can be used to parse the version information of non-GitHub software or databases.
- version_order_fixed can be set to true if you don’t want to use the built-in version reordering function.

If url_all_download is set to false, multiple mirrors can be used, so that a single invalid URL does not block the download.

[gmap]
# {{version}} will be parsed to your install.bioinfo `version` parameter
# or the newest version parsed from fetched data.
source_url = "http://research-pub.gene.com/gmap/src/{{version}}.tar.gz"
after_failure = "echo 'fail!'"
after_success = "echo 'successful!'"
make_dir = ["./"]
bin_dir = ["./"]

[gmap.before_install]
linux = ""
mac = ""

[gmap.install]
linux = "./configure --prefix=`pwd` && make && make install"
mac = ["sed -i s/\"## CFLAGS='-O3 -m64' .*\"/\"CFLAGS='-O3 -m64'\"/ config.site",
"./configure --prefix=`pwd` && make && make install"]

Version control of non-GitHub software and databases requires a function that parses a URL to find the available versions; the chosen version then replaces {{version}} in source_url.
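
The idea can be sketched in base R as follows. This is only an illustration, not the code BioInstaller runs: the page fragment and the version strings in it are invented, and the real package relies on rvest and RCurl to fetch and parse the actual pages.

```r
# Hypothetical fragment of a download index page (invented for illustration).
page <- c(
  '<a href="gmap-gsnap-2017-09-30.tar.gz">gmap-gsnap-2017-09-30.tar.gz</a>',
  '<a href="gmap-gsnap-2017-11-15.tar.gz">gmap-gsnap-2017-11-15.tar.gz</a>',
  '<a href="gmap-gsnap-2017-10-12.tar.gz">gmap-gsnap-2017-10-12.tar.gz</a>'
)

# Extract the first version-like string from each line.
pattern <- "gmap-gsnap-[0-9]{4}-[0-9]{2}-[0-9]{2}"
versions <- unique(regmatches(page, regexpr(pattern, page)))

# Date-stamped versions sort correctly as plain strings; take the newest.
newest <- sort(versions, decreasing = TRUE, method = "radix")[1]

# Substitute the chosen version into the source_url template.
template <- "http://research-pub.gene.com/gmap/src/{{version}}.tar.gz"
url <- gsub("{{version}}", newest, template, fixed = TRUE)
print(url)
```

Version schemes that do not sort as plain strings would need their own ordering rule in place of the sort() call.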

nongithub_databases_blast.toml

This configuration file can be used to download NCBI BLAST databases: install.bioinfo(nongithub.cfg = system.file('extdata', 'databases/blast.toml', package = 'BioInstaller'), show.all.names = TRUE).

BioInstaller uses configr glue to keep the lists of file names short: a single expression stands for many file names. More useful database FTP URLs may be added in the future. I hope you will write your own configuration files rather than rely only on the ones built into BioInstaller.

library(configr)
library(BioInstaller)
blast.databases <- system.file('extdata', 
  'config/db/db_blast.toml', package = 'BioInstaller')

read.config(blast.databases)$db_blast_nr$source_url
#> [1] "!!glue ftp://ftp.ncbi.nih.gov/blast/db/nr.{ids=sprintf('%02d', 0:68);rep(ids, 2)}.tar.gz{c(rep('', length(ids)), rep('.md5', length(ids)))}"
read.config(blast.databases, glue.parse = TRUE)$db_blast_nr$source_url
#>   [1] "ftp://ftp.ncbi.nih.gov/blast/db/nr.00.tar.gz"    
#>   [2] "ftp://ftp.ncbi.nih.gov/blast/db/nr.01.tar.gz"    
#>   [3] "ftp://ftp.ncbi.nih.gov/blast/db/nr.02.tar.gz"    
#>   [4] "ftp://ftp.ncbi.nih.gov/blast/db/nr.03.tar.gz"    
#>   [5] "ftp://ftp.ncbi.nih.gov/blast/db/nr.04.tar.gz"    
#>   [6] "ftp://ftp.ncbi.nih.gov/blast/db/nr.05.tar.gz"    
#>   [7] "ftp://ftp.ncbi.nih.gov/blast/db/nr.06.tar.gz"    
#>   [8] "ftp://ftp.ncbi.nih.gov/blast/db/nr.07.tar.gz"    
#>   [9] "ftp://ftp.ncbi.nih.gov/blast/db/nr.08.tar.gz"    
#>  [10] "ftp://ftp.ncbi.nih.gov/blast/db/nr.09.tar.gz"    
#>  [11] "ftp://ftp.ncbi.nih.gov/blast/db/nr.10.tar.gz"    
#>  [12] "ftp://ftp.ncbi.nih.gov/blast/db/nr.11.tar.gz"    
#>  [13] "ftp://ftp.ncbi.nih.gov/blast/db/nr.12.tar.gz"    
#>  [14] "ftp://ftp.ncbi.nih.gov/blast/db/nr.13.tar.gz"    
#>  [15] "ftp://ftp.ncbi.nih.gov/blast/db/nr.14.tar.gz"    
#>  [16] "ftp://ftp.ncbi.nih.gov/blast/db/nr.15.tar.gz"    
#>  [17] "ftp://ftp.ncbi.nih.gov/blast/db/nr.16.tar.gz"    
#>  [18] "ftp://ftp.ncbi.nih.gov/blast/db/nr.17.tar.gz"    
#>  [19] "ftp://ftp.ncbi.nih.gov/blast/db/nr.18.tar.gz"    
#>  [20] "ftp://ftp.ncbi.nih.gov/blast/db/nr.19.tar.gz"    
#>  [21] "ftp://ftp.ncbi.nih.gov/blast/db/nr.20.tar.gz"    
#>  [22] "ftp://ftp.ncbi.nih.gov/blast/db/nr.21.tar.gz"    
#>  [23] "ftp://ftp.ncbi.nih.gov/blast/db/nr.22.tar.gz"    
#>  [24] "ftp://ftp.ncbi.nih.gov/blast/db/nr.23.tar.gz"    
#>  [25] "ftp://ftp.ncbi.nih.gov/blast/db/nr.24.tar.gz"    
#>  [26] "ftp://ftp.ncbi.nih.gov/blast/db/nr.25.tar.gz"    
#>  [27] "ftp://ftp.ncbi.nih.gov/blast/db/nr.26.tar.gz"    
#>  [28] "ftp://ftp.ncbi.nih.gov/blast/db/nr.27.tar.gz"    
#>  [29] "ftp://ftp.ncbi.nih.gov/blast/db/nr.28.tar.gz"    
#>  [30] "ftp://ftp.ncbi.nih.gov/blast/db/nr.29.tar.gz"    
#>  [31] "ftp://ftp.ncbi.nih.gov/blast/db/nr.30.tar.gz"    
#>  [32] "ftp://ftp.ncbi.nih.gov/blast/db/nr.31.tar.gz"    
#>  [33] "ftp://ftp.ncbi.nih.gov/blast/db/nr.32.tar.gz"    
#>  [34] "ftp://ftp.ncbi.nih.gov/blast/db/nr.33.tar.gz"    
#>  [35] "ftp://ftp.ncbi.nih.gov/blast/db/nr.34.tar.gz"    
#>  [36] "ftp://ftp.ncbi.nih.gov/blast/db/nr.35.tar.gz"    
#>  [37] "ftp://ftp.ncbi.nih.gov/blast/db/nr.36.tar.gz"    
#>  [38] "ftp://ftp.ncbi.nih.gov/blast/db/nr.37.tar.gz"    
#>  [39] "ftp://ftp.ncbi.nih.gov/blast/db/nr.38.tar.gz"    
#>  [40] "ftp://ftp.ncbi.nih.gov/blast/db/nr.39.tar.gz"    
#>  [41] "ftp://ftp.ncbi.nih.gov/blast/db/nr.40.tar.gz"    
#>  [42] "ftp://ftp.ncbi.nih.gov/blast/db/nr.41.tar.gz"    
#>  [43] "ftp://ftp.ncbi.nih.gov/blast/db/nr.42.tar.gz"    
#>  [44] "ftp://ftp.ncbi.nih.gov/blast/db/nr.43.tar.gz"    
#>  [45] "ftp://ftp.ncbi.nih.gov/blast/db/nr.44.tar.gz"    
#>  [46] "ftp://ftp.ncbi.nih.gov/blast/db/nr.45.tar.gz"    
#>  [47] "ftp://ftp.ncbi.nih.gov/blast/db/nr.46.tar.gz"    
#>  [48] "ftp://ftp.ncbi.nih.gov/blast/db/nr.47.tar.gz"    
#>  [49] "ftp://ftp.ncbi.nih.gov/blast/db/nr.48.tar.gz"    
#>  [50] "ftp://ftp.ncbi.nih.gov/blast/db/nr.49.tar.gz"    
#>  [51] "ftp://ftp.ncbi.nih.gov/blast/db/nr.50.tar.gz"    
#>  [52] "ftp://ftp.ncbi.nih.gov/blast/db/nr.51.tar.gz"    
#>  [53] "ftp://ftp.ncbi.nih.gov/blast/db/nr.52.tar.gz"    
#>  [54] "ftp://ftp.ncbi.nih.gov/blast/db/nr.53.tar.gz"    
#>  [55] "ftp://ftp.ncbi.nih.gov/blast/db/nr.54.tar.gz"    
#>  [56] "ftp://ftp.ncbi.nih.gov/blast/db/nr.55.tar.gz"    
#>  [57] "ftp://ftp.ncbi.nih.gov/blast/db/nr.56.tar.gz"    
#>  [58] "ftp://ftp.ncbi.nih.gov/blast/db/nr.57.tar.gz"    
#>  [59] "ftp://ftp.ncbi.nih.gov/blast/db/nr.58.tar.gz"    
#>  [60] "ftp://ftp.ncbi.nih.gov/blast/db/nr.59.tar.gz"    
#>  [61] "ftp://ftp.ncbi.nih.gov/blast/db/nr.60.tar.gz"    
#>  [62] "ftp://ftp.ncbi.nih.gov/blast/db/nr.61.tar.gz"    
#>  [63] "ftp://ftp.ncbi.nih.gov/blast/db/nr.62.tar.gz"    
#>  [64] "ftp://ftp.ncbi.nih.gov/blast/db/nr.63.tar.gz"    
#>  [65] "ftp://ftp.ncbi.nih.gov/blast/db/nr.64.tar.gz"    
#>  [66] "ftp://ftp.ncbi.nih.gov/blast/db/nr.65.tar.gz"    
#>  [67] "ftp://ftp.ncbi.nih.gov/blast/db/nr.66.tar.gz"    
#>  [68] "ftp://ftp.ncbi.nih.gov/blast/db/nr.67.tar.gz"    
#>  [69] "ftp://ftp.ncbi.nih.gov/blast/db/nr.68.tar.gz"    
#>  [70] "ftp://ftp.ncbi.nih.gov/blast/db/nr.00.tar.gz.md5"
#>  [71] "ftp://ftp.ncbi.nih.gov/blast/db/nr.01.tar.gz.md5"
#>  [72] "ftp://ftp.ncbi.nih.gov/blast/db/nr.02.tar.gz.md5"
#>  [73] "ftp://ftp.ncbi.nih.gov/blast/db/nr.03.tar.gz.md5"
#>  [74] "ftp://ftp.ncbi.nih.gov/blast/db/nr.04.tar.gz.md5"
#>  [75] "ftp://ftp.ncbi.nih.gov/blast/db/nr.05.tar.gz.md5"
#>  [76] "ftp://ftp.ncbi.nih.gov/blast/db/nr.06.tar.gz.md5"
#>  [77] "ftp://ftp.ncbi.nih.gov/blast/db/nr.07.tar.gz.md5"
#>  [78] "ftp://ftp.ncbi.nih.gov/blast/db/nr.08.tar.gz.md5"
#>  [79] "ftp://ftp.ncbi.nih.gov/blast/db/nr.09.tar.gz.md5"
#>  [80] "ftp://ftp.ncbi.nih.gov/blast/db/nr.10.tar.gz.md5"
#>  [81] "ftp://ftp.ncbi.nih.gov/blast/db/nr.11.tar.gz.md5"
#>  [82] "ftp://ftp.ncbi.nih.gov/blast/db/nr.12.tar.gz.md5"
#>  [83] "ftp://ftp.ncbi.nih.gov/blast/db/nr.13.tar.gz.md5"
#>  [84] "ftp://ftp.ncbi.nih.gov/blast/db/nr.14.tar.gz.md5"
#>  [85] "ftp://ftp.ncbi.nih.gov/blast/db/nr.15.tar.gz.md5"
#>  [86] "ftp://ftp.ncbi.nih.gov/blast/db/nr.16.tar.gz.md5"
#>  [87] "ftp://ftp.ncbi.nih.gov/blast/db/nr.17.tar.gz.md5"
#>  [88] "ftp://ftp.ncbi.nih.gov/blast/db/nr.18.tar.gz.md5"
#>  [89] "ftp://ftp.ncbi.nih.gov/blast/db/nr.19.tar.gz.md5"
#>  [90] "ftp://ftp.ncbi.nih.gov/blast/db/nr.20.tar.gz.md5"
#>  [91] "ftp://ftp.ncbi.nih.gov/blast/db/nr.21.tar.gz.md5"
#>  [92] "ftp://ftp.ncbi.nih.gov/blast/db/nr.22.tar.gz.md5"
#>  [93] "ftp://ftp.ncbi.nih.gov/blast/db/nr.23.tar.gz.md5"
#>  [94] "ftp://ftp.ncbi.nih.gov/blast/db/nr.24.tar.gz.md5"
#>  [95] "ftp://ftp.ncbi.nih.gov/blast/db/nr.25.tar.gz.md5"
#>  [96] "ftp://ftp.ncbi.nih.gov/blast/db/nr.26.tar.gz.md5"
#>  [97] "ftp://ftp.ncbi.nih.gov/blast/db/nr.27.tar.gz.md5"
#>  [98] "ftp://ftp.ncbi.nih.gov/blast/db/nr.28.tar.gz.md5"
#>  [99] "ftp://ftp.ncbi.nih.gov/blast/db/nr.29.tar.gz.md5"
#> [100] "ftp://ftp.ncbi.nih.gov/blast/db/nr.30.tar.gz.md5"
#> [101] "ftp://ftp.ncbi.nih.gov/blast/db/nr.31.tar.gz.md5"
#> [102] "ftp://ftp.ncbi.nih.gov/blast/db/nr.32.tar.gz.md5"
#> [103] "ftp://ftp.ncbi.nih.gov/blast/db/nr.33.tar.gz.md5"
#> [104] "ftp://ftp.ncbi.nih.gov/blast/db/nr.34.tar.gz.md5"
#> [105] "ftp://ftp.ncbi.nih.gov/blast/db/nr.35.tar.gz.md5"
#> [106] "ftp://ftp.ncbi.nih.gov/blast/db/nr.36.tar.gz.md5"
#> [107] "ftp://ftp.ncbi.nih.gov/blast/db/nr.37.tar.gz.md5"
#> [108] "ftp://ftp.ncbi.nih.gov/blast/db/nr.38.tar.gz.md5"
#> [109] "ftp://ftp.ncbi.nih.gov/blast/db/nr.39.tar.gz.md5"
#> [110] "ftp://ftp.ncbi.nih.gov/blast/db/nr.40.tar.gz.md5"
#> [111] "ftp://ftp.ncbi.nih.gov/blast/db/nr.41.tar.gz.md5"
#> [112] "ftp://ftp.ncbi.nih.gov/blast/db/nr.42.tar.gz.md5"
#> [113] "ftp://ftp.ncbi.nih.gov/blast/db/nr.43.tar.gz.md5"
#> [114] "ftp://ftp.ncbi.nih.gov/blast/db/nr.44.tar.gz.md5"
#> [115] "ftp://ftp.ncbi.nih.gov/blast/db/nr.45.tar.gz.md5"
#> [116] "ftp://ftp.ncbi.nih.gov/blast/db/nr.46.tar.gz.md5"
#> [117] "ftp://ftp.ncbi.nih.gov/blast/db/nr.47.tar.gz.md5"
#> [118] "ftp://ftp.ncbi.nih.gov/blast/db/nr.48.tar.gz.md5"
#> [119] "ftp://ftp.ncbi.nih.gov/blast/db/nr.49.tar.gz.md5"
#> [120] "ftp://ftp.ncbi.nih.gov/blast/db/nr.50.tar.gz.md5"
#> [121] "ftp://ftp.ncbi.nih.gov/blast/db/nr.51.tar.gz.md5"
#> [122] "ftp://ftp.ncbi.nih.gov/blast/db/nr.52.tar.gz.md5"
#> [123] "ftp://ftp.ncbi.nih.gov/blast/db/nr.53.tar.gz.md5"
#> [124] "ftp://ftp.ncbi.nih.gov/blast/db/nr.54.tar.gz.md5"
#> [125] "ftp://ftp.ncbi.nih.gov/blast/db/nr.55.tar.gz.md5"
#> [126] "ftp://ftp.ncbi.nih.gov/blast/db/nr.56.tar.gz.md5"
#> [127] "ftp://ftp.ncbi.nih.gov/blast/db/nr.57.tar.gz.md5"
#> [128] "ftp://ftp.ncbi.nih.gov/blast/db/nr.58.tar.gz.md5"
#> [129] "ftp://ftp.ncbi.nih.gov/blast/db/nr.59.tar.gz.md5"
#> [130] "ftp://ftp.ncbi.nih.gov/blast/db/nr.60.tar.gz.md5"
#> [131] "ftp://ftp.ncbi.nih.gov/blast/db/nr.61.tar.gz.md5"
#> [132] "ftp://ftp.ncbi.nih.gov/blast/db/nr.62.tar.gz.md5"
#> [133] "ftp://ftp.ncbi.nih.gov/blast/db/nr.63.tar.gz.md5"
#> [134] "ftp://ftp.ncbi.nih.gov/blast/db/nr.64.tar.gz.md5"
#> [135] "ftp://ftp.ncbi.nih.gov/blast/db/nr.65.tar.gz.md5"
#> [136] "ftp://ftp.ncbi.nih.gov/blast/db/nr.66.tar.gz.md5"
#> [137] "ftp://ftp.ncbi.nih.gov/blast/db/nr.67.tar.gz.md5"
#> [138] "ftp://ftp.ncbi.nih.gov/blast/db/nr.68.tar.gz.md5"
mask.github <- tempfile()
file.create(mask.github)
#> [1] TRUE
install.bioinfo(nongithub.cfg = blast.databases, github.cfg = mask.github,
  show.all.names = TRUE)
#> Warning in fetch.config(github.cfg): Configuration file /tmp/Rtmpf6U2g1/
#> filed60859611db4 is empty, please check the links.
#>  [1] "db_blast_env_nr"                  "db_blast_est_human"              
#>  [3] "db_blast_est_mouse"               "db_blast_est_others"             
#>  [5] "db_blast_gss"                     "db_blast_htgs"                   
#>  [7] "db_blast_human_genomic"           "db_blast_landmark"               
#>  [9] "db_blast_mouse_genomic"           "db_blast_nr"                     
#> [11] "db_blast_nt"                      "db_blast_other_genomic"          
#> [13] "db_blast_pataa"                   "db_blast_patnt"                  
#> [15] "db_blast_pdbaa"                   "db_blast_pdbnt"                  
#> [17] "db_blast_ref_prok_rep_genomes"    "db_blast_ref_viroids_rep_genomes"
#> [19] "db_blast_ref_viruses_rep_genomes" "db_blast_refseq_genomic"         
#> [21] "db_blast_refseq_protein"          "db_blast_refseq_rna"             
#> [23] "db_blast_refseqgene"              "db_blast_sts"                    
#> [25] "db_blast_swissprot"               "db_blast_taxdb"                  
#> [27] "db_blast_tsa_nr"                  "db_blast_tsa_nt"                 
#> [29] "db_blast_vector"
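
As a rough base R equivalent of what the glue expression in the db_blast_nr source_url expands to, the following sketch rebuilds the same 138 URLs. It reproduces the result only and makes no claim about how configr evaluates the expression internally.

```r
# Two-digit volume identifiers "00" .. "68", as in the glue expression.
ids <- sprintf("%02d", 0:68)

# The first 69 entries have no suffix; the second 69 get ".md5".
suffix <- c(rep("", length(ids)), rep(".md5", length(ids)))

# Vectorised paste0 recycles the fixed parts over all 138 elements.
urls <- paste0("ftp://ftp.ncbi.nih.gov/blast/db/nr.", rep(ids, 2),
               ".tar.gz", suffix)

print(length(urls))
print(urls[c(1, 70)])
```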

Automatic parsing from the BIO_SOFTWARES_DB_ACTIVE database

To resolve dependencies and their paths, BioInstaller automatically recognizes expressions of the form {{key:value}} and looks up their real values in the BioInstaller BIO_SOFTWARES_DB_ACTIVE database, whose location is set by the environment variable BIO_SOFTWARES_DB_ACTIVE or by the parameter db.

For example, Pindel needs htslib to complete its installation, so the install step of Pindel is ./INSTALL {{htslib:source.dir}}, as set in the Pindel section of the file system.file("extdata", "github.toml", package = "BioInstaller"). In R, {{htslib:source.dir}} is replaced by the value stored in the database given by BIO_SOFTWARES_DB_ACTIVE or by db, the install.bioinfo parameter that indicates the path of the BioInstaller database. More examples can be found under the ‘other.config’ parameter of the configr parse.extra function.
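
A minimal base R sketch of this kind of lookup is shown below. The nested list db is a hypothetical stand-in for the BIO_SOFTWARES_DB_ACTIVE database, the path in it is invented, and resolve_key_value is an illustrative helper rather than a BioInstaller or configr function.

```r
# Hypothetical stand-in for the database: software name -> recorded fields.
db <- list(htslib = list(source.dir = "/opt/bio/htslib"))

# Replace every {{key:value}} expression with the matching database entry.
resolve_key_value <- function(cmd, db) {
  pattern <- "\\{\\{([^:{}]+):([^{}]+)\\}\\}"
  repeat {
    m <- regmatches(cmd, regexec(pattern, cmd))[[1]]
    if (length(m) == 0) break            # no placeholder left
    value <- db[[m[2]]][[m[3]]]          # m[2] = key, m[3] = field name
    if (is.null(value)) stop("no database entry for ", m[1])
    cmd <- sub(m[1], value, cmd, fixed = TRUE)
  }
  cmd
}

resolved <- resolve_key_value("./INSTALL {{htslib:source.dir}}", db)
print(resolved)
```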

Automatic parsing from the install.bioinfo parameter extra.list

To make configuration templates more flexible, BioInstaller automatically recognizes expressions of the form {{parameter}} and fills in the value if it exists in the extra.list parameter of install.bioinfo. The entries name, version, os.version, and destdir are set by default.

For example, the GMAP source_url needs to include the version, so we use source_url = "http://research-pub.gene.com/gmap/src/{{version}}.tar.gz" as the download URL, which allows different versions of GMAP to be installed. This value is set in the gmap section of the file system.file("extdata", "nongithub.toml", package = "BioInstaller"). In R, {{version}} is replaced by the value of the version parameter of install.bioinfo (if version is NULL, it is set to the newest version). More examples can be found under the ‘extra.list’ parameter of the configr parse.extra function.
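
The substitution of several {{parameter}} placeholders from a named list can be sketched in base R as follows. fill_template, the command string, and the values are all hypothetical and chosen only for illustration; the real work is done by configr.

```r
# Replace each {{name}} placeholder that has an entry in `values`;
# placeholders without an entry are left untouched.
fill_template <- function(template, values) {
  for (key in names(values)) {
    template <- gsub(paste0("{{", key, "}}"), values[[key]],
                     template, fixed = TRUE)
  }
  template
}

# Hypothetical values playing the role of extra.list plus the defaults.
values <- list(name = "gmap", version = "2017-11-15", destdir = "/opt/gmap")

cmd <- fill_template(
  "./configure --prefix={{destdir}} && echo {{name}} {{version}} {{unknown}}",
  values
)
print(cmd)
```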

Automatic parsing of R CMD

For example, take @>@ str_replace('{{version}}', '-linux64', '') @<@. If {{version}} is parsed to 1.2.0-linux64 in the configuration file, the full expression is parsed to 1.2.0. This makes it convenient to write your own install scripts or configuration files. More examples can be found under the ‘rcmd.parse’ parameter of configr.
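
The two-step evaluation can be sketched in base R as follows. eval_rcmd is a hypothetical helper, not the configr implementation, and base sub() stands in for str_replace so that the example needs no extra packages.

```r
# Evaluate the first @>@ ... @<@ expression in a string and splice in the result.
eval_rcmd <- function(x) {
  m <- regmatches(x, regexec("@>@(.*?)@<@", x, perl = TRUE))[[1]]
  if (length(m) == 0) return(x)          # nothing to evaluate
  value <- eval(parse(text = m[2]))      # m[2] is the R code between the markers
  sub(m[1], as.character(value), x, fixed = TRUE)
}

raw <- "@>@ sub('-linux64', '', '{{version}}', fixed = TRUE) @<@"

# Step 1: fill in {{version}}; step 2: evaluate the embedded R command.
filled <- gsub("{{version}}", "1.2.0-linux64", raw, fixed = TRUE)
result <- eval_rcmd(filled)
print(result)
```

Because the embedded text is executed as R code, a mechanism like this should only be applied to configuration files you trust.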