Introduction to classyfireR

Thomas Wilson

Institute of Biological, Environmental & Rural Sciences (IBERS), Aberystwyth University, UK
tpw2@aber.ac.uk

Introduction

ClassyFire is a web-based application for automated structural classification of chemical compounds.

The classyfireR package provides R access to the ClassyFire RESTful API, both for retrieving existing compound classifications and for submitting structures to the web server for classification.

Installation

classyfireR can be installed from CRAN or, for the latest development version, directly from GitHub using the remotes package.

install.packages('classyfireR')

remotes::install_github('aberHRML/classyfireR')

Retrieving Classifications

To retrieve a classification that is already available, provide an InChIKey to the get_classification function.

library(classyfireR)
#> Loading required package: magrittr

Classification <- get_classification('BRMWTNUJHUMWMS-LURJTMIESA-N')
#> ✔ BRMWTNUJHUMWMS-LURJTMIESA-N

Classification
#> ── ClassyFire Object ───────────────────────────────────── classyfireR v0.3.6 ── 
#> Object Size: 18.2 Kb 
#>  
#> Info: 
#> ● InChIKey=BRMWTNUJHUMWMS-LURJTMIESA-N
#>   
#> ● [H][C@](N)(CC1=CN(C)C=N1)C(O)=O
#>   
#> ● Classification Version: 2.1
#>   
#> kingdom : Organic compounds
#> └─superclass : Organic acids and derivatives
#>   └─class : Carboxylic acids and derivatives
#>     └─subclass : Amino acids, peptides, and analogues
#>       └─level 5 : Amino acids and derivatives
#>         └─level 6 : Alpha amino acids and derivatives
#>           └─level 7 : Histidine and derivatives

The result of each classification is stored in a single S4 (ClassyFire) object. To retrieve multiple classifications, iterate over a vector of InChIKeys:

InChI_Keys <-
  c('BRMWTNUJHUMWMS-LURJTMIESA-N',
    'MDHYEMXUFSJLGV-UHFFFAOYSA-N',
    'MYYIAHXIVFADCU-QMMMGPOBSA-N')


Classification_List <- purrr::map(InChI_Keys, get_classification)
#> ✔ BRMWTNUJHUMWMS-LURJTMIESA-N
#> ✔ MDHYEMXUFSJLGV-UHFFFAOYSA-N
#> ✔ MYYIAHXIVFADCU-QMMMGPOBSA-N
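
Malformed keys can be caught before any request is sent. The following is a minimal sketch in base R, not part of classyfireR: `is_inchikey()` and `classify_valid()` are hypothetical helpers, and the pattern assumes the standard InChIKey layout of 14, 10 and 1 uppercase letters separated by hyphens.

```r
# Hypothetical helper (not part of classyfireR): checks the standard
# InChIKey layout of 14-10-1 uppercase letters separated by hyphens.
is_inchikey <- function(x) {
  grepl('^[A-Z]{14}-[A-Z]{10}-[A-Z]$', x, perl = TRUE)
}

# Hypothetical wrapper: classify only well-formed keys, and turn any error
# raised by a request into NULL so one failure does not stop the loop.
# The default for `classify` is only evaluated when the wrapper is called
# without that argument.
classify_valid <- function(keys, classify = classyfireR::get_classification) {
  keys <- keys[is_inchikey(keys)]
  results <- lapply(keys, function(key) {
    tryCatch(classify(key), error = function(e) NULL)
  })
  stats::setNames(results, keys)
}

is_inchikey(c('BRMWTNUJHUMWMS-LURJTMIESA-N', 'not-a-key'))
#> [1]  TRUE FALSE
```

With classyfireR installed and network access available, `classify_valid(InChI_Keys)` would be expected to return a named list with one element per well-formed key.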

Submit Multiple Queries

Structures can be submitted for classification as SMILES by supplying one or more SMILES strings to the submit_query function. The results for all inputs are returned in a single S4 Query object.

Input <- c(MOL1 = 'CCCOCC', MOL2 = 'CNC(CC1=CN=CN1)C(=O)O')

Query <-
  submit_query(label = 'query_test',
               input = Input,
               type = 'STRUCTURE')



Query
#> ── ClassyFire Query Object ─────────────────────────────── classyfireR v0.3.6 ── 
#> Object Size: 27.2 Kb 
#>  
#> 2 structures classified 
#> ● MOL1 : InChIKey=NVJUHMXYKCUMQA-UHFFFAOYSA-N
#> ● MOL2 : InChIKey=CYZKJBZEIFWZSR-UHFFFAOYSA-N
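
When structures are held in a table rather than typed by hand, a named vector like `Input` can be built with `stats::setNames()`. This is only a sketch: the `compounds` data frame and its column names are assumptions made for illustration.

```r
# Hypothetical table of compounds; the column names are illustrative only.
compounds <- data.frame(
  id = c('MOL1', 'MOL2'),
  smiles = c('CCCOCC', 'CNC(CC1=CN=CN1)C(=O)O'),
  stringsAsFactors = FALSE
)

# Name each SMILES string by its identifier, matching the shape of `Input`.
Input <- stats::setNames(compounds$smiles, compounds$id)

# Guard against duplicated identifiers, which would make it ambiguous
# which result belongs to which input.
stopifnot(anyDuplicated(names(Input)) == 0)
```

The resulting `Input` can then be passed to `submit_query()` exactly as in the call above.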

Accessor Methods

A series of accessor methods work with either object type, each returning the results held in a specific slot of the object.


classification(Classification)
#> # A tibble: 7 x 3
#>   Level      Classification                       CHEMONT          
#>   <chr>      <chr>                                <chr>            
#> 1 kingdom    Organic compounds                    CHEMONTID:0000000
#> 2 superclass Organic acids and derivatives        CHEMONTID:0000264
#> 3 class      Carboxylic acids and derivatives     CHEMONTID:0000265
#> 4 subclass   Amino acids, peptides, and analogues CHEMONTID:0000013
#> 5 level 5    Amino acids and derivatives          CHEMONTID:0000347
#> 6 level 6    Alpha amino acids and derivatives    CHEMONTID:0000060
#> 7 level 7    Histidine and derivatives            CHEMONTID:0004311
classification(Query)
#> # A tibble: 10 x 4
#> # Groups:   inchikey [2]
#>    identifier inchikey                   Level      Classification              
#>    <chr>      <chr>                      <chr>      <chr>                       
#>  1 MOL1       InChIKey=NVJUHMXYKCUMQA-U… kingdom    Organic compounds           
#>  2 MOL1       InChIKey=NVJUHMXYKCUMQA-U… superclass Organic oxygen compounds    
#>  3 MOL1       InChIKey=NVJUHMXYKCUMQA-U… class      Organooxygen compounds      
#>  4 MOL1       InChIKey=NVJUHMXYKCUMQA-U… subclass   Ethers                      
#>  5 MOL1       InChIKey=NVJUHMXYKCUMQA-U… direct_pa… Dialkyl ethers              
#>  6 MOL2       InChIKey=CYZKJBZEIFWZSR-U… kingdom    Organic compounds           
#>  7 MOL2       InChIKey=CYZKJBZEIFWZSR-U… superclass Organic acids and derivativ…
#>  8 MOL2       InChIKey=CYZKJBZEIFWZSR-U… class      Carboxylic acids and deriva…
#>  9 MOL2       InChIKey=CYZKJBZEIFWZSR-U… subclass   Amino acids, peptides, and …
#> 10 MOL2       InChIKey=CYZKJBZEIFWZSR-U… direct_pa… Histidine and derivatives
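
Because `classification()` returns a table with `Level` and `Classification` columns, a single level can be pulled out with ordinary subsetting. The sketch below uses a hand-built data frame copying the first rows of the `classification(Classification)` output above, so it runs without a network connection; `level_of()` is a hypothetical helper, not a classyfireR function.

```r
# Stand-in for classification(Classification), copied from the output above.
example_tbl <- data.frame(
  Level = c('kingdom', 'superclass', 'class', 'subclass'),
  Classification = c('Organic compounds',
                     'Organic acids and derivatives',
                     'Carboxylic acids and derivatives',
                     'Amino acids, peptides, and analogues'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the classification at one level,
# or NA if that level is absent from the table.
level_of <- function(tbl, level) {
  hit <- tbl$Classification[tbl$Level == level]
  if (length(hit) == 0) NA_character_ else hit[[1]]
}

level_of(example_tbl, 'superclass')
#> [1] "Organic acids and derivatives"
```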


meta(Classification)
#> $inchikey
#> [1] "InChIKey=BRMWTNUJHUMWMS-LURJTMIESA-N"
#> 
#> $smiles
#> [1] "[H][C@](N)(CC1=CN(C)C=N1)C(O)=O"
#> 
#> $version
#> [1] "2.1"
meta(Query)
#> # A tibble: 2 x 4
#> # Groups:   inchikey [2]
#>   identifier inchikey                     smiles            classification_vers…
#>   <chr>      <chr>                        <chr>             <chr>               
#> 1 MOL1       InChIKey=NVJUHMXYKCUMQA-UHF… CCCOCC            2.1                 
#> 2 MOL2       InChIKey=CYZKJBZEIFWZSR-UHF… CNC(CC1=CN=CN1)C… 2.1


chebi(Classification)
#>  [1] "L-alpha-amino acid (CHEBI:15705)"                  
#>  [2] "imidazolyl carboxylic acid (CHEBI:38307)"          
#>  [3] "aralkylamine (CHEBI:18000)"                        
#>  [4] "imidazoles (CHEBI:24780)"                          
#>  [5] "organic aromatic compound (CHEBI:33659)"           
#>  [6] "amino acid (CHEBI:33709)"                          
#>  [7] "carbonyl compound (CHEBI:36586)"                   
#>  [8] "carboxylic acid (CHEBI:33575)"                     
#>  [9] "carboxylic acid anion (CHEBI:29067)"               
#> [10] "organonitrogen heterocyclic compound (CHEBI:38101)"
#> [11] "pnictogen molecular entity (CHEBI:33302)"          
#> [12] "organic molecular entity (CHEBI:50860)"            
#> [13] "organic oxide (CHEBI:25701)"                       
#> [14] "alkylamine (CHEBI:13759)"                          
#> [15] "organic molecule (CHEBI:72695)"                    
#> [16] "histidine derivative (CHEBI:24599)"                
#> [17] "chemical entity (CHEBI:24431)"                     
#> [18] "organooxygen compound (CHEBI:36963)"               
#> [19] "peptide (CHEBI:16670)"                             
#> [20] "organonitrogen compound (CHEBI:35352)"             
#> [21] "alpha-amino acid (CHEBI:33704)"                    
#> [22] "organic heterocyclic compound (CHEBI:24532)"       
#> [23] "azole (CHEBI:68452)"                               
#> [24] "nitrogen molecular entity (CHEBI:51143)"           
#> [25] "amine (CHEBI:32952)"                               
#> [26] "oxygen molecular entity (CHEBI:25806)"             
#> [27] "primary amine (CHEBI:32877)"
chebi(Query)
#> $MOL1
#> [1] "chemical entity (CHEBI:24431)"         
#> [2] "organic molecular entity (CHEBI:50860)"
#> [3] "ether (CHEBI:25698)"                   
#> [4] "organooxygen compound (CHEBI:36963)"   
#> [5] "organic molecule (CHEBI:72695)"        
#> [6] "oxygen molecular entity (CHEBI:25806)" 
#> 
#> $MOL2
#>  [1] "chemical entity (CHEBI:24431)"                     
#>  [2] "organic molecular entity (CHEBI:50860)"            
#>  [3] "monocarboxylic acid (CHEBI:25384)"                 
#>  [4] "imidazoles (CHEBI:24780)"                          
#>  [5] "carboxylic acid anion (CHEBI:29067)"               
#>  [6] "organonitrogen compound (CHEBI:35352)"             
#>  [7] "secondary amino compound (CHEBI:50995)"            
#>  [8] "imidazolyl carboxylic acid (CHEBI:38307)"          
#>  [9] "amine (CHEBI:32952)"                               
#> [10] "aralkylamine (CHEBI:18000)"                        
#> [11] "secondary amine (CHEBI:32863)"                     
#> [12] "alpha-amino acid (CHEBI:33704)"                    
#> [13] "organonitrogen heterocyclic compound (CHEBI:38101)"
#> [14] "carboxylic acid (CHEBI:33575)"                     
#> [15] "amino acid (CHEBI:33709)"                          
#> [16] "peptide (CHEBI:16670)"                             
#> [17] "organooxygen compound (CHEBI:36963)"               
#> [18] "histidine derivative (CHEBI:24599)"                
#> [19] "organic aromatic compound (CHEBI:33659)"           
#> [20] "organic acid (CHEBI:64709)"                        
#> [21] "organic molecule (CHEBI:72695)"                    
#> [22] "carbonyl compound (CHEBI:36586)"                   
#> [23] "organic heterocyclic compound (CHEBI:24532)"       
#> [24] "azole (CHEBI:68452)"                               
#> [25] "nitrogen molecular entity (CHEBI:51143)"           
#> [26] "oxygen molecular entity (CHEBI:25806)"             
#> [27] "organic oxide (CHEBI:25701)"
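
The strings returned by `chebi()` combine a term name with its ChEBI identifier, and the two can be separated with regular expressions. The following base R sketch assumes every string follows the `name (CHEBI:id)` pattern shown above; `parse_chebi()` is a hypothetical helper, not part of classyfireR.

```r
# A few terms copied from the chebi(Query) output above.
terms <- c('chemical entity (CHEBI:24431)',
           'ether (CHEBI:25698)',
           'organooxygen compound (CHEBI:36963)')

# Hypothetical helper: split "name (CHEBI:id)" strings into two columns.
# Strings that do not match the pattern are returned unchanged in both columns.
parse_chebi <- function(x) {
  data.frame(
    name = sub(' \\(CHEBI:[0-9]+\\)$', '', x),
    id = sub('^.*\\((CHEBI:[0-9]+)\\)$', '\\1', x),
    stringsAsFactors = FALSE
  )
}

parse_chebi(terms)
```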

Acknowledgements

If you use classyfireR, you should cite the ClassyFire publication:

Djoumbou Feunang Y, Eisner R, Knox C, Chepelev L, Hastings J, Owen G, Fahy E, Steinbeck C, Subramanian S, Bolton E, Greiner R, and Wishart DS. ClassyFire: Automated Chemical Classification With A Comprehensive, Computable Taxonomy. Journal of Cheminformatics, 2016, 8:61.

DOI: 10.1186/s13321-016-0174-y