Taxonomy of Gammaproteobacteria

Preliminaries

This vignette illustrates the most useful functions of yatah.

library(ggtree) # from Bioconductor
library(dplyr)
library(yatah)

Data

For this example, we use data from Zeller et al. (2014): the abundances of bacteria present in 199 stool samples.

abundances <- as_tibble(yatah::abundances)
print(abundances, n_extra = 2)
#> # A tibble: 1,585 x 200
#>    lineages `CCIS00146684ST… `CCIS00281083ST… `CCIS02124300ST…
#>    <chr>               <dbl>            <dbl>            <dbl>
#>  1 k__Bact…        100.0             99.8               96.3  
#>  2 k__Viru…          0.00697          0.128              3.70 
#>  3 k__Bact…         66.2             24.6               74.2  
#>  4 k__Bact…         19.1             74.4               11.9  
#>  5 k__Bact…         12.1              0.0428             7.22 
#>  6 k__Bact…          1.86             0.428              0.765
#>  7 k__Bact…          0.758            0.388              2.28 
#>  8 k__Viru…          0.00697          0.128              3.70 
#>  9 k__Bact…          0.00155          0.00415            0    
#> 10 k__Bact…         62.4             21.7               62.3  
#> # … with 1,575 more rows, and 196 more variables:
#> #   `CCIS02379307ST-4-0` <dbl>, `CCIS02856720ST-4-0` <dbl>, …
taxonomy <- select(abundances, lineages)
taxonomy
#> # A tibble: 1,585 x 1
#>    lineages                                  
#>    <chr>                                     
#>  1 k__Bacteria                               
#>  2 k__Viruses                                
#>  3 k__Bacteria|p__Firmicutes                 
#>  4 k__Bacteria|p__Bacteroidetes              
#>  5 k__Bacteria|p__Actinobacteria             
#>  6 k__Bacteria|p__Verrucomicrobia            
#>  7 k__Bacteria|p__Proteobacteria             
#>  8 k__Viruses|p__Viruses_noname              
#>  9 k__Bacteria|p__Candidatus_Saccharibacteria
#> 10 k__Bacteria|p__Firmicutes|c__Clostridia   
#> # … with 1,575 more rows

Filtering

The table lists every taxon present, at every rank. Since we are only interested in genera that belong to the Gammaproteobacteria class, we filter() the lineages with is_clade() and is_rank(). The genus name is extracted with last_clade().

gammap_genus <-
  taxonomy %>% 
  filter(is_clade(lineages, "Gammaproteobacteria"),
         is_rank(lineages, "genus")) %>% 
  mutate(genus = last_clade(lineages))
gammap_genus
#> # A tibble: 26 x 2
#>    lineages                                            genus               
#>    <chr>                                               <chr>               
#>  1 k__Bacteria|p__Proteobacteria|c__Gammaproteobacter… Escherichia         
#>  2 k__Bacteria|p__Proteobacteria|c__Gammaproteobacter… Haemophilus         
#>  3 k__Bacteria|p__Proteobacteria|c__Gammaproteobacter… Enterobacteriaceae_…
#>  4 k__Bacteria|p__Proteobacteria|c__Gammaproteobacter… Pseudomonas         
#>  5 k__Bacteria|p__Proteobacteria|c__Gammaproteobacter… Enterobacter        
#>  6 k__Bacteria|p__Proteobacteria|c__Gammaproteobacter… Aggregatibacter     
#>  7 k__Bacteria|p__Proteobacteria|c__Gammaproteobacter… Hafnia              
#>  8 k__Bacteria|p__Proteobacteria|c__Gammaproteobacter… Actinobacillus      
#>  9 k__Bacteria|p__Proteobacteria|c__Gammaproteobacter… Sinobacteraceae_unc…
#> 10 k__Bacteria|p__Proteobacteria|c__Gammaproteobacter… Citrobacter         
#> # … with 16 more rows
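
To make the filtering logic concrete, here is a minimal base-R sketch of the three checks applied to lineage strings. It is not yatah's implementation: the helpers has_clade(), ends_at_rank() and final_clade() are hypothetical names, the sketch assumes the `k__…|p__…` format with `|` as separator, and the order name in the toy genus lineage is an assumption, since the printed output above truncates it.

```r
# Toy lineages in the same format as the data above
# (the order name in the second lineage is assumed for illustration).
lineages <- c(
  "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria",
  "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Escherichia",
  "k__Bacteria|p__Firmicutes|c__Clostridia"
)

# Split each lineage into its clades: "k__Bacteria", "p__Proteobacteria", ...
split_lineage <- function(x) strsplit(x, "|", fixed = TRUE)

# Hypothetical helper: does the lineage contain the clade, at any rank?
has_clade <- function(x, clade) {
  vapply(split_lineage(x), function(parts) {
    clade %in% sub("^[a-z]__", "", parts)
  }, logical(1))
}

# Hypothetical helper: does the lineage stop at the rank with this prefix?
ends_at_rank <- function(x, prefix) {
  vapply(split_lineage(x), function(parts) {
    startsWith(parts[length(parts)], paste0(prefix, "__"))
  }, logical(1))
}

# Hypothetical helper: name of the deepest clade, without its rank prefix.
final_clade <- function(x) {
  vapply(split_lineage(x), function(parts) {
    sub("^[a-z]__", "", parts[length(parts)])
  }, character(1))
}

# Keep only genus-level lineages inside Gammaproteobacteria.
keep <- has_clade(lineages, "Gammaproteobacteria") & ends_at_rank(lineages, "g")
genera <- final_clade(lineages[keep])
genera
```

Only the second toy lineage passes both checks: the first stops at the class rank and the third belongs to another class.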

Taxonomic table

It is often useful to have a taxonomic table, with one column per rank. taxtable() does the job.

gammaprot_table <-
  gammap_genus %>% 
  pull(lineages) %>% 
  taxtable()
as_tibble(gammaprot_table)
#> # A tibble: 26 x 6
#>    kingdom  phylum     class        order       family      genus          
#>    <chr>    <chr>      <chr>        <chr>       <chr>       <chr>          
#>  1 Bacteria Proteobac… Gammaproteo… Enterobact… Enterobact… Escherichia    
#>  2 Bacteria Proteobac… Gammaproteo… Pasteurell… Pasteurell… Haemophilus    
#>  3 Bacteria Proteobac… Gammaproteo… Enterobact… Enterobact… Enterobacteria…
#>  4 Bacteria Proteobac… Gammaproteo… Pseudomona… Pseudomona… Pseudomonas    
#>  5 Bacteria Proteobac… Gammaproteo… Enterobact… Enterobact… Enterobacter   
#>  6 Bacteria Proteobac… Gammaproteo… Pasteurell… Pasteurell… Aggregatibacter
#>  7 Bacteria Proteobac… Gammaproteo… Enterobact… Enterobact… Hafnia         
#>  8 Bacteria Proteobac… Gammaproteo… Pasteurell… Pasteurell… Actinobacillus 
#>  9 Bacteria Proteobac… Gammaproteo… Xanthomona… Sinobacter… Sinobacteracea…
#> 10 Bacteria Proteobac… Gammaproteo… Enterobact… Enterobact… Citrobacter    
#> # … with 16 more rows
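
As a rough illustration of what such a table contains, the base-R sketch below splits two toy lineages into one column per rank. It is an assumption-laden stand-in, not how taxtable() is implemented: it requires every lineage to go down to the genus, and the order names are assumed because the printed output above truncates them.

```r
# Two toy genus-level lineages (order names are assumed for illustration).
lineages <- c(
  "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Escherichia",
  "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pasteurellales|f__Pasteurellaceae|g__Haemophilus"
)
ranks <- c("kingdom", "phylum", "class", "order", "family", "genus")

# Split on the separator, then drop the one-letter rank prefix ("k__", "p__", ...).
parts  <- strsplit(lineages, "|", fixed = TRUE)
clades <- lapply(parts, function(p) sub("^[a-z]__", "", p))

# Stack the clade vectors row-wise: one row per lineage, one column per rank.
tab <- as.data.frame(do.call(rbind, clades), stringsAsFactors = FALSE)
names(tab) <- ranks
tab
```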

Taxonomic tree

To build a tree, pass a taxonomic table to taxtree(). By default, it collapses ranks with only one subrank.

gammaprot_tree <- taxtree(gammaprot_table)
gammaprot_tree
#> 
#> Phylogenetic tree with 26 tips and 7 internal nodes.
#> 
#> Tip labels:
#>  Escherichia, Enterobacteriaceae_noname, Enterobacter, Hafnia, Citrobacter, Pantoea, ...
#> Node labels:
#>  Gammaproteobacteria, Enterobacteriaceae, Pasteurellaceae, Pseudomonadales, Moraxellaceae, Xanthomonadales, ...
#> 
#> Rooted; includes branch lengths.
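
The collapsing rule explains why kingdom and phylum do not appear among the node labels. The base-R sketch below shows one reading of that rule on a toy table: a clade is kept as an internal node only if it has more than one distinct child at the next rank. This is an interpretation for illustration, not yatah's code; n_children() is a hypothetical helper and the order names are assumed.

```r
# Toy taxonomic table with four genera (order names are assumed).
tab <- data.frame(
  kingdom = rep("Bacteria", 4),
  phylum  = rep("Proteobacteria", 4),
  class   = rep("Gammaproteobacteria", 4),
  order   = c("Enterobacteriales", "Enterobacteriales",
              "Pasteurellales", "Pasteurellales"),
  family  = c("Enterobacteriaceae", "Enterobacteriaceae",
              "Pasteurellaceae", "Pasteurellaceae"),
  genus   = c("Escherichia", "Hafnia", "Haemophilus", "Actinobacillus"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each clade in column i, count its distinct
# children in column i + 1.
n_children <- function(tab, i) {
  tapply(tab[[i + 1]], tab[[i]], function(x) length(unique(x)))
}

# A clade survives collapsing only if it has at least two children.
kept <- unlist(lapply(seq_len(ncol(tab) - 1), function(i) {
  n <- n_children(tab, i)
  names(n)[n > 1]
}))
kept
```

In this toy table, Bacteria, Proteobacteria and both orders each have a single child and are dropped, while Gammaproteobacteria and the two families remain as internal nodes.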

Instead of a classical plot, we use ggtree (Yu et al. 2017) to display the tree.

ggtree(gammaprot_tree) +
  geom_tiplab(hjust = 1, geom = "label") +
  geom_nodelab(hjust = 0, size = 3)

References

Yu, Guangchuang, David Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. “Ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Methods in Ecology and Evolution 8 (1): 28–36. https://doi.org/10.1111/2041-210X.12628.

Zeller, Georg, Julien Tap, Anita Y Voigt, Shinichi Sunagawa, Jens Roat Kultima, Paul I Costea, Aurélien Amiot, et al. 2014. “Potential of Fecal Microbiota for Early-Stage Detection of Colorectal Cancer.” Molecular Systems Biology 10 (11). EMBO Press: 766.