The myTAI
package provides a statistical framework and a collection of functions that can be used to perform phylotranscriptomics analyses and visualization to investigate phenomena within the field of Evolutionary Developmental Biology.
The goal of this package is to provide a broad range of users (as well as the phylotranscriptomics community) with an easy to use and computationally optimized framework for reproducible research in the fields referred to as Evolutionary Developmental Biology and Phylotranscriptomics and to allow a better comparability between molecular phenomena resulting from in silico data analyses.
A series of tutorials aim to introduce you to the myTAI
package:
Internally, computationally intensive functions have been written in C++ using the Rcpp package to allow fast analytics in phylotranscriptomics research.
All functions implemented in the myTAI
package expect a specific data format as input data. These standardized data formats are referred to as PhyloExpressionSet and DivergenceExpressionSet.
The simplest way to get introduced into the standard formats referred to as PhyloExpressionSet and DivergenceExpressionSet is to look at the example data sets that come with the myTAI
package.
library(myTAI)
# load an example PhyloExpressionSet stored in the myTAI package
data(PhyloExpressionSetExample)
# load an example DivergenceExpressionSet stored in the myTAI package
data(DivergenceExpressionSetExample)
Now, we want to have a look at the data sets and their standard format.
head(PhyloExpressionSetExample, 3) # head of an example standard PhyloExpressionSet
Phylostratum GeneID Zygote Quadrant Globular Heart Torpedo Bent Mature
1 1 at1g01040.2 2173.635 1911.200 1152.555 1291.4224 1000.253 962.9772 1696.4274
2 1 at1g01050.1 1501.014 1817.309 1665.309 1564.7612 1496.321 1114.6435 1071.6555
3 1 at1g01070.1 1212.793 1233.002 939.200 929.6195 864.218 877.2060 894.8189
As you can see, a standard PhyloExpressionSet is a data.frame
storing the Phylostratum assignment of a given gene in the first column, the gene id of the corresponding gene in the second column, and the entire gene expression set (time series or treatments) starting with the third column. This format is crucial for all functions that are implemented in the myTAI
package.
Each function checks, whether the PhyloExpressionSet standard is fulfilled.
# used by all myTAI functions to check the validity of the PhyloExpressionSet standard
is.ExpressionSet(PhyloExpressionSetExample)
[1] TRUE
In case the PhyloExpressionSet standard is violated, the is.ExpressionSet()
function will return FALSE
and the corresponding function within the myTAI
package will return an error message.
# used a non standard PhyloExpressionSet
head(PhyloExpressionSetExample[ , 2:5], 2)
GeneID Zygote Quadrant Globular
1 at1g01040.2 2173.635 1911.200 1152.555
2 at1g01050.1 1501.014 1817.309 1665.309
is.ExpressionSet(PhyloExpressionSetExample[ , 2:5])
Error in is.ExpressionSet(PhyloExpressionSetExample[, 2:5]) :
The present input object does not fulfill the ExpressionSet standard.
This PhyloExpressionSetExample
data set stored within the myTAI
package stores the time course (seven stages) of Arabidopsis thaliana embryo development. Again, the first column represents the Phylostratum assignment of a given gene, the second column stores the gene id of the corresponding gene, and columns 3 to 9 store the gene expression set of A. thaliana embryo development.
head(DivergenceExpressionSetExample, 3) # head of an example standard DivergenceExpressionSet
Divergence.stratum GeneID Zygote Quadrant Globular Heart Torpedo Bent Mature
1 1 at1g01050.1 1501.0141 1817.3086 1665.3089 1564.761 1496.3207 1114.6435 1071.6555
2 1 at1g01120.1 844.0414 787.5929 859.6267 931.618 942.8453 870.2625 792.7542
3 1 at1g01140.3 1041.4291 908.3929 1068.8832 967.749 1055.1901 1109.4662 825.4633
Analogous to a standard PhyloExpressionSet, a standard DivergenceExpressionSet is a data.frame
storing the Divergence-Stratum assignment of a given gene in the first column, the gene id of the corresponding gene in the second column, and the entire gene expression set (time series or treatments) starting with the third column. This format is crucial for all functions that are implemented in the myTAI
package.
Keeping these standard data formats in mind will provide you with the most important requirements to get started with the myTAI
package.
Note, that within the code of each function, the argument ExpressionSet
always refers to either a PhyloExpressionSet or a DivergenceExpressionSet, whereas in specialized functions some arguments are specified as PhyloExpressionSet when they take an PhyloExpressionSet as input data set, or specified as DivergenceExpressionSet when they take an DivergenceExpressionSet as input data set.
The myTAI
package provides the statistical framework for existing PhyloExpressionSets or DivergenceExpressionSets. To re-analyze, reproduce, or further investigate phenomena of existing PhyloExpressionSets or DivergenceExpressionSets, you can use the following procedure to download published datasets. In this example we will download Supplementary Table S3
from Drost et al., 2015.
# install.packages("gdata")
library(gdata)
url_phyloSet <- "http://files.figshare.com/1798296/Supplementary_table_S3.xls"
download.file( url = url_phyloSet,
destfile = "PhyloExpressionSetTables.xls" )
Athaliana_PhyloExpressionSet <- read.xls("PhyloExpressionSetTables.xls", sheet = 3)
head(Athaliana_PhyloExpressionSet)
Phylostratum GeneID Zygote Quadrant Globular Heart Torpedo Bent Mature
1 1 at1g01040.2 2173.6352 1911.2001 1152.5553 1291.4224 1000.2529 962.9772 1696.4274
2 1 at1g01050.1 1501.0141 1817.3086 1665.3089 1564.7612 1496.3207 1114.6435 1071.6555
3 1 at1g01070.1 1212.7927 1233.0023 939.2000 929.6195 864.2180 877.2060 894.8189
4 1 at1g01080.2 1016.9203 936.3837 1181.3381 1329.4734 1392.6429 1287.9746 861.2605
5 1 at1g01090.1 11424.5667 16778.1685 34366.6493 39775.6405 56231.5689 66980.3673 7772.5617
6 1 at1g01120.1 844.0414 787.5929 859.6267 931.6180 942.8453 870.2625 792.7542
This way you can download all published PhyloExpressionSets or DivergenceExpressionSets to perform phylotranscriptomics analyses.
# Compute the Transcriptome Age Index of the downloaded PhyloExpressionSet
TAI(Athaliana_PhyloExpressionSet)
Zygote Quadrant Globular Heart Torpedo Bent Mature
3.229942 3.225614 3.107135 3.116693 3.073993 3.176511 3.390334
Phylotranscriptomics combines evolutionary information obtained from Phylostratigraphy and Divergence-Stratigraphy with corresponding transcriptome data.
For this purpose the following datasets need to be generated or obtained:
Phylostratigraphic Map of an Organism (of Interest)
Divergence Map of an Organism (of Interest)
Transcriptome Data Covering a Specific Experiment for the Organism (of Interest)
Each step can be performed independent of all other steps and the resulting datasets can also be used for different (independent) studies and analyses.
A Phylostratigraphic Map can be generated via Phylostratigraphy. You can find a Perl Script
for performing Phylostratigraphy in Drost et al., 2015. After performing Phylostratigraphy the resulting Phylostratigraphic Map stores the Phylostratum assignment of genes in the first column and the corresponding gene id in the second column.
head(Athaliana_PhyloExpressionSet[ , c("Phylostratum","GeneID")])
Phylostratum GeneID
1 1 at1g01040.2
2 1 at1g01050.1
3 1 at1g01070.1
4 1 at1g01080.2
5 1 at1g01090.1
6 1 at1g01120.1
A Divergence Map can be generated via Divergence-Stratigraphy. A detailed description on how to perform Divergence-Stratigraphy using the orthologr package can be found in the Advanced Vignette
. After performing Divergence-Stratigraphy the resulting Divergence Map stores the Divergence-Stratum assignment of genes in the first column and the corresponding gene id in the second column.
head(DivergenceExpressionSetExample[ , c("Divergence.stratum","GeneID")])
Divergence.stratum GeneID
1 1 at1g01050.1
2 1 at1g01120.1
3 1 at1g01140.3
4 1 at1g01170.1
5 1 at1g01230.1
6 1 at1g01540.2
Any transcriptome data of interest can be used to perform phylotranscriptomics analyses. In the following sections as well as in the documentation of myTAI
functions transcriptome data is referred to as ExpressionMatrix
.
It is assumed that a ExpressionMatrix
used to perform phylotranscriptomics analyses does not include duplicate expression values for a gene and has the following format:
head(DivergenceExpressionSetExample[ , 2:9])
GeneID Zygote Quadrant Globular Heart Torpedo Bent Mature
1 at1g01050.1 1501.0141 1817.3086 1665.3089 1564.7612 1496.3207 1114.6435 1071.6555
2 at1g01120.1 844.0414 787.5929 859.6267 931.6180 942.8453 870.2625 792.7542
3 at1g01140.3 1041.4291 908.3929 1068.8832 967.7490 1055.1901 1109.4662 825.4633
4 at1g01170.1 1361.6646 1042.1991 1225.5625 1211.7386 1674.5224 2136.4284 10662.4763
5 at1g01230.1 894.1276 946.6993 933.0931 965.1859 870.9218 843.1814 794.6536
6 at1g01540.2 1464.3065 1451.4255 2378.7054 1993.9326 1800.2420 2119.9220 1020.2640
So the Gene IDs are stored in the first column and the corresponding gene expression matrix is stored in the following columns.
Now you can use the ?MatchMap
function to combine a Phylostratigraphic Map or Divergence Map with a transcriptome dataset of interest.
# load a standard PhyloExpressionSet
data(PhyloExpressionSetExample)
# in a standard PhyloExpressionSet,
# column one and column two denote a standard
# Phylostratigraphic Map
PhyloMap <- PhyloExpressionSetExample[ , 1:2]
# look at the Phylostratigraphic Map standard
head(PhyloMap)
Phylostratum GeneID
1 1 at1g01040.2
2 1 at1g01050.1
3 1 at1g01070.1
4 1 at1g01080.2
5 1 at1g01090.1
6 1 at1g01120.1
# in a standard PhyloExpressionSet, column two combined
# with column 3 - N denote a standard ExpressionMatrix
ExpressionMatrixExample <- PhyloExpressionSetExample[ , c(2,3:9)]
head(ExpressionMatrixExample)
GeneID Zygote Quadrant Globular Heart Torpedo Bent Mature
1 at1g01040.2 2173.6352 1911.2001 1152.5553 1291.4224 1000.2529 962.9772 1696.4274
2 at1g01050.1 1501.0141 1817.3086 1665.3089 1564.7612 1496.3207 1114.6435 1071.6555
3 at1g01070.1 1212.7927 1233.0023 939.2000 929.6195 864.2180 877.2060 894.8189
4 at1g01080.2 1016.9203 936.3837 1181.3381 1329.4734 1392.6429 1287.9746 861.2605
5 at1g01090.1 11424.5667 16778.1685 34366.6493 39775.6405 56231.5689 66980.3673 7772.5617
6 at1g01120.1 844.0414 787.5929 859.6267 931.6180 942.8453 870.2625 792.7542
# now we can use the MatchMap function to merge both data sets
# to obtain a standard PhyloExpressionSet
PES <- MatchMap(PhyloMap, ExpressionMatrixExample)
head(PES)
Phylostratum GeneID Zygote Quadrant Globular Heart Torpedo Bent Mature
1 4 at1g01010.1 878.2158 828.2301 776.0703 753.9589 775.3377 756.2460 999.9118
2 2 at1g01020.1 1004.9710 1106.2621 1037.5141 939.0830 961.5249 871.4684 997.5953
3 3 at1g01030.1 819.4880 771.6396 810.8717 866.7780 773.7893 747.9941 785.6105
4 1 at1g01040.2 2173.6352 1911.2001 1152.5553 1291.4224 1000.2529 962.9772 1696.4274
5 1 at1g01050.1 1501.0141 1817.3086 1665.3089 1564.7612 1496.3207 1114.6435 1071.6555
6 2 at1g01060.1 787.0917 802.3522 738.8684 754.8027 716.0982 670.1015 809.2199
Using the MatchMap()
function you can combine any data set storing evolutionary information (e.g. generated by performing Phylostratigraphy or Divergence-Stratigraphy) with transcriptome data representing a biological process of interest to detect evolutionary signals and conserved stages within this biological process.
Before starting any computation, sometimes it is good to visualize the Phylostratum distribution of genes stored within a given PhyloExpressionSet.
For this purpose, the PlotDistribution()
function was implemented:
# Display the phylostratum distribution (gene frequency distribution)
# of a PhyloExpressionSet as absolute frequency distribution
PlotDistribution( PhyloExpressionSet = PhyloExpressionSetExample,
xlab = "Phylostratum" )
or display it as relative frequencies:
# Plot phylostrata as relative frequency distribution
PlotDistribution( PhyloExpressionSet = PhyloExpressionSetExample,
as.ratio = TRUE,
xlab = "Phylostratum",
cex = 0.7 )
Another important feature to check is whether the Phylostratum assignment and Divergence-Stratum assignment of the genes stored within the PhyloExpressionSet and DivergenceExpressionSet are correlated (linear dependent). This is important to be able to assume the linear independence of TAI and TDI measures.
For this purpose the PlotCorrelation()
function was implemented:
# Visualizing the correlation between Phylostratum and Divergence-Stratum assignments
# of the intersecting set of genes that are stored within the PhyloExpressionSet
# and DivergenceExpressionSet
PlotCorrelation( PhyloExpressionSet = PhyloExpressionSetExample,
DivergenceExpressionSet = DivergenceExpressionSetExample,
method = "kendall",
linearModel = TRUE )
In this case Phylostratum and Divergence-Stratum assignments of the intersecting set of genes that are stored within the PhyloExpressionSet and DivergenceExpressionSet are weakly correlated, but can be assumed to be linear independent.
Note: The PlotCorrelation()
function always takes a PhyloExpressionSet as first argument and a DivergenceExpressionSet as second argument.
Mathematically the Transcriptome Age Index (TAI) introduced by Domazet-Loso and Tautz (2010) represents a weighted arithmetic mean of the transcriptome age during a corresponding developmental stage s.
Analogous to the TAI measure, the Transcriptome Divergence Index (TDI) was introduced by Quint et al. (2012) as weighted arithmetic mean of the degree of sequence divergence of the transcriptome being expressed during a corresponding developmental stage s.
The PlotPattern()
function first computes the TAI
(given a PhyloExpressionSet) or the TDI
(given a DivergenceExpressionSet) to visualize the TAI or TDI profile, their standard deviation and statistical significance.
# Plot the Transcriptome Age Index of a given PhyloExpressionSet
# Test Statistic : Flat Line Test (default)
PlotPattern( ExpressionSet = PhyloExpressionSetExample,
type = "l",
lwd = 6,
xlab = "Ontogeny",
ylab = "TAI" )
The p-value (p_flt
) above the TAI curve is returned by the FlatLineTest. As described in the documentation of PlotPattern()
(?PlotPattern
or ?FlatLineTest
), the FlatLineTest
is the default statistical test to quantify the statistical significance of the observed phylotranscriptomic pattern. In detail, the test quantifies any statistically significant deviation of the phylotranscriptomic pattern from a flat line.
In case the observed phylotranscriptomic pattern not only significantly deviates from a flat line but also visually resembles and hourglass shape, one can obtain a p-value corresponding to the statistical significance of a visual hourglass pattern referred to as ReductiveHourglassTest
(?ReductiveHourglassTest
).
Since the ReductiveHourglassTest
has been defined for a priori biological knowledge (Drost et al., 2015), the modules
argument within the ReductiveHourglassTest()
function needs to be specified.
Three modules need to be specified: an early-module, phylotypic module (mid), and a late-module.
For this example we divide A. thaliana embryo development stored within the PhyloExpressionSetExample into the following three modules:
# Plot the Transcriptome Age Index of a given PhyloExpressionSet
# Test Statistic : Reductive Hourglass Test
PlotPattern( ExpressionSet = PhyloExpressionSetExample,
TestStatistic = "ReductiveHourglassTest",
modules = list(early = 1:2, mid = 3:5, late = 6:7),
type = "l",
lwd = 6,
xlab = "Ontogeny",
ylab = "TAI" )
The corresponding p-value p_rht
now denotes the p-value returned by the ReductiveHourglassTest
which is different from the p-value returned by the FlatLineTest
(p_flt
).
To make sure that you selected the correct modules you can use the shaded.area
argument to visualize the chosen modules:
# Visualize the phylotypic period used for the Reductive Hourglass Test
PlotPattern( ExpressionSet = PhyloExpressionSetExample,
TestStatistic = "ReductiveHourglassTest",
modules = list(early = 1:2, mid = 3:5, late = 6:7),
shaded.area = TRUE,
type = "l",
lwd = 6,
xlab = "Ontogeny",
ylab = "TAI" )
Note, defining a priori knowledge for the ReductiveHourglassTest
using the modules
argument, modules need to start at stage 1, …, N and do not correspond to the column position in the PhyloExpressionSet/DivergenceExpressionSet which in contrast would start at position 3, … N.
The third test statistic that is implemented in the myTAI
package is the EarlyConservationTest
.
The EarlyConservationTest
tests whether an observed phylotranscriptomic pattern follows a low-high-high pattern (monotonically increasing function) supporting the Early Conservation Model of embryogenesis.
Analogous to the ReductiveHourglassTest
, the EarlyConservationTest
needs a priori biological knowledge (Drost et al., 2015). So again three modules
have to be specified for the EarlyConservationTest()
function.
Three modules need to be specified: an early-module, phylotypic module (mid), and a late-module.
For this example we divide A. thaliana embryo development stored within the PhyloExpressionSetExample into the following three modules:
# Plot the Transcriptome Age Index of a given PhyloExpressionSet
# Test Statistic : Reductive Early Conservation Test
PlotPattern( ExpressionSet = PhyloExpressionSetExample,
TestStatistic = "EarlyConservationTest",
modules = list(early = 1:2, mid = 3:5, late = 6:7),
type = "l",
lwd = 6,
xlab = "Ontogeny",
ylab = "TAI" )
The corresponding p-value p_ect
now denotes the p-value returned by the EarlyConservationTest
which is different from the p-value returned by the FlatLineTest
(p_flt
) and ReductiveHourglassTest
(p_rht
).
Since the present TAI pattern of the PhyloExpressionSetExample doesn’t support the Early Conservation Hypothesis, the p-value p_ect
= 1.
Also note, that defining a priori knowledge for the EarlyConservationTest
using the modules
argument, modules need to start at stage 1, …, N and do not correspond to the column position in the PhyloExpressionSet/DivergenceExpressionSet which in contrast would start at position 3, … N.
To obtain the numerical TAI values, the TAI()
function can be used:
# Compute the Transcriptome Age Index values of a given PhyloExpressionSet
TAI(PhyloExpressionSetExample)
Zygote Quadrant Globular Heart Torpedo Bent Mature
3.229942 3.225614 3.107135 3.116693 3.073993 3.176511 3.390334
Analogous to the TAI computations and visualization, the TDI computations can be performed in a similar fashion:
# Plot the Transcriptome Divergence Index of a given DivergenceExpressionSet
# Test Statistic : Flat Line Test (default)
PlotPattern( ExpressionSet = DivergenceExpressionSetExample,
type = "l",
lwd = 6,
xlab = "Ontogeny",
ylab = "TDI" )
Again, for the ReductiveHourglassTest we divide A. thaliana embryo development into the three modules:
# Plot the Transcriptome Divergence Index of a given DivergenceExpressionSet
# Test Statistic : Reductive Hourglass Test
PlotPattern( ExpressionSet = DivergenceExpressionSetExample,
TestStatistic = "ReductiveHourglassTest",
modules = list(early = 1:2, mid = 3:5, late = 6:7),
type = "l",
lwd = 6,
xlab = "Ontogeny",
ylab = "TDI" )
And for the EarlyConservationTest we again divide A. thaliana embryo development into the three modules:
# Plot the Transcriptome Divergence Index of a given DivergenceExpressionSet
# Test Statistic : Reductive Early Conservation Test
PlotPattern( ExpressionSet = DivergenceExpressionSetExample,
TestStatistic = "EarlyConservationTest",
modules = list(early = 1:2, mid = 3:5, late = 6:7),
type = "l",
lwd = 6,
xlab = "Ontogeny",
ylab = "TDI" )
To obtain the numerical TDI values for a given DivergenceExpressionSet simply run:
# Compute the Transcriptome Divergence Index values of a given DivergenceExpressionSet
TDI(DivergenceExpressionSetExample)
Zygote Quadrant Globular Heart Torpedo Bent Mature
4.532029 4.563200 4.485705 4.500868 4.466477 4.530704 4.690292
The TAI or TDI pattern are very useful to gain a first insight into the mean transcriptome age or mean sequence divergence of genes being most active during the corresponding developmental stage or experiment.
To further investigate the origins of the global TAI or TDI pattern it is useful to visualize the mean gene expression of each Phylostratum or Divergence-Stratum class.
Visualizing the mean gene expression of genes corresponding to the same Phylostratum or Divergence-Stratum class allows to detect development specific groups of Phylostrata or Divergence-Strata that are most expressed during the underlying developmental process. This might lead to correlating specific groups of Phylostrata or Divergence-Strata sharing similar evolutionary origins with common functions or functional contributions to a specific developmental process.
# Visualizing the mean gene expression of each Phylostratum class
PlotMeans( ExpressionSet = PhyloExpressionSetExample,
Groups = list(1:12),
legendName = "PS",
xlab = "Ontogeny",
lty = 1,
cex = 0.7,
lwd = 5 )
Here we see that the mean gene expression of Phylostratum group: PS1-3 (genes evolved before the establishment of embryogenesis in plants) are more expressed during A. thaliana embryogenesis than PS4-12 (genes evolved during or after the establishment of embryogenesis in plants).
In different developmental processes different Phylostratum groups or combination of groups might resemble the majority of expressed genes.
So the PlotMeans()
function takes an PhyloExpressionSet or DivergenceExpressionSet and plots for each Phylostratum the mean expression levels of all genes corresponding to this Phylostratum. The Groups
argument takes a list storing the Phylostrata (classified into the same group) that shall be visualized on the same plot.
For this example we separate groups of Phylostrata into evolutionary old Phylostrata (PS1-3) in one plot versus evolutionary younger Phylostrata (PS4-12) into another plot:
# Visualizing the mean gene expression of each Phylostratum class
# in two separate plots (groups)
PlotMeans( ExpressionSet = PhyloExpressionSetExample,
Groups = list(group_1 = 1:3, group_2 = 4:12),
legendName = "PS",
xlab = "Ontogeny",
lty = 1,
cex = 0.7,
lwd = 5 )
To obtain the numerical values (mean expression levels for all Phylostrata) run:
# Using the age.apply() function to compute the mean expression levels
# of all Phylostrata
age.apply( ExpressionSet = PhyloExpressionSetExample,
FUN = colMeans )
Zygote Quadrant Globular Heart Torpedo Bent Mature
1 2607.882 2579.372 2604.856 2525.704 2554.825 2622.757 2696.331
2 2597.258 2574.745 2467.679 2388.045 2296.410 2243.716 2321.709
3 2528.272 2363.159 2019.436 2099.079 2155.642 2196.875 2855.866
4 1925.320 1887.078 1771.399 1787.175 1740.823 1867.981 2358.893
5 2378.883 2368.593 2061.729 2077.087 2076.693 2564.904 3157.761
6 1658.253 1697.242 1485.401 1462.613 1492.861 1631.741 2304.683
7 1993.321 1717.659 1480.525 1590.009 1545.078 1600.264 2385.409
8 1781.653 1670.106 1452.180 1414.052 1359.376 1816.718 2364.070
9 1758.119 1764.748 1708.815 1575.727 1388.920 1687.314 2193.930
10 2414.456 2501.390 2163.810 1938.060 1770.039 1993.032 2127.015
11 1999.163 2071.456 1702.779 1710.290 1662.099 1726.865 2501.443
12 2126.189 2036.804 1896.964 1909.578 1859.485 1995.732 2387.343
Here the age.apply()
function (?age.apply
) takes a function as argument that itself receives a data.frame
as argument (e.g. colMeans()
).
For a DivergenceExpressionSet run:
# Visualizing the mean gene expression of each Divergence-Stratum class
PlotMeans( ExpressionSet = DivergenceExpressionSetExample,
Groups = list(1:10),
legendName = "DS",
xlab = "Ontogeny",
lty = 1,
cex = 0.7,
lwd = 5 )
To obtain the numerical values (mean expression levels for all Divergence-Strata) run:
# Using the age.apply() function to compute the mean expression levels
# of all Divergence-Strata
age.apply( ExpressionSet = DivergenceExpressionSetExample,
FUN = colMeans )
Zygote Quadrant Globular Heart Torpedo Bent Mature
1 5222.189 5230.547 5254.464 4911.494 4807.936 4654.683 4277.490
2 3146.510 3020.156 2852.072 2807.367 2845.025 3002.967 3237.315
3 2356.008 2239.344 2257.539 2272.270 2360.816 2529.276 2912.164
4 2230.350 2180.706 2050.895 2049.035 2001.043 2127.165 2608.903
5 2014.600 1994.640 1884.899 1851.554 1858.913 1920.185 2210.391
6 2096.593 2018.440 1938.765 1961.828 1905.246 2005.523 2339.767
7 1836.290 1832.815 1734.319 1719.186 1659.044 1736.141 2201.981
8 1784.470 1762.151 1635.529 1624.682 1590.489 1711.439 1983.607
9 1649.254 1659.455 1522.214 1485.560 1453.689 1584.176 1767.276
10 1660.750 1735.086 1605.275 1473.854 1398.067 1438.258 1541.633
Introduced by Domazet-Loso and Tautz (2010), relative expression levels are defined as a linear transformation of the mean expression levels (of each Phylostratum or Divergence-Stratum) into the interval [0,1] (Quint et al. (2012)). This allows the comparability between mean expression patterns between Phylostrata or Divergence-Strata independent from their actual magnitude.
The PlotRE()
function can be used (analogous to the PlotMeans()
function) to visualize the relative expression levels of a given PhyloExpressionSet and DivergenceExpressionSet:
# Visualizing the mean gene expression of each Phylostratum class
PlotRE( ExpressionSet = PhyloExpressionSetExample,
Groups = list(1:10),
legendName = "PS",
xlab = "Ontogeny",
lty = 1,
cex = 0.7,
lwd = 5 )
# Visualizing the mean gene expression of each Divergence-Stratum class
PlotRE( ExpressionSet = DivergenceExpressionSetExample,
Groups = list(1:10),
legendName = "DS",
xlab = "Ontogeny",
lty = 1,
cex = 0.7,
lwd = 5 )
or again by assigning Phylostratum or Divergence-Stratum groups that shall be visualized in different plots:
# Visualizing the mean gene expression of each Phylostratum class
PlotRE( ExpressionSet = PhyloExpressionSetExample,
Groups = list(group_1 = 1:3, group_2 = 4:12),
legendName = "PS",
xlab = "Ontogeny",
lty = 1,
cex = 0.7,
lwd = 5 )
The relative expression levels can be obtained using the REMatrix()
function:
# Getting the relative expression levels for all Phylostrata
REMatrix(PhyloExpressionSetExample)
Zygote Quadrant Globular Heart Torpedo Bent Mature
1 0.4816246 0.3145330 0.46389184 0.00000000 0.17067495 0.56880234 1.0000000
2 1.0000000 0.9363209 0.63348381 0.40823711 0.14904726 0.00000000 0.2206063
3 0.6083424 0.4109402 0.00000000 0.09521758 0.16284114 0.21213845 1.0000000
4 0.2985050 0.2366309 0.04946941 0.07499453 0.00000000 0.20573325 1.0000000
5 0.2893657 0.2799777 0.00000000 0.01401191 0.01365328 0.45908792 1.0000000
6 0.2323316 0.2786335 0.02706119 0.00000000 0.03592044 0.20084761 1.0000000
7 0.5666979 0.2620602 0.00000000 0.12099252 0.07133814 0.13232551 1.0000000
8 0.4203039 0.3092784 0.09237036 0.05442042 0.00000000 0.45520558 1.0000000
9 0.4586261 0.4668613 0.39738003 0.23205534 0.00000000 0.37067096 1.0000000
10 0.8811321 1.0000000 0.53841500 0.22974016 0.00000000 0.30490542 0.4881046
11 0.4015809 0.4877111 0.04846721 0.05741594 0.00000000 0.07716367 1.0000000
12 0.5052572 0.3359211 0.07100055 0.09489782 0.00000000 0.25811214 1.0000000
# Getting the relative expression levels for all Divergence-Strata
REMatrix(DivergenceExpressionSetExample)
Zygote Quadrant Globular Heart Torpedo Bent Mature
1 0.9669643 0.9755188 1.00000000 0.64894653 0.54294759 0.3860827 0.0000000
2 0.7888009 0.4949178 0.10397567 0.00000000 0.08758660 0.4549387 1.0000000
3 0.1733953 0.0000000 0.02704324 0.04893726 0.18054185 0.4309208 1.0000000
4 0.3772372 0.2955661 0.08201140 0.07895260 0.00000000 0.2074848 1.0000000
5 0.4543752 0.3987496 0.09292474 0.00000000 0.02050713 0.1912595 1.0000000
6 0.4403615 0.2605017 0.07713944 0.13021586 0.00000000 0.2307754 1.0000000
7 0.3264585 0.3200581 0.13864386 0.11077270 0.00000000 0.1420009 1.0000000
8 0.4934416 0.4366671 0.11457069 0.08697865 0.00000000 0.3076689 1.0000000
9 0.6236387 0.6561674 0.21851855 0.10163374 0.00000000 0.4161087 1.0000000
10 0.7794318 1.0000000 0.61482564 0.22487531 0.00000000 0.1192539 0.4259882
The same could also be done using the age.apply()
function in combination with the RE()
function:
# Getting the relative expression levels for all Phylostrata
age.apply( ExpressionSet = PhyloExpressionSetExample,
FUN = RE )
Zygote Quadrant Globular Heart Torpedo Bent Mature
1 0.4816246 0.3145330 0.46389184 0.00000000 0.17067495 0.56880234 1.0000000
2 1.0000000 0.9363209 0.63348381 0.40823711 0.14904726 0.00000000 0.2206063
3 0.6083424 0.4109402 0.00000000 0.09521758 0.16284114 0.21213845 1.0000000
4 0.2985050 0.2366309 0.04946941 0.07499453 0.00000000 0.20573325 1.0000000
5 0.2893657 0.2799777 0.00000000 0.01401191 0.01365328 0.45908792 1.0000000
6 0.2323316 0.2786335 0.02706119 0.00000000 0.03592044 0.20084761 1.0000000
7 0.5666979 0.2620602 0.00000000 0.12099252 0.07133814 0.13232551 1.0000000
8 0.4203039 0.3092784 0.09237036 0.05442042 0.00000000 0.45520558 1.0000000
9 0.4586261 0.4668613 0.39738003 0.23205534 0.00000000 0.37067096 1.0000000
10 0.8811321 1.0000000 0.53841500 0.22974016 0.00000000 0.30490542 0.4881046
11 0.4015809 0.4877111 0.04846721 0.05741594 0.00000000 0.07716367 1.0000000
12 0.5052572 0.3359211 0.07100055 0.09489782 0.00000000 0.25811214 1.0000000
Quint et al. (2012) introduced an additional way of visualizing the difference of relative expression levels between groups of Phylostrata/Divergence-Strata.
This bar plot comparing the mean relative expression levels of one Phylostratum/Divergence-Stratum group with all other groups can be plotted analogous to the PlotMeans()
and PlotRE()
functions:
# Visualizing the mean relative expression of two Phylostratum groups
PlotBarRE( ExpressionSet = PhyloExpressionSetExample,
Groups = list(group_1 = 1:3, group_2 = 4:12),
xlab = "Ontogeny",
ylab = "Mean Relative Expression",
cex = 2)
Here the argument Groups = list(1:3, 4:12)
corresponds to dividing Phylostrata 1-12 into Phylostratum groups defined as origin before embryogenesis (group one: PS1-3) and origin during or after embryogenesis (group two: PS4-12). A Kruskal-Wallis Rank Sum Test is then performed to test the statistical significance of the different bars that are compared. The ’*’ corresponds to a statistically significant difference.
Additionally the ratio between both values represented by the bars to be compared can be visualized as function within the bar plot using the ratio = TRUE
argument:
# Visualizing the mean relative expression of two Phylostratum groups
PlotBarRE( ExpressionSet = PhyloExpressionSetExample,
Groups = list(group_1 = 1:3, group_2 = 4:12),
ratio = TRUE,
xlab = "Ontogeny",
ylab = "Mean Relative Expression",
cex = 2 )
It is also possible to compare more than two groups:
# Visualizing the mean relative expression of three Phylostratum groups
PlotBarRE( ExpressionSet = PhyloExpressionSetExample,
Groups = list(group_1 = 1:3, group_2 = 4:6, group_3 = 7:12),
wLength = 0.05,
xlab = "Ontogeny",
ylab = "Mean Relative Expression",
cex = 2 )
For the corresponding statistically significant stages, a Posthoc test can be performed to detect the combinations of differing bars that cause the global statistical significance.
Most visualization functions in the myTAI
package have an ellipsis argument (?dotsMethods
) which allows you to modify the cex.lab
, cex.axis
, and cex
arguments within the plot.
# cex = 0.5, cex.lab = 0.5, cex.axis = 0.5
PlotPattern( ExpressionSet = PhyloExpressionSetExample,
type = "l",
lwd = 6,
xlab = "Ontogeny",
ylab = "TAI",
cex = 0.5,
cex.lab = 0.5,
cex.axis = 0.5 )
The cex
argument modifies the the p-value font size, the cex.lab
argument modifies the font size of the x-axis and y-axis labels, and the cex.axis
argument modifies the font size of the axis labels.
More examples:
# cex = 2, cex.lab = 1, cex.axis = 0.5
PlotPattern( ExpressionSet = PhyloExpressionSetExample,
type = "l",
lwd = 6,
xlab = "Ontogeny",
ylab = "TAI",
cex = 2,
cex.lab = 1,
cex.axis = 0.5 )
# cex = 0.5, cex.lab = 0.7, cex.axis = 1.5
PlotPattern( ExpressionSet = PhyloExpressionSetExample,
type = "l",
lwd = 6,
xlab = "Ontogeny",
ylab = "TAI",
cex = 0.5,
cex.lab = 0.7,
cex.axis = 1.5 )
These arguments can analogously be used for all other plot functions, e.g. PlotMeans()
, PlotRE()
, etc.