The package MORSE is devoted to the analysis of data from standard toxicity bioassays. It provides a simple workflow to explore/visualize a dataset, and compute estimations of risk assessment indicators. This document illustrates a typical use of MORSE on survival and reproduction data, which can be followed to analyze new datasets.

Survival data analysis

The following example shows all the stages to perform survival analysis on standard bioassay data and produce estimate values of the \(LC_x\). We will use a dataset of the library named cadmium2, which contains both survival and reproduction data from a chronic laboratory bioassay. In this experiment, snails were exposed to six concentrations of a metal contaminant (cadmium) during 56 days.

Step 1: check the structure and the dataset integrity

The data from a survival assay should be gathered in a data.frame with a specific layout. This is documented in the paragraph on survData in the reference manual, and you can also inspect one of the datasets provided in the package (like cadmium2). First, we load the dataset and use the function survDataCheck to verify that it has the expected layout:

data(cadmium2)
survDataCheck(cadmium2, diagnosis.plot = TRUE)
## No error detected.

Step 2: create a survData object

The class survData represents survival data and is the basic representation used for the subsequent operations. Note that if the call to survDataCheck reports no error, it is guaranteed that survData will not fail.

dat <- survData(cadmium2)
head(dat)
##   replicate conc time Nsurv Nrepro     ID Ninit Nindtime
## 1         A    0    0     5      0  A_0_0     5        0
## 2         A    0    3     5      4  A_0_3     5       15
## 3         A    0    7     5      5  A_0_7     5       35
## 4         A    0   10     5      8 A_0_10     5       50
## 5         A    0   14     5      4 A_0_14     5       70
## 6         A    0   17     5      9 A_0_17     5       85

Step 3: visualize your dataset

The function plot can be used to plot the number of surviving individuals as a function of time for all concentrations and replicates.

plot(dat, style = "ggplot", pool.replicate = FALSE)

Two graphical styles are available, "generic" for standar R plots or "ggplot" to call the package ggplot2. If the argument pool.replicate is TRUE the datapoints for a given time and concentration are pooled and only the mean number of survivors is plotted. To observe the full dataset, we set this option to FALSE.

By fixing the concentration to a (tested) value, we can visualize one subplot in particular:

plot(dat, concentration = 124, addlegend = TRUE,
     pool.replicate = FALSE, style = "ggplot")

We can also plot the number of surviving individuals at a given time point as a function of the concentration, by fixing target.time instead of concentration:

plot(dat, target.time = 21, pool.replicate = FALSE, style = "ggplot",
     addlegend = TRUE)

Finally, when fixing both concentration and target.time, we can compare one condition to the control:

plot(dat, concentration = 232, target.time = 21)

The function summary provides some descriptive statistics on the experimental design.

summary(dat)
## Number of datapoints per concentration: 
## 
##   0  53  78 124 232 284 
## 102 102 102 102 102 102 
## 
## Number of datapoints per time: 
## 
##  0  3  7 10 14 17 21 24 28 31 35 38 42 45 49 52 56 
## 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 
## 
## Number of replicate per time and concentration: 
##      time
## conc  0 3 7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
##   0   6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   53  6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   78  6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   124 6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   232 6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   284 6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
## 
## Number of survival (sum of replicate) per time and concentration: 
##      0  3  7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
## 0   30 30 30 30 29 29 29 29 29 28 28 28 28 28 28 28 28
## 53  30 30 29 29 29 29 29 29 29 29 28 28 28 28 28 28 28
## 78  30 30 30 30 30 30 29 29 29 29 29 29 29 29 29 27 27
## 124 30 30 30 30 30 29 28 28 27 26 25 23 21 18 11 11  9
## 232 30 30 30 22 18 18 17 14 13 12  8  4  3  1  0  0  0
## 284 30 30 15  7  4  4  4  2  2  1  1  1  1  1  1  0  0

Step 4: fit an exposure-response model to the survival data at target time

Now we are ready to fit a probabilistic model for the survival data, which aims at finding the relation between concentration of pollutant and mean survival rate at the target time. Our model assumes the latter is a log-logistic function of the former, and the work here is to estimate the parameters of this log-logistic function. Once we have estimated the parameters, we can then calculate the \(LC_x\) for any \(x\). All this work is performed by the survFitTT function, which requires a survData object and the levels of \(LC_x\) we need:

fit <- survFitTT(dat,
                lcx = c(10, 20, 30, 40, 50))

The return value is an object of class survFitTT and provides the estimated parameters as a probability distribution, in order to model our uncertainty on their true value. For the parameters of the models, as well as the \(LC_x\) values, we report the median (as the estimated value) and the 2.5 % and 97.5 % quantiles of the distribution (as a measure of uncertainty, a.k.a. credible intervals). These can seen using the summary method:

summary(fit)
## Summary: 
## 
## The  loglogisticbinom_3  model with a binomial stochastic part was used !
## 
## Quantiles of priors on parameters: 
## 
##       50%   2.5%   97.5%
## b   1.000  0.013  79.433
## d   0.500  0.025   0.975
## e 122.687 53.898 279.268
## 
## Quantiles of posteriors on parameters: 
## 
##       50%    2.5%   97.5%
## b  11.739   5.393  72.776
## d   0.922   0.853   0.971
## e 117.770 104.238 124.653
## 
## Quantiles of the estimated LCx: 
## 
##          50%    2.5%   97.5%
## LC10  97.123  72.530 119.243
## LC20 104.193  83.323 120.559
## LC30 109.269  91.322 121.575
## LC40 113.589  97.945 122.620
## LC50 117.770 104.238 124.653

or in a plot:

plot(fit, log.scale = TRUE, ci = TRUE, style = "ggplot",
     addlegend = TRUE)

Note that survFitTT will warn you if the estimated \(LC_{50}\) lies outside the range of tested concentrations:

data("cadmium1")
fit <- survFitTT(survData(cadmium1),
                lcx = c(10, 20, 30, 40, 50))
## Warning in survFitTT(survData(cadmium1), lcx = c(10, 20, 30, 40, 50)): The
## LC50 estimation lies outsides the range of tested concentration and may be
## unreliable !
plot(fit, log.scale = TRUE, ci = TRUE, style = "ggplot",
     addlegend = TRUE)

In this example, the experimental design did not include sufficiently high concentrations, and we are missing measurements that would have a major influence on the final estimation. For this reason this result should be considered unreliable.

Reproduction data analyses

The steps in reproduction data analysis are absolutely analogous to what we described for survival data. This time the goal is to estimate the relation between pollutant concentration and reproduction rate per individual-day.

Here is a typical session:

# (1) load dataset
data(cadmium2)

# (2) check structure and integrity of the dataset
reproDataCheck(cadmium2, diagnosis.plot = TRUE)
## No error detected.
# (3) create a `reproData` object
dat <- reproData(cadmium2)

# (4) represent the cumulated number of offspring as a function time
plot(dat, style = "ggplot", pool.replicate = FALSE)

plot(dat, target.time = 21, addlegend = TRUE, style = "ggplot",
     pool.replicate = FALSE)

plot(dat, concentration = 124, addlegend = TRUE, style = "ggplot",
     pool.replicate = FALSE)

plot(dat, concentration = 124, target.time = 21, style = "ggplot")

# (5) check information on the experimental design
summary(dat)
## Number of datapoints per concentration: 
## 
##   0  53  78 124 232 284 
## 102 102 102 102 102 102 
## 
## Number of datapoints per time: 
## 
##  0  3  7 10 14 17 21 24 28 31 35 38 42 45 49 52 56 
## 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 
## 
## Number of replicate per time and concentration: 
##      time
## conc  0 3 7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
##   0   6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   53  6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   78  6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   124 6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   232 6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   284 6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
## 
## Number of survival (sum of replicate) per time and concentration: 
##      0  3  7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
## 0   30 30 30 30 29 29 29 29 29 28 28 28 28 28 28 28 28
## 53  30 30 29 29 29 29 29 29 29 29 28 28 28 28 28 28 28
## 78  30 30 30 30 30 30 29 29 29 29 29 29 29 29 29 27 27
## 124 30 30 30 30 30 29 28 28 27 26 25 23 21 18 11 11  9
## 232 30 30 30 22 18 18 17 14 13 12  8  4  3  1  0  0  0
## 284 30 30 15  7  4  4  4  2  2  1  1  1  1  1  1  0  0
## 
## Number of offspring (sum of replicate) per time and concentration: 
##     0  3  7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
## 0   0 29 25 33 24 32 29 29 22 38 29 39 23 30 18 33 34
## 53  0 20 26 27 27 25 23 27 22 28 20 28 12 20 26 21 20
## 78  0 17 31 28 24 23 24 15 21 22 23 27 19 24 36 32 26
## 124 0 14 32 24  9  6  9 10 12 12 16 16 12  7 14  7  8
## 232 0  5 20  6  2  0  5  1  5  4  2  3  0  0  0  0  0
## 284 0  2  6  1  1  0  0  0  0  0  0  0  0  0  0  0  0
# (6) fit an exposure-response model at target-time
fit <- reproFitTT(dat, stoc.part = "bestfit",
                  ecx = c(10, 20, 30, 40, 50),
                  quiet = TRUE)
plot(fit, log.scale = TRUE, ci = TRUE, 
     style = "ggplot", addlegend = TRUE)

As in survival analysis, we assume that the reproduction rate per individual-day is a log-logistic function of the concentration. More details and parameter signification can be found in the Modeling vignette.

Model comparison

For reproduction analysis, we consider one model which neglects inter-individual variability (named “Poisson”) and another one which takes it into account (named “gamma Poisson”). You can choose either one using the option stoc.part, but by setting it to “bestfit”, you let reproFitTT decide which models fits the data best. The choice can be seen by calling the summary function:

summary(fit)
## Summary: 
## 
## The  loglogistic  model with a Poisson stochastic part was used !
## 
## Quantiles of priors on parameters: 
## 
##       50%   2.5%   97.5%
## b   1.000  0.013  79.433
## d   0.290  0.262   0.319
## e 148.835 79.015 280.350
## 
## Quantiles of posteriors on parameters: 
## 
##       50%    2.5%   97.5%
## b   2.282   1.791   2.942
## d   0.281   0.256   0.307
## e 140.929 122.296 161.612
## 
## Quantiles of the estimated ECx:
## 
##          50%    2.5%   97.5%
## EC10  53.725  37.550  73.696
## EC20  76.673  58.591  97.947
## EC30  97.149  78.395 118.504
## EC40 117.990  99.150 138.968
## EC50 140.929 122.296 161.612

When the gamma Poisson model is selected, the summary shows an additional parameter called omega, which quantifies the inter-individual variability (the higher omega the higher variability).

Reproduction data and survival function

In MORSE, reproduction datasets are a special case of survival datasets: a reproduction dataset includes the same information than a survival dataset plus the information on reproduction outputs. For that reason the S3 class reproData inherits from the class survData, which means that any operation on a survData object is legal on a reproData object. In particular, to use the plot function related to survival analysis on reproData object, we can use the as.survData conversion function:

dat <- reproData(cadmium2)
plot(as.survData(dat))