The package MORSE is devoted to the analysis of data from standard toxicity bioassays. It provides a simple workflow to explore/visualize a dataset, and compute estimations of risk assessment indicators. This document illustrates a typical use of MORSE on survival and reproduction data, which can be followed to analyze new datasets.

# Survival data analysis at target time

The following example shows all the stages to perform survival analysis on standard bioassay data and produce estimate values of the $$LC_x$$. We will use a dataset of the library named cadmium2, which contains both survival and reproduction data from a chronic laboratory bioassay. In this experiment, snails were exposed to six concentrations of a metal contaminant (cadmium) during 56 days.

## Step 1: check the structure and the dataset integrity

The data from a survival assay should be gathered in a data.frame with a specific layout. This is documented in the paragraph on survData in the reference manual, and you can also inspect one of the datasets provided in the package (like cadmium2). First, we load the dataset and use the function survDataCheck to verify that it has the expected layout:

data(cadmium2)
survDataCheck(cadmium2)
## No error detected.

## Step 2: create a survData object

The class survData represents survival data and is the basic representation used for the subsequent operations. Note that if the call to survDataCheck reports no error, it is guaranteed that survData will not fail.

dat <- survData(cadmium2)
head(dat)
##   replicate conc time Nsurv Nrepro Ninit Nindtime
## 1         A    0    0     5      0     5        0
## 2         A    0    3     5    262     5       15
## 3         A    0    7     5    343     5       35
## 4         A    0   10     5    459     5       50
## 5         A    0   14     5    328     5       70
## 6         A    0   17     5    742     5       85

## Step 3: visualize your dataset

The function plot can be used to plot the number of surviving individuals as a function of time for all concentrations and replicates.

plot(dat, style = "ggplot", pool.replicate = FALSE)

Two graphical styles are available, "generic" for standar R plots or "ggplot" to call the package ggplot2. If the argument pool.replicate is TRUE the datapoints for a given time and concentration are pooled and only the mean number of survivors is plotted. To observe the full dataset, we set this option to FALSE.

By fixing the concentration to a (tested) value, we can visualize one subplot in particular:

plot(dat, concentration = 124, addlegend = TRUE,
pool.replicate = FALSE, style = "ggplot")

We can also plot the survival rate with the confidence intervale on the data at a given time point as a function of the concentration, by using the function plot.DoseResponse and by fixing target.time:

plotDoseResponse(dat, target.time = 21, style = "ggplot", addlegend = TRUE)

The function summary provides some descriptive statistics on the experimental design.

summary(dat)
##
## Number of replicate per time and concentration:
##      time
## conc  0 3 7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
##   0   6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   53  6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   78  6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   124 6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   232 6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   284 6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##
## Number of survival (sum of replicate) per time and concentration:
##      0  3  7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
## 0   30 30 30 30 29 29 29 29 29 28 28 28 28 28 28 28 28
## 53  30 30 29 29 29 29 29 29 29 29 28 28 28 28 28 28 28
## 78  30 30 30 30 30 30 29 29 29 29 29 29 29 29 29 27 27
## 124 30 30 30 30 30 29 28 28 27 26 25 23 21 18 11 11  9
## 232 30 30 30 22 18 18 17 14 13 12  8  4  3  1  0  0  0
## 284 30 30 15  7  4  4  4  2  2  1  1  1  1  1  1  0  0

## Step 4: fit an exposure-response model to the survival data at target time

Now we are ready to fit a probabilistic model for the survival data, which aims at finding the relation between concentration of pollutant and mean survival rate at the target time. Our model assumes the latter is a log-logistic function of the former, and the work here is to estimate the parameters of this log-logistic function. Once we have estimated the parameters, we can then calculate the $$LC_x$$ for any $$x$$. All this work is performed by the survFitTT function, which requires a survData object and the levels of $$LC_x$$ we need:

fit <- survFitTT(dat,
target.time = 21,
lcx = c(10, 20, 30, 40, 50))

The returned value is an object of class survFitTT and provides the estimated parameters as a posterior1 distribution, in order to model our uncertainty on their true value. For the parameters of the models, as well as the $$LC_x$$ values, we report the median (as the estimated value) and the 2.5 % and 97.5 % quantiles of the posterior (as a measure of uncertainty, a.k.a. credible intervals). These can seen using the summary method:

summary(fit)
## Summary:
##
## The  loglogisticbinom_3  model with a binomial stochastic part was used !
##
## Priors on parameters (quantiles):
##
##         50%      2.5%     97.5%
## b 1.000e+00 1.259e-02 7.943e+01
## d 5.000e-01 2.500e-02 9.750e-01
## e 1.227e+02 5.390e+01 2.793e+02
##
## Posterior of the parameters (quantiles):
##
##         50%      2.5%     97.5%
## b 8.533e+00 3.979e+00 1.625e+01
## d 9.567e-01 9.091e-01 9.866e-01
## e 2.363e+02 2.099e+02 2.538e+02
##
## Posterior of the LCx (quantiles):
##
##            50%      2.5%     97.5%
## LC10 1.828e+02 1.259e+02 2.139e+02
## LC20 2.011e+02 1.533e+02 2.259e+02
## LC30 2.142e+02 1.744e+02 2.351e+02
## LC40 2.254e+02 1.925e+02 2.437e+02
## LC50 2.363e+02 2.099e+02 2.538e+02

If the inference went well, it is expected that the difference between quantiles in the posterior will be reduced compared to the prior, meaning that the data was helpful to reduce the uncertainty on the true value of the parameters. This simple check can be performed using the summary function.

The fit can also be plotted:

plot(fit, log.scale = TRUE, style = "ggplot", adddata = TRUE,
addlegend = TRUE)

This representation shows the estimated relation between concentration of pollutant and survival rate (red curve). It is computed by choosing for each parameter the median value of its posterior. To assess the uncertainty on this estimation, we compute many such curves this time by sampling the parameters in the posterior distribution. This gives rise to the pink band, showing for any given concentration an interval (called credible interval) containing the survival rate 95% of the time in the posterior distribution. The experimental data points are represented in black and correspond to the observed survival rate when pooling all replicates. The black error bars correspond to a 95% confidence interval, which is another, more straightforward way to bound the most probable value of the survival rate for a tested concentration. In favorable situations, we expect that the credible interval around the estimated curve and the confidence interval around experimental points will largely overlap.

Note that survFitTT will warn you if the estimated $$LC_{50}$$ lies outside the range of tested concentrations, as in the following example:

data("cadmium1")
target.time = 21,
lcx = c(10, 20, 30, 40, 50))
## Warning: The LC50 estimation (model parameter e) lies outside the range
## of tested concentration and may be unreliable as the prior distribution on
## this parameter is defined from this range !
plot(wrong_fit, log.scale = TRUE, style = "ggplot", adddata = TRUE,
addlegend = TRUE)

In this example, the experimental design did not include sufficiently high concentrations, and we are missing measurements that would have a major influence on the final estimation. For this reason this result should be considered unreliable.

## Step 5: validate the model with a posterior predictive check

The estimation can be further validated using so-called posterior predictive checks: the idea is to plot the observed values against the corresponding estimated predictions, along with their 95% credible interval. If the fit is correct, we expect to see 95% of the data inside the intervals.

ppc(fit, style = "ggplot")

In this plot, each black dot represents an observation made at some concentration, and the number of survivors at target time is given by the value on X-axis. Using the concentration and the fitted model, we can produce a corresponding prediction of the expected number of survivors at that concentration. This prediction is given by the Y-axis. Ideally observations and predictions should coincide, so we’d expect to see the black dots on the points of coordinate $$Y = X$$. Our model provides a tolerable variation around the predited mean value as an interval where we expect 95% of the dots to be in average. The intervals are represented in green if they overlap with the line $$Y=X$$, and in red otherwise. As replicates are shifted on the x-axis, the bisecting line (y = x), is represented by steps, and is added to the plot in order to see if each prediction interval contains each observed value.

# Survival analysis using a toxico-kinetic toxico-dynamic model

The steps for TKTD data analysis are absolutely analogous to what we described for the analysis at target time. Here the goal is to estimate the relation between pollutant concentration, time and survival rate.

Here is a typical session:

# (1) load dataset
data(propiconazole)

# (2) check structure and integrity of the dataset
survDataCheck(propiconazole)
## No error detected.
# (3) create a survData object
dat <- survData(propiconazole)

# (4) represent the number of survivor as a function time
plot(dat, style = "ggplot", pool.replicate = FALSE)

# (5) check information on the experimental design
summary(dat)
##
## Number of replicate per time and concentration:
##     time
## conc 0 1 2 3 4
##   0  1 1 1 1 1
##   8  2 2 2 2 2
##   12 2 2 2 2 2
##   14 2 2 2 2 2
##   18 2 2 2 2 2
##   24 2 2 2 2 2
##   29 2 2 2 2 2
##   36 2 2 2 2 2
##
## Number of survival (sum of replicate) per time and concentration:
##     0  1  2  3  4
## 0  10  9  9  9  9
## 8  20 20 20 20 19
## 12 20 19 19 19 17
## 14 20 19 19 18 16
## 18 21 21 20 16 16
## 24 20 17  6  2  1
## 29 20 11  4  0  0
## 36 20 11  1  0  0
# (6) fit a TKTD model
fit <- survFitTKTD(dat, quiet = TRUE)
summary(fit)
## Summary:
##
## Priors on parameters (quantiles):
##
##           50%      2.5%     97.5%
## kd  4.157e-02 2.771e-04 6.236e+00
## ks  5.555e-03 1.016e-05 3.037e+00
## nec 1.697e+01 8.121e+00 3.546e+01
## m0  1.317e-02 2.708e-04 6.403e-01
##
## Posterior of the parameters (quantiles):
##
##           50%      2.5%     97.5%
## kd  2.213e+00 1.593e+00 3.535e+00
## ks  1.249e-01 7.964e-02 1.921e-01
## nec 1.707e+01 1.560e+01 1.891e+01
## m0  3.056e-02 1.440e-02 5.524e-02
plot(fit, style = "ggplot", adddata = TRUE, one.plot = FALSE)

ppc(fit, style = "ggplot")

# Reproduction data analyses

The steps in reproduction data analysis are absolutely analogous to what we described for survival data. This time the goal is to estimate the relation between pollutant concentration and reproduction rate per individual-day.

Here is a typical session:

# (1) load dataset

# (2) check structure and integrity of the dataset
reproDataCheck(cadmium2)
## No error detected.
# (3) create a reproData object

# (4) represent the cumulated number of offspring as a function time
plot(dat, style = "ggplot", pool.replicate = FALSE)

plot(dat, concentration = 124, addlegend = TRUE, style = "ggplot",
pool.replicate = FALSE)

# (5) represent the reproduction rate as a function of concentration
plotDoseResponse(dat, target.time = 56, style = "ggplot")

# (6) check information on the experimental design
summary(dat)
##
## Number of replicate per time and concentration:
##      time
## conc  0 3 7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
##   0   6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   53  6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   78  6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   124 6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   232 6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##   284 6 6 6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
##
## Number of survival (sum of replicate) per time and concentration:
##      0  3  7 10 14 17 21 24 28 31 35 38 42 45 49 52 56
## 0   30 30 30 30 29 29 29 29 29 28 28 28 28 28 28 28 28
## 53  30 30 29 29 29 29 29 29 29 29 28 28 28 28 28 28 28
## 78  30 30 30 30 30 30 29 29 29 29 29 29 29 29 29 27 27
## 124 30 30 30 30 30 29 28 28 27 26 25 23 21 18 11 11  9
## 232 30 30 30 22 18 18 17 14 13 12  8  4  3  1  0  0  0
## 284 30 30 15  7  4  4  4  2  2  1  1  1  1  1  1  0  0
##
## Number of offspring (sum of replicate) per time and concentration:
##     0    3    7   10   14   17   21   24   28   31   35   38   42   45
## 0   0 1659 1587 2082 1580 2400 2069 2316 1822 2860 2154 3200 1603 2490
## 53  0 1221 1567 1710 1773 1859 1602 1995 1800 2101 1494 2126  935 1629
## 78  0 1066 2023 1752 1629 1715 1719 1278 1717 1451 1826 1610 1097 1727
## 124 0  807 1917 1423  567  383  568  493  605  631  573  585  546  280
## 232 0  270 1153  252   30    0   37   28   46  119   19    9    0    0
## 284 0  146  275   18    1    0    0    0    0    0    0    0    0    0
##       49   52   56
## 0   1609 2149 2881
## 53  2108 1686 1628
## 78  2309 1954 1760
## 124  594  328  380
## 232    0    0    0
## 284    0    0    0
# (7) fit an exposure-response model at target-time
fit <- reproFitTT(dat, stoc.part = "bestfit",
target.time = 21,
ecx = c(10, 20, 30, 40, 50),
quiet = TRUE)
summary(fit)
## Summary:
##
## The  loglogistic  model with a Gamma Poisson stochastic part was used !
##
## Priors on parameters (quantiles):
##
##             50%      2.5%     97.5%
## b     1.000e+00 1.259e-02 7.943e+01
## d     1.830e+01 1.554e+01 2.107e+01
## e     1.488e+02 7.902e+01 2.804e+02
## omega 1.000e+00 1.585e-04 6.310e+03
##
## Posterior of the parameters (quantiles):
##
##             50%      2.5%     97.5%
## b     3.835e+00 2.803e+00 5.643e+00
## d     1.779e+01 1.552e+01 2.020e+01
## e     1.366e+02 1.133e+02 1.711e+02
## omega 1.432e+00 8.232e-01 2.779e+00
##
## Posterior of the ECx (quantiles):
##
##            50%      2.5%     97.5%
## EC10 7.695e+01 5.343e+01 1.135e+02
## EC20 9.497e+01 7.082e+01 1.319e+02
## EC30 1.093e+02 8.518e+01 1.459e+02
## EC40 1.227e+02 9.893e+01 1.588e+02
## EC50 1.366e+02 1.133e+02 1.711e+02
plot(fit, log.scale = TRUE, style = "ggplot", adddata = TRUE,
addlegend = TRUE)
ppc(fit, style = "ggplot")

As in survival analysis, we assume that the reproduction rate per individual-day is a log-logistic function of the concentration. More details and parameter signification can be found in the Modeling vignette.

## Model comparison

For reproduction analysis, we consider one model which neglects inter-individual variability (named “Poisson”) and another one which takes it into account (named “gamma Poisson”). You can choose either one using the option stoc.part, but by setting it to "bestfit", you let reproFitTT decide which models fits the data best. The choice can be seen by calling the summary function:

summary(fit)
## Summary:
##
## The  loglogistic  model with a Gamma Poisson stochastic part was used !
##
## Priors on parameters (quantiles):
##
##             50%      2.5%     97.5%
## b     1.000e+00 1.259e-02 7.943e+01
## d     1.830e+01 1.554e+01 2.107e+01
## e     1.488e+02 7.902e+01 2.804e+02
## omega 1.000e+00 1.585e-04 6.310e+03
##
## Posterior of the parameters (quantiles):
##
##             50%      2.5%     97.5%
## b     3.835e+00 2.803e+00 5.643e+00
## d     1.779e+01 1.552e+01 2.020e+01
## e     1.366e+02 1.133e+02 1.711e+02
## omega 1.432e+00 8.232e-01 2.779e+00
##
## Posterior of the ECx (quantiles):
##
##            50%      2.5%     97.5%
## EC10 7.695e+01 5.343e+01 1.135e+02
## EC20 9.497e+01 7.082e+01 1.319e+02
## EC30 1.093e+02 8.518e+01 1.459e+02
## EC40 1.227e+02 9.893e+01 1.588e+02
## EC50 1.366e+02 1.133e+02 1.711e+02

When the gamma Poisson model is selected, the summary shows an additional parameter called omega, which quantifies the inter-individual variability (the higher omega the higher the variability).

## Reproduction data and survival functions

In MORSE, reproduction datasets are a special case of survival datasets: a reproduction dataset includes the same information than a survival dataset plus the information on reproduction outputs. For that reason the S3 class reproData inherits from the class survData, which means that any operation on a survData object is legal on a reproData object. In particular, to use the plot function related to survival analysis on reproData object, we can use the as.survData conversion function:

dat <- reproData(cadmium2)
plot(as.survData(dat))

1. In Bayesian inference, the parameters of a model are estimated from the data starting from a so-called prior, which is a probability distribution representing an initial guess on the true parameters, before seing the data. The posterior distribution represents the uncertainty on the parameters after seeing the data and combining it with the prior. To obtain a point estimate of the parameters, it is typical to compute the mean or median of the posterior and quantify the uncertainty by reporting the standard deviation or inter-quantile distance.