We first load the necessary packages and set some pre-defined values needed to replicate the analysis.
nExp <- 4 # Dataset has 11 experiments, we consider only 4
cutoff <- 0.95 # Cutoff for p-values to use in plot.maxR() function
The data for the analysis must come in a data-frame with required columns d1
, d2
and effect
for doses of two compounds and observed cell counts respectively. The effect
column may represent also a type of normalized data and subsequent transformation functions should be adjusted.
We will use sample data included in the package - directAntivirals
.
## experiment cpd1 cpd2 effect d1 d2
## 1 1 cpd1_A cpd2_A 509.64 500 500
## 2 1 cpd1_A cpd2_A 589.67 500 500
## 3 1 cpd1_A cpd2_A 524.71 500 130
## 4 1 cpd1_A cpd2_A 575.45 500 130
## 5 1 cpd1_A cpd2_A 634.53 500 31
## 6 1 cpd1_A cpd2_A 702.35 500 31
This data consists of 11 experiments that can be processed separately. For initial illustration purposes we choose just one experiment and retain only the columns of interest. We define a simple function to do just that.
subsetData <- function(data, i) {
## Subset data to a single experiment and, optionaly, select the necessary
## columns only
subset(data, experiment == i)[, c("effect", "d1", "d2")]
}
Now let us only pick Experiment 1
to illustrate the functionality of the package.
Dose-response data for Experiment 1
will be used for a large share of the analysis presented here, therefore this subset is stored in a dataframe called data
. Later, we will run the analysis for some other experiments as well.
If raw data is measured in cell counts, data transformation might be of interest to improve accuracy and interpretation of the model. Of course, this will depend on the model specification. For example, if a generalized Loewe model is assumed on the growth rate of the cell count, the appropriate conversion should be made from the observed cell counts. The formula used would be \[y = N_0\exp\left(kt\right)\] where \(k\) is a growth rate, \(t\) is time (fixed) and \(y\) is the observed cell count. If such a transformation is specified, it is referred to as the biological transformation.
In certain cases, variance-stabilizing transformations (Box-Cox) can also be useful. We refer to these transformations as power transformations. In many cases, a simple logarithmic transformation can be sufficient but, if desired, a helper function optim.boxcox
is available to automate the selection of Box-Cox transformation parameters.
In addition to specifying biological and power transformations, users are also asked to specify their inverses. These are later used in the bootstrapping procedure and plotting methods.
As an example, we might define a transforms
list that will be passed to the fitting functions. It contains both biological growth rate and power transformations along with their inverses.
## Define forward and reverse transform functions
transforms <- list(
"BiolT" = function(y, args) with(args, N0*exp(y*time.hours)),
"InvBiolT" = function(T, args) with(args, 1/time.hours*log(T/N0)),
"PowerT" = function(y, args) with(args, log(y)),
"InvPowerT" = function(T, args) with(args, exp(T)),
"compositeArgs" = list(N0 = 1,
time.hours = 72)
)
compositeArgs
contains the initial cell counts (N0
) and incubation time (time.hours
). In certain cases, the getTransformations
wrapper function can be employed to automatically obtain a prepared list with biological growth rate and power transformations based on results from optim.boxcox
. Its output will also contain the inverses of these transforms.
transforms_auto <- getTransformations(data)
fitMarginals(data, transforms = transforms_auto)
## In the case of 1-parameter Box-Cox transformation, it is easy
## to retrieve the power parameter by evaluating the function at 0.
## If parameter is 0, then it is a log-transformation.
with(transforms_auto, -1 / PowerT(0, compositeArgs))
Once dose-response dataframe is correctly set up, we may proceed onto synergy analysis. We will use transforms
as defined above with a logarithmic transformation. If not desired, transforms
can be set to NULL
and would be ignored.
Synergy analysis is quite modular and is divided into 3 parts:
The first step of the fitting procedure will consist in treating marginal data only, i.e. those observations within the experiment where one of the compounds is dosed at zero. For each compound the corresponding marginal doses are modelled using a 4-parameter logistic model.
The marginal models will be estimated together using non-linear least squares estimation procedure. Estimation of both marginal models needs to be simultaneous since it is assumed they share a common baseline that also needs to be estimated. The fitMarginals
function and other marginal estimation routines will automatically extract marginal data from the dose-response data frame.
Before proceeding onto the estimation, we get a rough guess of the parameters to use as starting values in optimization and then we fit the model. marginalFit
, returned by the fitMarginals
routine, is an object of class MarginalFit
which is essentially a list containing the main information about the marginal models, in particular the estimated coeffcients.
The optional names
argument allows to specify the names of the compounds to be shown on the plots and in the summary. If not defined, the defaults (“Compound 1” and “Compound 2”) are used.
## Fitting marginal models
marginalFit <- fitMarginals(data, transforms = transforms, method = "nls",
names = c("Drug A", "Drug B"))
summary(marginalFit)
## Formula: effect ~ fn(h1, h2, b, m1, m2, e1, e2, d1, d2)
## Transformations: Yes
##
## Drug A Drug B
## Slope 2.132 1.201
## Maximal response 0.088 0.091
## log10(EC50) 1.008 0.977
##
## Common baseline at: 0.132
marginalFit
object retains the data that was supplied and the transformation functions used in the fitting procedure. It also has a plot
method which allows for a quick visualization of the fitting results.
## Plotting marginal models
plot(marginalFit) + ggtitle(paste("Direct-acting antivirals - Experiment" , i))
Note as well that the fitMarginals
function allows specifying linear constraints on parameters. This provides an easy way for the user to impose asymptote equality, specific baseline value and other linear constraints that might be useful. See help(constructFormula)
for more details.
## Parameter ordering: h1, h2, b, m1, m2, e1, e2
## Constraint 1: m1 = m2. Constraint 2: b = 10.
constraints <- list("matrix" = rbind(c(0, 0, 0, -1, 1, 0, 0),
c(0, 0, 1, 0, 0, 0, 0)),
"vector" = c(0, 10))
## Parameter estimates will now satisfy equality:
## constraints$matrix %*% pars == constraints$vector
fitMarginals(data, transforms = transforms,
constraints = constraints)
The fitMarginals
function allows an alternative user-friendly way to specify one or more fixed-value constraints using a named vector passed to the function via fixed
argument.
## Set baseline at 1 and maximal responses at 0.
fitMarginals(data, transforms = transforms,
fixed = c("m1" = 0, "m2" = 0, "b" = 1))
By default, no constraints are set, thus asymptotes are not shared and so a generalized Loewe model will be estimated.
We advise the user to employ the method = "nlslm"
argument which is set as the default in monotherapy curve estimation. It is based on minpack.lm::nlsLM
function with an underlying Levenberg-Marquardt algorithm for non-linear least squares estimation. This algorithm is known to be more robust than method = "nls"
and its Gauss-Newton algorithm. In cases with nice sigmoid-shaped data, both methods should however lead to similar results.
method = "optim"
is a simple sum-of-squared-residuals minimization driven by a default Nelder-Mead algorithm from optim
minimizer. It is typically slower than non-linear least squares based estimation and can lead to a significant increase in computational time for larger datasets and bootstrapped statistics. In nice cases, Nelder-Mead algorithm and non-linear least squares can lead to rather similar estimates but this is not always the case as these algorithms are based on different techniques.
In general, we advise that in automated batch processing whenever method = "nlslm"
does not converge fast enough and/or emits a warning, user should implement a fallback to method = "optim"
and re-do the estimation. If none of these suggestions work, it might be useful to fiddle around and slightly perturb starting values for the algorithms as well. By default, these are obtained from the initialMarginal
function.
nlslmFit <- tryCatch({
fitMarginals(data, transforms = transforms,
method = "nlslm")
}, warning = function(w) w, error = function(e) e)
if (inherits(nlslmFit, c("warning", "error")))
optimFit <- tryCatch({
fitMarginals(data, transforms = transforms,
method = "optim")
})
Note as well that additional arguments to fitMarginals
passed via ...
ellipsis argument will be passed on to the respective solver function, i.e. minpack.lm::nlsLM
, nls
or optim
.
While BIGL
package provides several routines to fit 4-parameter log-logistic dose-response models, some users may prefer to use their own optimizers to estimate the relevant parameters. It is rather easy to integrate this into the workflow by constructing a custom MarginalFit
object. It is in practice a simple list with
coef
: named vector with coefficient estimatessigma
: standard deviation of residualsdf
: degrees of freedom from monotherapy curve estimatesmodel
: model of the marginal estimation which allows imposing linear constraints on parameters. If no constraints are necessary, it can be left out or assigned the output of constructFormula
function with no inputs.shared_asymptote
: whether estimation is constrained to share the asymptote. During the estimation, this is deduced from model
object.method
: method used in dose-response curve estimation which will be re-used in bootstrappingtransforms
: power and biological transformation functions (and their inverses) used in monotherapy curve estimation. This should be a list in a format described above. If transforms
is unspecified or NULL
, no transformations will be used in statistical bootstrapping unless the user asks for it explicitly via one of the arguments to fitSurface
.Other elements in the MarginalFit
are currently unused for evaluating synergy and can be disregarded. These elements, however, might be necessary to ensure proper working of available methods for the MarginalFit
object.
As an example, the following code generates a custom MarginalFit
object that can be passed further to estimate a response surface under the null hypothesis.
marginalFit <- list("coef" = c("h1" = 1, "h2" = 2, "b" = 0,
"m1" = 1.2, "m2" = 1, "e1" = 0.5, "e2" = 0.5),
"sigma" = 0.1,
"df" = 123,
"model" = constructFormula(),
"shared_asymptote" = FALSE,
"method" = "nlslm",
"transforms" = transforms)
class(marginalFit) <- append(class(marginalFit), "MarginalFit")
Note that during bootstrapping this would use minpack.lm::nlsLM
function to re-estimate parameters from data following the null. A custom optimizer for bootstrapping is currently not implemented.
Three types of null models are available for calculating expected response surfaces.
shared_asymptote = FALSE
, in the marginal fitting procedure and null_model = "loewe"
in response calculationshared_asymptote = TRUE
in the marginal fitting procedure and null_model = "loewe"
in response calculationnull_model = "hsa"
irrespectively of the value of shared_asymptote
.If transformation functions were estimated using fitMarginals
, these will be automatically recycled from the marginalFit
object when doing calculations for the response surface fit. Alternatively, transformation functions can be passed by a separate argument. Since the marginalFit
object was estimated without the shared asymptote constraint, the following will compute the response surface based on the generalized Loewe model.
rs <- fitSurface(data, marginalFit,
## transforms = transforms,
null_model = "loewe",
B.CP = 50, statistic = "none", parallel = FALSE)
summary(rs)
Null model: Generalized Loewe Additivity
Variance assumption used: "equal"
Mean occupancy rate: 0.6732745
Formula: effect ~ fn(h1, h2, b, m1, m2, e1, e2, d1, d2)
Transformations: Yes
Drug A Drug B
Slope 2.132 1.201
Maximal response 0.088 0.091
log10(EC50) 1.008 0.977
Common baseline at: 0.132
No test statistics were computed.
The occcupancy matrix used in the expected response calculation for the Loewe models can be accessed with rs$occupancy
.
For off-axis data and a fixed dose combination, the Z-score for that dose combination is defined to be the standardized difference between the observed effect and the effect predicted by a generalized Loewe model. If the observed effect differs significantly from the prediction, it might be due to the presence of synergy or antagonism. If multiple observations refer to the same combination of doses, then a mean is taken over these multiple standardized differences.
The following plot illustrates the isobologram of the chosen null model. Coloring and contour lines within the plot should help the user distinguish areas and dose combinations that generate similar response according to the null model. Note that the isobologram is plotted by default on a logarithmically scaled grid of doses.
The plot below illustrates the above considerations in a 3-dimensional setting. In this plot, points refer to the observed effects whereas the surface is the model-predicted response. The surface is colored according to the median Z-scores where blue coloring indicates possible synergistic effects (red coloring would indicate possible antagonism).
For the Highest Single Agent null model to work properly, it is expected that both marginal curves are either decreasing or increasing. Equivalent summary
and plot
methods are also available for this type of null model.
rsh <- fitSurface(data, marginalFit,
null_model = "hsa",
B.CP = 50, statistic = "both", parallel = FALSE)
summary(rsh)
Null model: Highest Single Agent with differing maximal response
Variance assumption used: "equal"
Mean occupancy rate: 0.6732745
Formula: effect ~ fn(h1, h2, b, m1, m2, e1, e2, d1, d2)
Transformations: Yes
Drug A Drug B
Slope 2.132 1.201
Maximal response 0.088 0.091
log10(EC50) 1.008 0.977
Common baseline at: 0.132
Exact meanR test (H0 = no synergy/antagonism):
F(49,117) = 18.178 (p-value < 2e-16)
Evidence for effects in data: Syn
Points with significant deviations from the null:
d1 d2 R p-value call
3 2.0 0.12 -4.274679 0.00173 Syn
10 2.0 0.49 -4.422648 0.00092 Syn
17 2.0 2.00 -4.664814 0.00025 Syn
18 7.8 2.00 -9.654355 <2e-16 Syn
24 2.0 7.80 -4.767067 0.00015 Syn
25 7.8 7.80 -22.667069 <2e-16 Syn
32 7.8 31.00 -8.024914 <2e-16 Syn
40 31.0 130.00 -3.498738 0.03177 Syn
47 31.0 500.00 -3.725266 0.01442 Syn
Overall maxR summary:
Call Syn Ant Total
Syn 9 0 49
Occupancy estimates provided with HSA response surface still rely on the (generalized) Loewe model.
Presence of synergistic or antagonistic effects can be formalized by means of statistical tests. Two types of tests are considered here and are discussed in more details in the methodology vignette as well as the accompanying paper.
meanR
test evaluates how the predicted response surface based on a specified null model differs from the observed one. If the null hypothesis is rejected, this test suggests that at least some dose combinations may exhibit synergistic or antagonistic behaviour. The meanR
test is not designed to pinpoint which combinations produce these effects nor what type of deviating effect is present.
maxR
test allows to evaluate presence of synergistic/antagonistic effects for each dose combination and as such provides a point-by-point classification.
Both of the above test statistics have a well specified null distribution under a set of assumptions, namely normality of Z-scores. If this assumption is not satisfied, distribution of these statistics can be estimated using bootstrap. Normal approximation is significantly faster whereas bootstrapped distribution of critical values is likely to be more accurate in many practical cases.
Here we will use the previously computed CP
covariance matrix to speed up the process.
meanR_N <- fitSurface(data, marginalFit,
statistic = "meanR", CP = rs$CP, B.B = NULL,
parallel = FALSE)
The previous piece of code assumes normal errors. If we drop this assumption, we can use bootstrap methods to resample from the observed errors. Other parameters for bootstrapping, such as additional distribution for errors, wild bootstrapping to account for heteroskedasticity, are also available. See help(fitSurface)
.
meanR_B <- fitSurface(data, marginalFit,
statistic = "meanR", CP = rs$CP, B.B = 20,
parallel = FALSE)
Both tests use the same calculated F-statistic but compare it to different null distributions. In this particular case, both tests lead to identical results.
F-statistic | p-value | |
---|---|---|
Normal errors | 4.855372 | 0 |
Bootstrapped errors | 4.855372 | 0 |
The meanR
statistic can be complemented by the maxR
statistic for each of available dose combinations. We will do this once again by assuming both normal and non-normal errors similar to the computation of the meanR
statistic.
maxR_N <- fitSurface(data, marginalFit,
statistic = "maxR", CP = rs$CP, B.B = NULL,
parallel = FALSE)
maxR_B <- fitSurface(data, marginalFit,
statistic = "maxR", CP = rs$CP, B.B = 20,
parallel = FALSE)
maxR_both <- rbind(summary(maxR_N$maxR)$totals,
summary(maxR_B$maxR)$totals)
Here is the summary of maxR
statistics. It lists the total number of dose combinations listed as synergistic or antagonistic for Experiment 1 given the above calculations.
Call | Syn | Ant | Total | |
---|---|---|---|---|
Normal errors | Syn | 3 | 0 | 49 |
Bootstrapped errors | Syn | 2 | 0 | 49 |
By using the outsidePoints
function, we can obtain a quick summary indicating which dose combinations in Experiment 1 appear to deviate significantly from the null model according to the maxR
statistic.
outPts <- outsidePoints(maxR_B$maxR$Ymean)
kable(outPts, caption = paste0("Non-additive points for Experiment ", i))
d1 | d2 | R | p-value | call | |
---|---|---|---|---|---|
25 | 7.8 | 7.8 | -9.353602 | 0 | Syn |
32 | 7.8 | 31.0 | -4.558810 | 0 | Syn |
Synergistic effects of drug combinations can be depicted in a bi-dimensional contour plot where the x-axis
and y-axis
represent doses of Compound 1
and Compound 2
respectively and each point is colored based on the p-value and sign of the respective maxR
statistic.
contour(maxR_B,
## colorPalette = c("blue", "black", "black", "red"),
main = paste0(" Experiment ", i, " contour plot for maxR"),
scientific = TRUE, digits = 3, cutoff = cutoff)
Previously, we had colored the 3-dimensional predicted response surface plot based on its Z-score, i.e. deviation of the predicted versus the observed effect. We can also easily color it based on the computed maxR
statistic to account for additional statistical variation.
Starting from the package version 1.2.0
the variance can be estimated separatelty for on-axis (monotherapy) and off-axis points using method
argument to fitSurface
. The possible values for method
are:
"equal"
, equal variances assumed (as above, default),"unequal"
, variance is estimated separately for on-axis and off-axis points,"model"
, variance is estimated separately for on-axis points and each off-axis point.Please see the methodology vignette for details. Below we show an example analysis in such case. Note that transformations are not possible if variances are not assumed equal.
## Formula: effect ~ fn(h1, h2, b, m1, m2, e1, e2, d1, d2)
## Transformations: No
##
## Compound 1 Compound 2
## Slope 1.579 1.459
## Maximal response 425.607 789.029
## log10(EC50) 0.663 0.500
##
## Common baseline at: 13200.55
resM <- fitSurface(data, marginalFit, method = "model",
statistic = "both", B.CP = 20, B.B = 20, parallel = FALSE)
resU <- fitSurface(data, marginalFit, method = "unequal",
statistic = "both", B.CP = 20, B.B = 20, parallel = FALSE)
summary(resM)
## Null model: Generalized Loewe Additivity
## Variance assumption used: "model"
## Mean occupancy rate: 0.7663414
##
## Formula: effect ~ fn(h1, h2, b, m1, m2, e1, e2, d1, d2)
## Transformations: No
##
## Compound 1 Compound 2
## Slope 1.579 1.459
## Maximal response 425.607 789.029
## log10(EC50) 0.663 0.500
##
## Common baseline at: 13200.55
##
## Bootstrapped meanR test (H0 = no synergy/antagonism):
## F(49,117) = 2.6687 (p-value < 2e-16)
##
## Evidence for effects in data: Syn
## Points with significant deviations from the null:
## d1 d2 R p-value call
## 25 7.8 7.8 -6.832606 <2e-16 Syn
##
## Overall maxR summary:
## Call Syn Ant Total
## Syn 1 0 49
## Null model: Generalized Loewe Additivity
## Variance assumption used: "unequal"
## Mean occupancy rate: 0.7663414
##
## Formula: effect ~ fn(h1, h2, b, m1, m2, e1, e2, d1, d2)
## Transformations: No
##
## Compound 1 Compound 2
## Slope 1.579 1.459
## Maximal response 425.607 789.029
## log10(EC50) 0.663 0.500
##
## Common baseline at: 13200.55
##
## Bootstrapped meanR test (H0 = no synergy/antagonism):
## F(49,117) = 3.3361 (p-value < 2e-16)
##
## Evidence for effects in data: Syn
## Points with significant deviations from the null:
## d1 d2 R p-value call
## 1 0.12 0.12 -6.450326 <2e-16 Syn
## 2 0.49 0.12 -6.143309 <2e-16 Syn
##
## Overall maxR summary:
## Call Syn Ant Total
## Syn 2 0 49
In order to proceed with multiple experiments, we repeat the same procedure as previously. We collect all the necessary objects for which estimations do not have to be repeated to generate meanR
and maxR
statistics in a simple list.
marginalFits <- list()
datasets <- list()
respSurfaces <- list()
maxR.summary <- list()
for (i in seq_len(nExp)) {
## Select experiment
data <- subsetData(directAntivirals, i)
## Fit joint marginal model
marginalFit <- fitMarginals(data, transforms = transforms,
method = "nlslm")
## Predict response surface based on generalized Loewe model
respSurface <- fitSurface(data, marginalFit,
statistic = "maxR", B.CP = 20,
parallel = FALSE)
datasets[[i]] <- data
marginalFits[[i]] <- marginalFit
respSurfaces[[i]] <- respSurface
maxR.summary[[i]] <- summary(respSurface$maxR)$totals
}
We use the maxR
procedure with a chosen p-value cutoff of 0.95. If maxR
statistic falls outside the 95th percentile of its distribution (either bootstrapped or not), the respective off-axis dose combination is said to deviate significantly from the generalized Loewe model and the algorithm determines whether it deviates in a synergistic or antagonistic way.
Below is the summary of overall calls and number of deviating points for each experiment.
Call | Syn | Ant | Total | |
---|---|---|---|---|
Experiment 1 | Syn | 3 | 0 | 49 |
Experiment 2 | Syn | 4 | 0 | 49 |
Experiment 3 | Syn | 6 | 0 | 49 |
Experiment 4 | Syn | 14 | 1 | 49 |
Previous summarizing and visual analysis can be repeated on each of the newly defined experiments. For example, Experiment 4
indicates a total of 15 combinations that were called synergistic according to the maxR
test.
d1 | d2 | R | p-value | call | |
---|---|---|---|---|---|
4 | 7.80 | 0.00024 | 5.466689 | 0.00002 | Ant |
17 | 2.00 | 0.00400 | -4.490834 | 0.00077 | Syn |
24 | 2.00 | 0.01600 | -6.840881 | 0.00000 | Syn |
25 | 7.80 | 0.01600 | -7.560220 | 0.00000 | Syn |
26 | 31.00 | 0.01600 | -3.671333 | 0.01735 | Syn |
30 | 0.49 | 0.06300 | -3.999672 | 0.00568 | Syn |
31 | 2.00 | 0.06300 | -10.030423 | 0.00000 | Syn |
32 | 7.80 | 0.06300 | -11.994929 | 0.00000 | Syn |
33 | 31.00 | 0.06300 | -4.516293 | 0.00071 | Syn |
38 | 2.00 | 0.25000 | -7.363260 | 0.00000 | Syn |
39 | 7.80 | 0.25000 | -11.501439 | 0.00000 | Syn |
40 | 31.00 | 0.25000 | -4.777185 | 0.00023 | Syn |
44 | 0.49 | 1.00000 | -3.457853 | 0.03562 | Syn |
45 | 2.00 | 1.00000 | -4.292106 | 0.00170 | Syn |
46 | 7.80 | 1.00000 | -7.082216 | 0.00000 | Syn |
Consequently, above table for Experiment 4
can be illustrated in a contour plot.