This task view collects information on R packages for experimental design
and analysis of data from experiments. Given the strong increase in the number of
relevant packages, packages that focus on analysis only and make no relevant
contributions to design creation are no longer added to this task view.
Please feel free to suggest enhancements, and please send information on new packages or major
package updates if you think they belong here. Contact details are given on my Web page.
Experimental design is applied in many areas, and methods have been tailored
to the needs of various fields. This task view starts with a section on
the most general packages, continues with specific sections on agricultural and
industrial experimentation, computer experiments, and experimentation in the
context of clinical trials, and closes with a section on experimental design
packages that have been developed for other specific purposes.
Of course, the division into fields is not always clear-cut, and some packages from
the more specialized sections can also be applied in general contexts.
You may also notice that my own experience is mainly from industrial experimentation
(in a broad sense), which may bias my view somewhat.
Experimental designs for general purposes
There are a few packages for creating and analyzing experimental designs
for general purposes. First of all, the standard (generalized) linear model
functions in the base R package stats are of course very important for analyzing
data from designed experiments (especially functions lm(), aov() and the methods
and functions for the resulting linear model objects). These are concisely
explained in Kuhnert and Venables (2005, p. 109 ff.); Vikneswaran (2005)
points out specific usages for experimental design (using function contrasts(),
multiple comparison functions and some convenience functions like
model.tables(), replications() and plot.design()).
Lawson (2014) is a good introductory textbook on experimental design in R, with
many example applications.
Lalanne (2012) provides an R companion to the well-known book by Montgomery (2005);
so far, it covers approximately the first ten chapters. It does not include R's
design generation facilities but mainly discusses the analysis of existing designs.
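As a minimal sketch of these stats functions, the code below uses the npk data set that ships with R (package datasets: N, P and K fertilizer factors in six blocks). The additive model without interactions is a simplifying assumption chosen for illustration, not a recommended analysis of these data.

```r
# Classic N, P, K factorial field trial in six blocks (ships with R in 'datasets')
data(npk)

# Number of replications per factor level; a plain vector signals a balanced layout
reps <- replications(~ block + N + P + K, data = npk)
print(reps)

# Additive ANOVA model: blocks plus the three fertilizer main effects
fit <- aov(yield ~ block + N + P + K, data = npk)
print(summary(fit))

# Tables of marginal means for each term in the model
print(model.tables(fit, type = "means"))

# Contrast matrix currently attached to the two-level factor N (R default: treatment contrasts)
print(contrasts(npk$N))
```

Because replications() returns a plain vector here rather than a list, the layout is balanced for every term, which is what makes model.tables() straightforward to interpret.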
Package GAD handles general balanced analysis of variance models with fixed and/or random effects
and also nested effects (the latter can only be random); the authors cite Underwood (1997) for this work.
The package is quite valuable, as many users find the R packages for handling random
or mixed effects difficult to use.
Package granova offers some interesting non-standard graphical representations for results
of simply structured experiments (one-way and two-way layouts, paired data).
-
Package AlgDesign creates full factorial designs with or without additional quantitative
variables, creates mixture designs (i.e., designs in which the factor levels sum to 1 = 100%;
only lattice designs are created) and creates D-, A-, or I-optimal designs exactly or
approximately. Package oapackage can generate (not necessarily orthogonal)
optimal fractional factorial 2-level designs with the possibility to prioritize
main effects over interactions. Package rodd provides T-optimal designs,
also called optimal discriminating designs (Dette, Melas and Shpilev 2013;
Dette, Melas and Guchenko 2014), package LDOD provides locally D-optimal designs
for some nonlinear and generalized linear models, and package PopED provides optimal designs
for nonlinear mixed effects models.
-
Package conf.design can create a design with certain interaction effects confounded with
blocks (function conf.design()) and can combine existing designs in several ways
(e.g., useful for Taguchi's inner and outer array designs in industrial experimentation).
-
Package planor generates regular fractional factorial designs with fixed and mixed levels
and quite flexible randomization structures. The package's flexibility
comes at the price of a certain complexity and, for larger designs, long computing times.
-
Package crossdes creates and analyses cross-over designs of various types (including
Latin squares, mutually orthogonal Latin squares and Youden squares) that can, for example,
be used in sensometrics. Package Crossover also provides crossover designs;
it offers designs from the literature and algorithmic designs, makes use of the
functionality in crossdes and in addition provides a GUI.
-
Package DoE.base provides full factorial designs with or without blocking
(function fac.design()) and orthogonal arrays (function oa.design()) for main effects experiments
(those listed by Kuhfeld 2009 up to 144 runs, plus a few additional ones).
There is also some functionality for assessing the quality of orthogonal arrays,
related to Groemping and Xu (2014), and some analysis functionality with half-normal effects plots in
quite general form (Groemping 2015).
Package DoE.base also forms the basis of a suite of related packages (cf. Groemping 2009).
Together with FrF2 (cf. below) and DoE.wrapper, it provides the workhorse
of the GUI package RcmdrPlugin.DoE (beta version; tutorial available in Groemping 2011),
which integrates design of experiments functionality into the R Commander (package Rcmdr, Fox 2005)
for the benefit of R users who cannot or do not want to do command line programming.
The role of package DoE.wrapper in that suite is to wrap functionality from other packages
into the input and output structure of the package suite: so far, response surface designs
with package rsm (cf. also below), design of computer experiments with packages lhs and
DiceDesign (cf. also below), and D-optimal designs with package AlgDesign (cf. also above).
-
Package
dae
provides various utility functions around experimental design
and R factors, e.g. a randomization routine that can handle various nested structures
(according to Bailey 1981) and functions for combining several factors into one
or dividing one factor into several factors.
Furthermore, the package provides features for post-processing
objects returned by the
aov()
function, e.g. extraction of Yates effects
for 2-level experiments.
-
Package daewr accompanies the book Design and Analysis of Experiments with R
by Lawson (2014) and provides not only the data sets from the book but also some standalone functionality
that is not available elsewhere in R, e.g., definitive screening designs.
-
Package OPDOE accompanies the book Optimal Experimental Design with R
by Rasch et al. (2011). It has some interesting sample size estimation functionality,
but is almost unusable without the book (the first edition of which I would not recommend buying).
-
Package blockTools assigns units to blocks in order to obtain homogeneous blocks
when block sizes are small; package blocksdesign permits
the creation of nested block structures.
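The idea of confounding an interaction with blocks (as done by conf.design() above) can be sketched in base R: in a 2^3 factorial, the runs are split into two blocks of four according to the sign of the ABC interaction contrast, so that ABC is confounded with blocks while the main effects and two-factor interactions stay free of block effects. This is an illustration only; it does not use the conf.design interface, and the seed and within-block randomization are arbitrary choices.

```r
# Full 2^3 factorial in coded units (-1 / +1)
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Confound ABC with blocks: runs with ABC = -1 form block 1, ABC = +1 form block 2
design$block <- factor(ifelse(design$A * design$B * design$C < 0, 1, 2))

# Randomize the run order independently within each block (seed is arbitrary)
set.seed(1)
design <- do.call(rbind, lapply(split(design, design$block),
                                function(b) b[sample(nrow(b)), ]))
rownames(design) <- NULL
print(design)
```

Within each block, every main effect and two-factor interaction column still sums to zero, which is why only ABC is lost to the block difference.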
Experimental designs for agricultural and plant breeding experiments
Package agricolae offers extensive functionality on experimental design,
especially for agricultural and plant breeding experiments, which can also be useful
for other purposes. It supports planning of lattice designs, factorial designs,
randomized complete block designs, completely randomized designs,
(Graeco-)Latin square designs, balanced incomplete block designs and alpha designs.
There are also various analysis facilities for experimental data, e.g., treatment
comparison procedures and several non-parametric tests, but also some quite specialized
possibilities for specific types of experiments. The package agridat
offers a large repository of useful agricultural data sets.
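The randomization behind a randomized complete block design, one of the layouts agricolae plans, can be sketched in base R: every block contains each treatment exactly once, and treatments are permuted to plots independently within each block. The treatment labels, the number of blocks and the seed are hypothetical, and this is not agricolae's own interface.

```r
# Hypothetical setting: four treatments laid out in three complete blocks
trt <- c("T1", "T2", "T3", "T4")
n_blocks <- 3

set.seed(42)  # arbitrary seed, for reproducibility only
rcbd <- do.call(rbind, lapply(seq_len(n_blocks), function(b) {
  # each block receives every treatment once, in an independent random order
  data.frame(block = b, plot = seq_along(trt), treatment = sample(trt))
}))
print(rcbd)
```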
Experimental designs for industrial experiments
Some further packages especially handle designs for industrial experiments
that are often highly fractionated, intentionally confounded and have few extra degrees
of freedom for error.
Fractional factorial 2-level designs are particularly important in industrial
experimentation.
-
Package FrF2 (Groemping 2014) is the most comprehensive R package for
their creation. It generates regular fractional factorial
designs for factors with 2 levels as well as Plackett-Burman type screening designs.
Regular fractional factorials default to maximum resolution minimum aberration designs
and can be customized in various ways, supported by an
incorporated catalogue of designs (including the designs catalogued by Chen, Sun and Wu 1993,
and further larger designs catalogued in Block and Mee 2005 and Xu 2009;
the additional package FrF2.catlg128 provides a very large complete catalogue
of resolution IV 128-run designs with up to 23 factors for special purposes).
Analysis-wise, FrF2 provides simple graphical analysis tools (normal and half-normal effects plots
(modified from BsMD, cf. below), main effects plots and interaction plot matrices similar
to those in Minitab software, and a cube plot for the combinations of three factors).
It can also show the alias structure for regular fractional factorials of 2-level factors,
regardless of whether they have been created with the package or not.
Fractional factorial 2-level plans can also be created by other R packages,
namely BHH2 and qualityTools (but do not use function pbDesign() from
version 1.54 of that package!), or, with a little more effort,
by packages conf.design, planor or AlgDesign.
Package oapackage can generate (not necessarily orthogonal)
optimal fractional factorial 2-level designs with the possibility to prioritize
main effects over interactions. Package ALTopt provides optimal designs
for accelerated life testing.
-
Package BHH2 accompanies the 2nd edition of the book by Box, Hunter and Hunter
and provides many of its data sets. It can generate full and fractional factorial
two-level designs from a number of factors and a list of defining relations
(function ffDesMatrix(), less convenient than package FrF2).
It also provides several functions for analyzing data from 2-level factorial
experiments: the function anovaPlot() assesses effect sizes relative to residuals, and
the function lambdaPlot() assesses the effect of Box-Cox transformations on
statistical significance of effects.
-
Package BsMD provides Bayesian charts as
proposed by Box and Meyer (1986) as well as effects plots (normal, half-normal and
Lenth) for assessing which effects are active in a fractional factorial experiment
with 2-level factors.
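To make the ideas behind FrF2 and BsMD concrete, the base R sketch below builds a regular 2^(4-1) fractional factorial with generator D = ABC (defining relation I = ABCD, resolution IV), estimates the seven estimable effects and flags active ones with the pseudo standard error of Lenth (1989). The response is simulated with only A and C active, purely for illustration; the sketch does not reproduce either package's functions or plots.

```r
# Base 2^3 full factorial in coded units for factors A, B, C
des <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
# Generator D = ABC gives the defining relation I = ABCD (resolution IV)
des$D <- with(des, A * B * C)

# Contrast columns of the 7 estimable effects (aliases noted in comments)
X <- with(des, cbind(A = A, B = B, C = C, D = D,
                     AB = A * B,   # aliased with CD
                     AC = A * C,   # aliased with BD
                     AD = A * D))  # aliased with BC

# Simulated response with only A and C active (illustration only)
set.seed(123)
y <- 20 + 4 * des$A - 3 * des$C + rnorm(nrow(des), sd = 0.3)

# Effect estimate: mean response at +1 minus mean response at -1
eff <- drop(crossprod(X, y)) / (nrow(X) / 2)

# Lenth's pseudo standard error (PSE) computed from the effect estimates
lenth_pse <- function(effects) {
  s0 <- 1.5 * median(abs(effects))
  1.5 * median(abs(effects)[abs(effects) < 2.5 * s0])
}
pse <- lenth_pse(eff)

# Margin of error using Lenth's degrees of freedom = (number of effects) / 3
me <- qt(0.975, df = length(eff) / 3) * pse
active <- names(eff)[abs(eff) > me]
print(round(eff, 2))
print(active)
```

The half-normal and Lenth plots mentioned above visualize the same comparison of absolute effects against a noise scale that is estimated from the effects themselves, which is what makes analysis of unreplicated fractions possible.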
Apart from tools for planning and analysing factorial designs, R also offers support for
response surface optimization for quantitative factors (cf. e.g. Myers and Montgomery 1995):
-
Package rsm supports sequential optimization with first-order and second-order response
surface models (central composite or Box-Behnken designs), offering
optimization approaches like steepest ascent and visualization of the response
function for linear model objects. It also facilitates the coding of variables for
response surface investigations.
-
Package
DoE.wrapper
enhances design creation from package
rsm
with the possibilities of automatically choosing the cube portion of central
composite designs and of augmenting
an existing (fractional) factorial 2-level design with a star portion.
-
Package
Vdgraph
implements a variance dispersion graph (Vining 1993) for response
surface designs created by package
rsm. Package
VdgRsm
provides similar functionality with more variety.
-
Package
qualityTools
can also create central composite designs
and can visualize response surfaces.
-
Package
EngrExpt
provides a collection of data sets from the book
Introductory Statistics for Engineering Experimentation
by Nelson, Coffin and Copeland (2003).
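A rotatable central composite design for two coded factors, followed by a second-order fit and a canonical-analysis-style stationary point, can be sketched with base R's lm(); this mirrors the kind of work rsm automates but uses none of its functions. The three center points and the simulated response with a known maximum at (0.5, -0.25) are assumptions for illustration. Writing the fit as b0 + x'b + x'Bx, the stationary point is -B^(-1) b / 2.

```r
k <- 2
alpha <- (2^k)^(1 / 4)  # rotatable axial distance: sqrt(2) for k = 2

# Cube (factorial) portion, axial (star) portion and center points, coded units
cube   <- expand.grid(x1 = c(-1, 1), x2 = c(-1, 1))
axial  <- data.frame(x1 = c(-alpha, alpha, 0, 0), x2 = c(0, 0, -alpha, alpha))
center <- data.frame(x1 = rep(0, 3), x2 = rep(0, 3))
ccd_des <- rbind(cube, axial, center)

# Simulated response with a known maximum at (0.5, -0.25)
set.seed(7)
ccd_des$y <- with(ccd_des, 50 - 2 * (x1 - 0.5)^2 - 3 * (x2 + 0.25)^2) +
  rnorm(nrow(ccd_des), sd = 0.1)

# Full second-order model
fit <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = ccd_des)
cf <- coef(fit)

# Linear coefficients b and symmetric matrix B of quadratic coefficients
b <- unname(cf[c("x1", "x2")])
B <- matrix(c(cf[["I(x1^2)"]],   cf[["x1:x2"]] / 2,
              cf[["x1:x2"]] / 2, cf[["I(x2^2)"]]), nrow = 2)
x_stat <- -0.5 * solve(B, b)
eig <- eigen(B, symmetric = TRUE)$values  # all negative: stationary point is a maximum
print(x_stat)
print(eig)
```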
In some industries, mixtures of ingredients are important; these require special designs,
because the quantitative factors have a fixed total.
Mixture designs are handled by packages
AlgDesign
(function
gen.mixture,
lattice designs),
qualityTools
(function
mixDesign,
lattice designs and simplex centroid designs), and
mixexp
(several small functions for simplex centroid,
simplex lattice and extreme vertices designs as well as for plotting).
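The {q, m} simplex lattice (all blends whose proportions are multiples of 1/m and sum to 1) and the simplex centroid design can be generated in a few lines of base R. This is a sketch of the design definitions only, not of the AlgDesign, qualityTools or mixexp interfaces.

```r
# {q, m} simplex lattice: proportions in {0, 1/m, ..., 1} summing to 1
simplex_lattice <- function(q, m) {
  g <- expand.grid(rep(list(0:m), q))
  g <- as.matrix(g[rowSums(g) == m, , drop = FALSE]) / m
  dimnames(g) <- list(NULL, paste0("x", seq_len(q)))
  g
}

# Simplex centroid: equal proportions on every non-empty subset of components
simplex_centroid <- function(q) {
  s <- as.matrix(expand.grid(rep(list(0:1), q)))[-1, , drop = FALSE]
  s <- s / rowSums(s)
  dimnames(s) <- list(NULL, paste0("x", seq_len(q)))
  s
}

sl <- simplex_lattice(3, 2)  # choose(3 + 2 - 1, 2) = 6 blends
sc <- simplex_centroid(3)    # 2^3 - 1 = 7 blends
print(sl)
print(sc)
```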
Occasionally, supersaturated designs can be useful.
The two small packages
mkssd
and
mxkssd
provide fixed level and mixed level
k-circulant supersaturated designs.
Experimental designs for computer experiments
Computer experiments with quantitative factors require special types of
experimental designs: it is often possible to include many different
levels of the factors, and replication will usually not be beneficial. Also, the
experimental region is often too large to assume that a linear or quadratic model adequately
represents the phenomenon under investigation. Consequently, it is desirable to fill
the experimental space with points as well as possible (space-filling designs) in such
a way that each run provides additional information even if some factors turn out to be
irrelevant.
The lhs package provides Latin hypercube designs for this purpose.
Furthermore, the package provides ways to analyse such computer experiments with
emphasis on what follow-up experiments to conduct. Another package with similar orientation
is the DiceDesign package, which adds further ways to construct space-filling
designs and some measures to assess the quality of designs for computer experiments. The
package DiceKriging provides the Kriging methodology, which is often used for
creating metamodels from computer experiments; the package DiceEval creates
and evaluates metamodels (among others, Kriging ones), and the package DiceView
provides facilities for viewing sections of multidimensional metamodels.
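A random Latin hypercube (one point per stratum in every dimension) and a crude maximin improvement, obtained by keeping the best of several random candidates, can be sketched in base R. lhs offers properly optimized versions (e.g., maximinLHS()); the candidate count (50), the design size and the seed below are arbitrary assumptions.

```r
# Random Latin hypercube on [0, 1]^k: each column hits every one of the n strata once
random_lhs <- function(n, k) {
  vapply(seq_len(k), function(j) (sample(n) - runif(n)) / n, numeric(n))
}

# Maximin criterion: the smallest pairwise Euclidean distance (larger is better)
min_dist <- function(X) min(dist(X))

set.seed(2015)  # arbitrary seed
cands <- lapply(1:50, function(i) random_lhs(10, 2))
scores <- vapply(cands, min_dist, numeric(1))
best <- cands[[which.max(scores)]]
print(round(best, 3))
print(max(scores))
```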
Package MaxPro provides maximum projection designs as introduced by
Joseph, Gul and Ba (2015). Package simrel allows the creation of designs for
computer experiments according to the multi-level binary replacement (MBR) strategy
by Martens et al. (2010).
Package tgp is another package dedicated to planning and analysing
computer experiments. Here, the emphasis is on Bayesian methods.
The package can, for example, be used with various kinds of (surrogate) models for
sequential optimization, e.g., with an expected improvement criterion for optimizing a noisy
black-box target function. Packages plgp and dynaTree enhance the
functionality offered by tgp with particle learning facilities and learning for
dynamic regression trees.
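For a Gaussian predictive distribution with mean mu and standard deviation s at a candidate point, the expected improvement over the best (smallest) value observed so far, f_min, has a well-known closed form, sketched below for minimization. tgp works with its own Bayesian posterior, so this illustrates the criterion only, not tgp's implementation; the candidate predictions are hypothetical.

```r
# Expected improvement (minimization) under a Gaussian predictive distribution
expected_improvement <- function(mu, s, f_min) {
  z <- (f_min - mu) / s
  ei <- (f_min - mu) * pnorm(z) + s * dnorm(z)
  # with zero predictive uncertainty the improvement is deterministic
  ifelse(s > 0, ei, pmax(f_min - mu, 0))
}

# Hypothetical predictions at three candidate inputs; best value so far is 1.0
mu <- c(1.2, 0.9, 1.0)
s  <- c(0.05, 0.30, 0.01)
ei <- expected_improvement(mu, s, f_min = 1.0)
print(round(ei, 4))
next_point <- which.max(ei)  # candidate to evaluate next
```

The criterion trades off exploitation (a low predicted mean) against exploration (a large predictive standard deviation), which is why the second candidate wins here.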
Package
BatchExperiments
is also designed for computer
experiments, in this case specifically for experiments with algorithms to be run
under different scenarios. The package is described in a technical report by
Bischl et al. (2012).
Experimental designs for clinical trials
This task view only covers specific design of experiments packages; there may be some
grey areas. Please also consult the ClinicalTrials task view.
-
Package
experiment
contains tools for clinical experiments,
e.g., a randomization tool, and it provides a few special analysis options for clinical
trials.
-
Package gsDesign implements group sequential designs, and
package OptGS provides near-optimal balanced group sequential designs.
-
Package
gsbDesign
evaluates operating characteristics for group sequential Bayesian designs.
-
Package
asd
implements adaptive sequential designs.
-
Package OptInterim is for two- and three-stage designs for long-term binary endpoints.
-
Package
bcrm
offers Bayesian CRM designs.
-
Package MAMS offers designs for multi-arm multi-stage studies.
-
Package TEQR provides toxicity equivalence range designs (Blanchard and Longmate 2010)
for phase I clinical trials, and package pipe.design provides so-called
product of independent beta probabilities dose escalation (PIPE) designs for phase I.
-
Package
sp23design
claims to offer seamless integration of phase II to III.
-
The DoseFinding package provides functions for the design and analysis
of dose-finding experiments (for example, pharmaceutical Phase II clinical trials);
it combines the facilities of the MCPMod package (maintenance discontinued;
described in Bornkamp, Pinheiro and Bretz 2009) with a special type of optimal designs for
dose-finding situations (MED-optimal designs, D-optimal designs, or a mixture of both;
cf. Dette et al. 2008).
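Group sequential designs such as those in gsDesign are often specified through an alpha-spending function, which fixes how much of the overall type I error may be used up by information fraction t. The sketch below evaluates the Lan-DeMets O'Brien-Fleming-type and Pocock-type spending functions in base R for a one-sided alpha of 0.025 and three equally spaced analyses (assumed values). It only computes the alpha spent; deriving the actual stopping boundaries requires the recursive numerical integration that gsDesign performs.

```r
# Lan-DeMets O'Brien-Fleming-type spending: very little alpha is spent early
obf_spend <- function(t, alpha = 0.025) {
  2 - 2 * pnorm(qnorm(1 - alpha / 2) / sqrt(t))
}
# Lan-DeMets Pocock-type spending: alpha is spent more evenly over time
pocock_spend <- function(t, alpha = 0.025) {
  alpha * log(1 + (exp(1) - 1) * t)
}

info_frac <- c(1 / 3, 2 / 3, 1)  # assumed: three equally spaced analyses
spend <- data.frame(
  t           = info_frac,
  obf_cum     = obf_spend(info_frac),
  obf_incr    = diff(c(0, obf_spend(info_frac))),
  pocock_cum  = pocock_spend(info_frac),
  pocock_incr = diff(c(0, pocock_spend(info_frac)))
)
print(signif(spend, 3))
```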
Experimental designs for special purposes
Various further packages handle special situations in experimental design:
-
Package
desirability
provides ways to combine several target criteria into a desirability function in order to simplify
multi-criteria analysis; desirabilities are also offered as part of package
qualityTools.
-
Package osDesign designs studies nested in observational studies.
-
Package qtlDesign is for quantitative trait locus designs.
-
Package toxtestD creates optimal designs for binary toxicity tests.
-
Package designGG creates optimal designs for genetical genomics experiments (see Li et al. 2009).
-
Package geospt can optimize spatial networks of sampling points (see, e.g., Santacruz, Rubiano and Melo 2014).
-
Package
SensoMineR
contains special designs for
sensometric studies, e.g., for the triangle test.
-
Package
support.CEs
provides tools for creating stated choice designs
for market research investigations.
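Derringer and Suich (1980) desirabilities, as offered by desirability and qualityTools, map each response to [0, 1] and combine them by a geometric mean. The base R sketch below implements the one-sided larger-is-better and smaller-is-better forms; the acceptability limits and the predicted values for three candidate settings are hypothetical.

```r
# Larger-is-better desirability: 0 below 'low', 1 above 'high', power s in between
d_max <- function(y, low, high, s = 1) {
  x <- pmin(pmax((y - low) / (high - low), 0), 1)
  x^s
}
# Smaller-is-better desirability: 1 below 'low', 0 above 'high'
d_min <- function(y, low, high, s = 1) {
  x <- pmin(pmax((high - y) / (high - low), 0), 1)
  x^s
}
# Overall desirability: geometric mean of the individual desirabilities per row
d_overall <- function(...) {
  d <- cbind(...)
  apply(d, 1, function(r) prod(r)^(1 / length(r)))
}

# Hypothetical candidate settings with predicted yield (%) and cost
yield_pred <- c(72, 85, 93)
cost_pred  <- c(12, 18, 29)
D <- d_overall(d_max(yield_pred, low = 70, high = 95),
               d_min(cost_pred,  low = 10, high = 30))
print(round(D, 3))
best <- which.max(D)
```

The geometric mean makes the overall desirability zero as soon as any single response is unacceptable, which is the main reason it is preferred over an arithmetic mean.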
Key references for packages in this task view
-
Atkinson, A.C. and Donev, A.N. (1992). Optimum Experimental Designs. Oxford: Clarendon Press.
-
Bailey, R.A. (1981). A unified approach to design of experiments. Journal of the Royal Statistical Society, Series A 144, 214-223.
-
Ball, R.D. (2005). Experimental Designs for Reliable Detection of Linkage Disequilibrium in Unstructured Random Population Association Studies. Genetics 170, 859-873.
-
Bischl, B., Lang, M., Mersmann, O., Rahnenfuehrer, J. and Weihs, C. (2012). Computing on high performance clusters with R: Packages BatchJobs and BatchExperiments. Technical Report 1/2012, TU Dortmund, Germany.
-
Blanchard, M.S. and Longmate, J.A. (2010). Toxicity equivalence range design (TEQR): A practical Phase I design. Contemporary Clinical Trials, doi:10.1016/j.cct.2010.09.011.
-
Block, R. and Mee, R. (2005). Resolution IV Designs with 128 Runs. Journal of Quality Technology 37, 282-293.
-
Bornkamp, B., Pinheiro, J.C. and Bretz, F. (2009). MCPMod: An R Package for the Design and Analysis of Dose-Finding Studies. Journal of Statistical Software 29(7), 1-23.
-
Box, G.E.P., Hunter, W.G. and Hunter, J.S. (2005). Statistics for Experimenters (2nd edition). New York: Wiley.
-
Box, G.E.P. and Meyer, R.D. (1986). An Analysis for Unreplicated Fractional Factorials. Technometrics 28, 11-18.
-
Box, G.E.P. and Meyer, R.D. (1993). Finding the Active Factors in Fractionated Screening Experiments. Journal of Quality Technology 25, 94-105.
-
Chasalow, S. and Brand, R. (1995). Generation of Simplex Lattice Points. Journal of the Royal Statistical Society, Series C 44, 534-545.
-
Chen, J., Sun, D.X. and Wu, C.F.J. (1993). A catalogue of 2-level and 3-level orthogonal arrays. International Statistical Review 61, 131-145.
-
Collings, B.J. (1989). Quick Confounding. Technometrics 31, 107-110.
-
Cornell, J. (2002). Experiments with Mixtures. Third Edition. Wiley.
-
Daniel, C. (1959). Use of Half Normal Plots in Interpreting Two Level Experiments. Technometrics 1, 311-340.
-
Derringer, G. and Suich, R. (1980). Simultaneous Optimization of Several Response Variables. Journal of Quality Technology 12, 214-219.
-
Dette, H., Bretz, F., Pepelyshev, A. and Pinheiro, J.C. (2008). Optimal Designs for Dose Finding Studies. Journal of the American Statistical Association 103, 1225-1237.
-
Dette, H., Melas, V.B. and Shpilev, P. (2013). Robust T-optimal discriminating designs. The Annals of Statistics 41, 1693-1715.
-
Dette, H., Melas, V.B. and Guchenko, R. (2014). Bayesian T-optimal discriminating designs. arXiv preprint.
-
Fedorov, V.V. (1972). Theory of Optimal Experiments. Academic Press, New York.
-
Fox, J. (2005). The R Commander: A Basic-Statistics Graphical User Interface to R. Journal of Statistical Software 14(9), 1-42.
-
Gramacy, R.B. (2007). tgp: An R Package for Bayesian Nonstationary, Semiparametric Nonlinear Regression and Design by Treed Gaussian Process Models. Journal of Statistical Software 19(9), 1-46.
-
Groemping, U. (2009). Design of Experiments in R. Presentation at useR! 2009 in Rennes, France.
-
Groemping, U. (2011). Tutorial for designing experiments using the R package RcmdrPlugin.DoE. Reports in Mathematics, Physics and Chemistry, Department II, Beuth University of Applied Sciences Berlin.
-
Groemping, U. (2014). R Package FrF2 for Creating and Analysing Fractional Factorial 2-Level Designs. Journal of Statistical Software 56(1), 1-56.
-
Groemping, U. and Xu, H. (2014). Generalized resolution for orthogonal arrays. The Annals of Statistics 42, 918-939.
-
Groemping, U. (2015). Augmented Half Normal Effects Plots in the Presence of a Few Error Degrees of Freedom. Quality and Reliability Engineering International, online early, DOI: 10.1002/qre.1842.
-
Hoaglin, D., Mosteller, F. and Tukey, J. (eds.) (1991). Fundamentals of Exploratory Analysis of Variance. Wiley, New York.
-
Jones, B. and Kenward, M.G. (1989). Design and Analysis of Cross-Over Trials. Chapman and Hall, London.
-
Johnson, M.E., Moore, L.M. and Ylvisaker, D. (1990). Minimax and maximin distance designs. Journal of Statistical Planning and Inference 26, 131-148.
-
Joseph, V.R., Gul, E. and Ba, S. (2015). Maximum Projection Designs for Computer Experiments. Biometrika 102, 371-380.
-
Kuhfeld, W. (2009). Orthogonal arrays. Website courtesy of SAS Institute Inc., accessed August 4th 2010. URL http://support.sas.com/techsup/technote/ts723.html
-
Kuhnert, P. and Venables, B. (2005). An Introduction to R: Software for Statistical Modelling & Computing. URL http://CRAN.R-project.org/doc/contrib/Kuhnert+Venables-R_Course_Notes.zip (PDF document (about 360 pages) of lecture notes in combination with the data sets and R scripts.)
-
Kunert, J. (1998). Sensory Experiments as Crossover Studies. Food Quality and Preference 9, 243-253.
-
Lalanne, C. (2012). R Companion to Montgomery's Design and Analysis of Experiments. Manuscript, downloadable at URL http://www.aliquote.org/articles/tech/dae/dae.pdf (The file accompanies the book by Montgomery 2005, cf. below.)
-
Lawson, J. (2014). Design and Analysis of Experiments with R. Chapman and Hall/CRC, Boca Raton.
-
Lenth, R.V. (1989). Quick and Easy Analysis of Unreplicated Factorials. Technometrics 31, 469-473.
-
Lenth, R.V. (2009). Response-Surface Methods in R, Using rsm. Journal of Statistical Software 32(7), 1-17.
-
Li, Y., Swertz, M., Vera, G., Fu, J., Breitling, R. and Jansen, R.C. (2009). designGG: An R-package and Web tool for the optimal design of genetical genomics experiments. BMC Bioinformatics 10, 188.
-
Martens, H., Mage, I., Tondel, K., Isaeva, J., Hoy, M. and Saebo, S. (2010). Multi-level binary replacement (MBR) design for computer experiments in high-dimensional nonlinear systems. Journal of Chemometrics 24, 748-756.
-
Mee, R. (2009). A Comprehensive Guide to Factorial Two-Level Experimentation. Springer, New York.
-
Montgomery, D.C. (2005). Design and Analysis of Experiments (6th edition). Wiley, New York.
-
Myers, R.H. and Montgomery, D.C. (1995). Response Surface Methodology: Process and Product Optimization Using Designed Experiments. Wiley, New York.
-
Nelson, P.R., Coffin, M. and Copeland, K.A.F. (2003). Introductory Statistics for Engineering Experimentation. Academic Press, San Diego.
-
Plackett, R.L. and Burman, J.P. (1946). The design of optimum multifactorial experiments. Biometrika 33, 305-325.
-
Rasch, D., Pilz, J., Verdooren, L.R. and Gebhardt, A. (2011). Optimal Experimental Design with R. Chapman and Hall/CRC. (Caution: does not live up to its title!)
-
Rosenbaum, P. (1989). Exploratory Plots for Paired Data. The American Statistician 43, 108-109.
-
Sacks, J., Welch, W.J., Mitchell, T.J. and Wynn, H.P. (1989). Design and analysis of computer experiments. Statistical Science 4, 409-435.
-
Santacruz, A., Rubiano, Y. and Melo, C. (2014). Evolutionary optimization of spatial sampling networks designed for the monitoring of soil carbon. In: Hartemink, A. and McSweeney, K. (Eds.), Soil Carbon. Series: Progress in Soil Science, pp. 77-84. Springer, New York.
-
Santner, T.J., Williams, B.J. and Notz, W.I. (2003). The Design and Analysis of Computer Experiments. Springer, New York.
-
Sen, S., Satagopan, J.M. and Churchill, G.A. (2005). Quantitative Trait Locus Study Design from an Information Perspective. Genetics 170, 447-464.
-
Stein, M. (1987). Large Sample Properties of Simulations Using Latin Hypercube Sampling. Technometrics 29, 143-151.
-
Stocki, R. (2005). A Method to Improve Design Reliability Using Optimal Latin Hypercube Sampling. Computer Assisted Mechanics and Engineering Sciences 12, 87-105.
-
Underwood, A.J. (1997). Experiments in Ecology: Their Logical Design and Interpretation Using Analysis of Variance. Cambridge University Press, Cambridge.
-
Vikneswaran (2005). An R companion to "Experimental Design". URL http://CRAN.R-project.org/doc/contrib/Vikneswaran-ED_companion.pdf (The file accompanies the book "Experimental Design with Applications in Management, Engineering and the Sciences" by Berger and Maurer, 2002.)
-
Vining, G. (1993). A Computer Program for Generating Variance Dispersion Graphs. Journal of Quality Technology 25, 45-58. Corrigendum in the same volume, pp. 333-335.
-
Xu, H. (2009). Algorithmic Construction of Efficient Fractional Factorial Designs With Large Run Sizes. Technometrics 51, 262-277.