# An Introduction to BSL

#### 2018-02-08

BreedingSchemeLanguage is a package that simulates plant breeding using phenotypic or genomic selection. All package functions simulate recognizable breeding tasks that can be assembled into breeding schemes. For example, users can simulate phenotyping in multiple (possibly correlated) locations over years, calculate the cost of the simulated breeding scheme, and compare multiple breeding strategies. Reasonable defaults are given for each function, but many aspects are customizable. Functions are context dependent, e.g, by default, the phenotype() function will generate phenotypes for the last population created.

## Start using BreedingSchemeLanguage

Load the BreedingSchemeLanguage. Find a temporary directory to work with for this vignette.

library(BreedingSchemeLanguage)
## Loading required package: snowfall
## Loading required package: snow
## Loading required package: Rcpp
simTempDir <- tempdir()

## Define simulation settings

Define the genetic architecture of the plant species and other whole simulation settings. Because generating historical haplotypes is rather slow, save them so they can be reused.

simEnv <- defineSpecies(nSim=3, saveDataFileName=paste(simTempDir, "simSpecies", sep="/"))
• Here, we plan to conduct three simulations using default settings (i.e., genome with seven chromosomes of 150 cM length, historical effective population size of 100, trait controlled by additive QTLs, etc.).

• The BSL keeps simulation objects in an R environment. If users specify the default simEnv the environment object does not need to be passed to the functions.

Specify environmental variances and breeding costs

locCor <- matrix(c(1, 0.6, 0.3,
0.6, 1, 0.8,
0.3, 0.8, 1)
, 3)
defineVariances(locCorrelations=locCor, plotTypeErrVars=errVars)
defineCosts(phenoCost=plotCosts)
• The genetic variance is fixed to 1 in the founder population.
• We do not specify a genotype by year variance, so the default of 1 will be used.
• We assume three correlated locations, with correlation coefficients of 0.6, 0.3 and 0.8.
• Preliminary and Advanced trials have error variances of 4 and 1, respectively, and per plot costs of 2 and 5, respectively. Costs are in arbitrary units.

Initialize the breeding population from historical haplotypes.

initializePopulation()
• The default of 100 founders are created here.

## Simulate breeding schemes

Estimate the per se or own performance of individuals in the initial breeding population and select based on these values.

phenotype(locations=1:3, years=1:2, plotType="Preliminary")
predictValue()
select()
• Phenotype in all locations over two years in Preliminary trials.
• In the absence of genotype data or specifying that pedigree should be used, the genotypic are estimated as independently and identically distributed from the phenotypic data.
• No arguements are passed to select() so the default of using the values just estimated (in the predictValue() function) and selecting 40 individuals applies. The function cross() creates the next generation of 100 individuals (also by default).

Cross to create progeny, phenotype them then select using pedigree information

cross()
predictValue(sharingInfo="pedigree")
select()
• Phenotype in location 3 only, in the third year, in an advance trial.
• Use pedigree information to calculate a relationship matrix that enables information sharing among observations.
• All phenotypic information obtained during all years is used in the model.

Genomic selection among progeny produced by selfing

selfFertilize(nProgeny=120)
genotype()
predictValue(sharingInfo="markers", locations=3)
select()
cross()
• Self-fertilize selected parents to create 120 progeny (three selfed progeny per parent)
• By default, genotype() causes marker data to be available for all individuals of the breeding scheme.
• Specify location 3 to only use trials from that location for training.
• The predictValue() function uses GBLUP when “markers” is given as the sharingInfo parameter.

## Plot the results

cycleMeans <- plotData(addDataFileName=paste(simTempDir, "simPlot", sep="/"))

• The thick line shows the mean value of the three simulations, while each thin line shows the selection responses in each simulation.

## Start a new simulation from the same historical haplotypes

First, it is best to delete the previous simulation environment, then load the data using the name given at the beginning of the vignette. Simple simulation using defaults.

if (exists("simEnv")){
rm(list=names(simEnv), envir=simEnv)
rm(simEnv)
}
initializePopulation()
phenotype()
select()
cross()
phenotype()
select()
cross()
phenotype()
select()
cross()
cycleMeans <- plotData(add=TRUE, addDataFileName=paste(simTempDir, "simPlot", sep="/"))

• The results of the second simple default scheme are shown with the dashed lines.

## Clean up after the vignette

fr <- file.remove(paste(simTempDir, "simSpecies.RData", sep="/"))
fr <- file.remove(paste(simTempDir, "simPlot.rds", sep="/"))