1. Pedigree-based Evaluations

Robin Wellmann

2017-05-23

Animal breeding relies on the prediction of breeding values of selection candidates and subsequent selection of animals for breeding in a way that restricts the rate of inbreeeding and maintains genetic originality of the breed. The methods of choice are the utilization of pedigree data, of genetic marker data, or a combination of both. This section covers the following evaluations based on pedigree data:

Note that the calculation of breeding values is not included as there already exist a couple of R packages for this purpose. Recommended are the free package MCMCglmm for small and medium sized data sets, and package asreml for large data sets.

Data Preparation

A pedigree is a data frame or data table with the first three columns being the individual ID, the sire ID, and the dam ID. Depending on the intended evaluation, the pedigree may also provide the names of the breeds (column Breed), the years of birth (column Born ), and the sexes (column Sex). A toy example of a pedigree may look like this:

Pedigree <- data.frame(
  ID   = c("Iffes",     "Peter",    "Anna-Lena", "Kevin",    "Horst"),
  dad  = c("Kevin",     "Kevin",     NA,          0,         "Horst"),
  mom  = c("Chantalle", "Angelika", "Chantalle", "",           NA),
  Breed= c("Angler",    "Angler",   "Angler",    "Holstein", "Angler"),
  Born = c(2015,        2016,       2011,       2010,        2015)
  )
Pedigree
##          ID   dad       mom    Breed Born
## 1     Iffes Kevin Chantalle   Angler 2015
## 2     Peter Kevin  Angelika   Angler 2016
## 3 Anna-Lena  <NA> Chantalle   Angler 2011
## 4     Kevin     0           Holstein 2010
## 5     Horst Horst      <NA>   Angler 2015

Individual IDs must be unique and should not contain blank spaces. The above pedigree contains a loop, so it is not a valid pedigree (Horst is an ancestor of himself). Function prePed will prepare the pedigree and turn it into a valid one. The function

library("optiSel")
Pedig <- prePed(Pedigree)
## Pedigree loops were detected. We recommend to correct them manually before
## using prePed(). The parents of the following individuals are set to unknown
## to remove the loops.
##        Sire  Dam
## Horst Horst <NA>

The result is:

Pedig
##               Indiv  Sire       Dam    Sex          Breed Born
## Chantalle Chantalle  <NA>      <NA> female         Angler   NA
## Angelika   Angelika  <NA>      <NA> female         Angler   NA
## Kevin         Kevin  <NA>      <NA>   male       Holstein 2010
## Horst         Horst  <NA>      <NA>   <NA> Pedigree Error 2015
## Anna-Lena Anna-Lena  <NA> Chantalle   <NA>         Angler 2011
## Iffes         Iffes Kevin Chantalle   <NA>         Angler 2015
## Peter         Peter Kevin  Angelika   <NA>         Angler 2016

Plots can be created to show specified individuals and their relatives. In the first step, function subPed is used below to create a small pedigree that includes only the individuals to be plotted, which are "Chantalle" and "Angelika", up to prevGen previous (ancestral) generations, and up to succGen succeeding (descendant) generations. The pedigree was plotted with function pedplot. It is basically a wrapper function for function plot.pedigree from package kinship2, but has additional arguments. Parameter label contains the columns of the pedigree to be used for labeling. By default, symbols of individuals included in vector keep are plotted in black and for individuals from other breeds the symbol is crossed out.

sPed <- subPed(Pedig, keep = c("Chantalle","Angelika"), prevGen = 2, succGen = 1)
pedplot(sPed, label = c("Indiv", "Born", "Breed"), cex = 0.55)

Hereby, SAnna-Lena denotes the unknown sire of Anna-Lena.

Quality Control

The next step is to find out if the completeness of the pedigree is sufficient for the intended evaluation, and to identify individuals with insufficient pedigree information. This will be demonstrated at the example of a pedigree of Hinterwald cattle. Data frame Phen contains selection candidates with breeding values. The breeding values are added below as a new column to the pedigree.

data("PedigWithErrors")
data("Phen")
PedigWithErrors <- merge(PedigWithErrors, Phen[,c("Indiv", "BV")], by="Indiv", all="TRUE")
Pedig <- prePed(PedigWithErrors)

Function completeness can be used to determine the proportion of known ancestors of each individual of the actual breeding population in each ancestral generation:

compl <- completeness(Pedig, keep=Phen$Indiv, by="Indiv")
head(compl)
##               Indiv Generation Completeness
## 403 276000890378247          0          1.0
## 404 276000890378247          1          0.5
## 409 276000891000610          0          1.0
## 410 276000891000610          1          0.5
## 423 276000891471209          0          1.0
## 424 276000891471209          1          0.5

or the mean completeness of the pedigree within groups of individuals:

compl <- completeness(Pedig, keep=Phen$Indiv, by="Sex")
library("ggplot2")
ggplot(compl, aes(x=Generation, y=Completeness, col=Sex)) + geom_line()

The completeness of the pedigree of each individual can be summarized with function summary:

keep <- Phen$Indiv
Summary <- summary(Pedig, keep.only=keep)
head(Summary[Summary$equiGen>3.0, -1])
##                  equiGen fullGen maxGen       PCI  Inbreeding
## 276000812202159 5.789307       1     12 0.7692308 0.027274609
## 276000812496744 6.032959       3     12 0.9677419 0.008903503
## 276000812497289 5.994141       3     12 0.9677419 0.013320923
## 276000812497659 6.885742       3     12 0.9677419 0.010315657
## 276000812749837 4.419678       2     12 0.8333333 0.005828857
## 276000812922523 5.575928       2     12 0.8333333 0.003438234

The following parameters were computed for each individual:

equiGen
Number of equivalent complete generations, defined as the sum of the proportion of known ancestors over all generations traced,
fullGen
Number of fully traced generations,
maxGen
Number of maximum generations traced,
PCI
Index of pedigree completeness, which is the harmonic mean of the pedigree completenesses of the parents (MacCluer et al, 1983),
Inbreeding
Inbreeding coefficient.

The best possibility to characterize completeness of pedigree information by a single value is the number of equivalent complete generations, averaged over all individuals of the actual breeding population which were included in the evaluations.

The relevant parameter to identify individuals with insufficient pedigree information to estimate inbreeding, however, is the PCI. This is because inbreeding can be detected only if both maternal and paternal ancestries are known. The harmonic mean ensures that the less complete paternal pedigree is weighted more heavily, so the PCI equals zero when either parent is unkown. Inbreeding coefficients can be valid despite small PCIs if the most recent founders were indeed unrelated, e.g. because they were from other breeds.

Note that inbreeding coefficients can be computed faster with function pedInbreeding.

Individual Specific Parameters

Inbreeding Coefficients

The inbreeding coefficient of an individual is the probability that two alleles chosen at random from the maternal and paternal haplotypes are identically by descent (IBD). This parameter estimates the extent to which the individual may suffer from inbreeding depression and predicts the homogeneity of its offspring. It can be calculated with

Animal <- pedInbreeding(Pedig)
mean(Animal$Inbr[Animal$Indiv %in% keep])
## [1] 0.01943394

Wheter mating to a sound inbred individual should be favored or avoided depends on whether the breeder wishes offsprig with uniform or heterogeneous breeding values. More important than the inbreeding coefficient of an animal itself, however, is the expected inbreeding coefficient of the offspring, which should be low. The expected inbreeding coefficient of the offspring is the equal to the kinship of the parents.

Kinship

The kinship between two individuals is the probability that two alleles randomly chosen from both individuals are IBD. A matrix containing the kinship between all pairs of individuals can be computed with function pedIBD. It is half the numerator relationship matrix. The R code below computes the kinship between the female with ID 276000812750188 and all male selection candidates that have a breeding value larger than 1.0 and a pedigree with at least 5 equivalent complete generations.

pedKIN <- pedIBD(Pedig, keep.only=keep)
use    <- Pedig$Sex=="male" & Pedig$Indiv %in% keep & summary(Pedig)$equiGen>5 & Pedig$BV>1.0
Males  <- Pedig$Indiv[use]
pedKIN[rownames(pedKIN) %in% Males, "276000812750188", drop=FALSE]
##                 276000812750188
## 276000811902819      0.28955731
## 276000812749676      0.06626139
## 276000812688988      0.16645344
## 276000812689003      0.03807358
## 276000812750190      0.18099209
## 276000813155662      0.17488755
## 276000812771544      0.01719556

The males with lowest kinship should be favoured for mating.

Breed Composition

Mating decisions should not only depend on the breeding value of the male and the kinship between male and female, but also on the genetic contribution of the male from foreign breeds. Many endangered breeds have been graded up with commercial high-yielding breeds. These increasing migrant contributions displace the original genetic background of the endangered breed, decrease the genetic contribution from native ancestors, and reduce the conservation value of the breed.

For computing the breed composition of individuals it should be taken into account that the breed name of founders born recently is in fact unkown even if they have been classified as purebred. Hence, for these individuals, the breed name should be changed to "unknown". Below, the breed name of founders born after 1970 is changed to "unknown" if they had been classified as Hinterwald cattle. Thereafter, function pedBreedComp was used to estimate the contribution of each individual from each foreign breed and from native founders. Finally, a column containing the migrant contribution of each individual (defined as one minus the native contribution) was added to the pedigree.

Pedig <- prePed(PedigWithErrors, thisBreed="Hinterwaelder", lastNative=1970)
cont  <- pedBreedComp(Pedig, thisBreed="Hinterwaelder")
Pedig$MC <- 1-cont$native
head(cont[rev(keep), 2:6])
##                    native    unknown     unbek0  Fleckvieh Vorderwaelder
## 276000812749835 0.6096039 0.09570312 0.03677368 0.11021423    0.14770508
## 276000812749569 0.5945129 0.14160156 0.04071045 0.11599731    0.10717773
## 276000891724277 0.0000000 0.50000000 0.50000000 0.00000000    0.00000000
## 276000891920412 0.5501404 0.12109375 0.03527832 0.10354614    0.18994141
## 276000812496874 0.3734741 0.41601562 0.03430176 0.08294678    0.09326172
## 276000811287745 0.5804138 0.09765625 0.03259277 0.08963013    0.19970703

The columns are ordered so that the most influential foreign breeds come first. It can be seen that the contribution from native founders varies considerably between individuals. Individuals with high genetic contribution from native founders should be favored for mating provided that the kinship between male and female is sufficiently low.

Kinship at Native Alleles

Since animals with low migrant contributions tend to be related, the inbreeding level could increase considerably when introgressed genetic material is removed from the population. This could be avoided by restricting the incease in kinship at native haplotype segments in the population. For pairs of individuals the kinship at native haplotype segments can be estimated as follows:

fD  <- pedIBDatN(Pedig, thisBreed="Hinterwaelder", keep.only=keep)
pedKINatN <- fD$pedIBDandN/fD$pedN
use <- rownames(fD$pedN)[diag(fD$pedN)>0.2]
mean(pedKINatN[use,use])
## [1] 0.07720586

The correlation between the kinship and the kinship at native segments is high for individuals with low migrant contributions (as expected):

sK    <- pedKIN[use, use]
sKatN <- pedKINatN[use, use]
cor(sK[upper.tri(sK)], sKatN[upper.tri(sKatN)], use="complete.obs")
## [1] 0.9505082

But the correlation is smaller if all individuals are considered, which may lead to different selection decisions:

use   <- rownames(fD$pedN)[diag(fD$pedN)>0.01]
sK    <- pedKIN[use, use]
sKatN <- pedKINatN[use, use]
cor(sK[upper.tri(sK)], sKatN[upper.tri(sKatN)], use="complete.obs")
## [1] 0.851401

Population Specific Parameters

Genetic Diversity

The genetic diversity of a population is the probability that two alleles chosen at random from the population are not IBD. It is one minus the average kinship of the individuals. Thus, it can be computed as

1-mean(pedKIN[keep, keep])
## [1] 0.9755853

The diversity of this population is high. A high genetic diversity enables to avoid inbreeding and ensures that polygenic traits have a high additive variance, which is required to achieve genetic gain. However, the diversity of this population seems to be extremely high. This could have two reasons:

  • The diversity is indeed very high because the individuals are in fact crossbred individuals with genetic contributions from many breeds, or

  • The completeness of the pedigrees is insufficient for such an evaluation.

For this data set both is true: The average number of equivalent complete generations is only 6.19, and it is unclear whether this number is low because pedigrees of indivduals from other breeds were cut, or because previously unregistered animals have been registered. Hence, pedigrees of introduced individuals from other breeds should not be cut to reduce this uncertainty.

A parameter which depends not so much on the completeness of the pedigrees is the diversity at native alleles.

Kinship and Diversity at Native Alleles

The kinship at native alleles in the population is the probability that two alleles chosen at random from the population are not IBD, given that both alleles originate from native founders. Since it is defined as a conditional probability, it can be computed as the ratio of two means. The kinship at native alleles is

mean(fD$pedIBDandN)/mean(fD$pedN)
## [1] 0.0776925

The genetic diversity at native alleles is one minus the kinship at native alleles (named conditional gene diversity in Wellmann, Hartwig, and Bennewitz 2012). Thus, it can be calculated as

1- mean(fD$pedIBDandN)/mean(fD$pedN)
## [1] 0.9223075

Note that incomplete pedigrees result in an overestimation of the genetic diversity. This is not the case for the genetic diversity at native alleles because alleles originating from founders born after 1970 were classified to be non-native. Hence, the diversity at alleles originating from these individuals does not affect the estimate.

A high diversity at native alleles is important if a goal of the breeding program is to remove introgressed genetic material from the population. Without maintenance of a high diversity at native alleles, inbreeding coefficients will soon rise to an unreasonable level.

Native Effective Size

The native effective size of a population is defined as the size of an idealized random mating population for which the genetic diversity decreases as fast as the diversity at native alleles decreases in the population under study (Wellmann, Hartwig, and Bennewitz 2012). Thus, the native effective size quantifies how fast the diversity at native alleles decreases. In contrast, the effective size quantifies how fast the genetic diversity decreases.

In a population without historic introgression the native effective size (native Ne) is equal to the effective size (Ne) as estimated by Wellmann and Bennewitz (2011). But in a population with historic introgression it can be argued that the effective size is not useful to describe the history of a population because even a small amount of introgression with unrelated individuals prevents a drop of genetic diversity, so the effecive size would be infinite.

For estimating the native effective size, the diversity at native alleles needs to be estimated at various times, and then the native effective size is estimated from the slope of a regression function.

An estimate of the native effective size is automatically provided when the kinship at native alleles is computed:

fD  <- pedIBDatN(Pedig, thisBreed="Hinterwaelder", keep.only=keep, nGen=6)
## Number of Migrant Founders: 237
## Number of Native  Founders: 150
## Individuals in Pedigree   : 1658
## Native Ne = 49.5 (estimated from 6 previous generations)
## Mean kinship at native alleles:  0.0777

The native Ne is stored as an attribute of the result:

attributes(fD)$nativeNe
## [1] 49.5

Thus, the diversity at native alleles decreases as fast as in an idealized population consisting of 49.5 individuals.

Effective Size

The effective size Ne of a population is the size of an idealized random mating population for which the genetic diversity decreases as fast as in the population under study. It is commonly estimated from the mean rate of increase in coancestry (Cervantes et al. 2011), whereby the increase in coancestry between any pair of individuals \(i\) and \(j\) is computed as \(\Delta c_{ij}=1-\sqrt[\frac{g_i+g_j}{2}]{1-c_{ij}}\), where \(c_{ij}\) is the kinship between \(i\) and \(j\), and \(g_i,g_j\) are the numbers of equivalent complete generations of individuals \(i\) and \(j\). The effective size is then estimated as \(N_e=\frac{1}{2\overline{\Delta c}}\). Thus, the effective size can be estimated as:

id     <- Summary$Indiv[Summary$equiGen>=4 & Summary$Indiv %in% keep]
g      <- Summary[id, "equiGen"]
N      <- length(g)
n      <- (matrix(g, N, N, byrow=TRUE) + matrix(g, N, N, byrow=FALSE))/2
deltaC <- 1 - (1-pedKIN[id,id])^(1/n)
Ne     <- 1/(2*mean(deltaC))
Ne
## [1] 97.95922

However, the above formula assumes that ancestors are missing at random. In populations with historic introgression, pedigrees of individuals from foreign breeds are often cut, so their parents are not missing at random. In fact, even a small amount of introgression suffices to prevent a decrease of genetic diversity, so that the effective size of such a population is \(\infty\).

Change of Native Genome Equivalent and Ne

The native genome equivalent NGE of a population is the minimum number of founders that would be needed to create a population consisting of unrelated individuals that has the same diversity at native alleles as the population under study (Wellmann, Hartwig, and Bennewitz 2012). It is assumed that the individuals were unrelated in base year base, which could be well before pedigree recording had started. Between the base year and the year lastNative in which the last native founder was born, the population had a historical effective size of histNe. This assumption implies that the founders of the pedigree were related due to ancestors whose pedigrees had not been recorded.

The decrease of native genome equivalents is estimated below for the time since lastNative=1970 by assuming that individuals were unrelated in year base=1800 and that the historical effective size was histNe=150 between 1800 and 1970. Since the computation for the full pedigree could take some time, the parameters are estimated below from a subset of the pedigree, which are the individuals included in vector keep. Vector keep contains from each birth cohort 50 randomly sampled Hinterwald cattle.

data("PedigWithErrors")
set.seed(1)
Pedig <- prePed(PedigWithErrors, thisBreed="Hinterwaelder", lastNative=1970)
use   <- Pedig$Breed=="Hinterwaelder" & Pedig$Born %in% (1970:2000)
IDs   <- split(Pedig$Indiv[use], Pedig$Born[use])
keep  <- unlist(mapply(sample, x=IDs, size=pmin(lapply(IDs,length), 50), SIMPLIFY=FALSE))

Kinships <- kinlist(pedIBD   = pedIBD(Pedig, keep.only=keep), 
                    pedIBDatN= pedIBDatN(Pedig, thisBreed="Hinterwaelder", keep.only=keep))
sy <- summary(Kinships, Pedig, tlim=c(1970, 2000), histNe=150, base=1800, df=4)
##    cohort    I pedIBD pedIBDandN pedIBDatN   Ne condGD  NGE
## 1    1970 4.83  0.004      0.003     0.005 50.2  0.894 4.73
## 2    1971 4.96  0.008      0.007     0.015 50.1  0.885 4.35
## 3    1972 5.15  0.008      0.007     0.013 50.2  0.887 4.42
## 4    1973 5.32  0.008      0.006     0.015 50.2  0.885 4.36
## 5    1974 5.66  0.008      0.007     0.015 49.8  0.886 4.37
## 6    1975 5.54  0.008      0.005     0.017 49.2  0.884 4.30
## 7    1976 5.34  0.013      0.010     0.020 48.6  0.880 4.17
## 8    1977 5.43  0.016      0.011     0.026 48.1  0.875 4.02
## 9    1978 5.08  0.009      0.006     0.018 47.5  0.883 4.26
## 10   1979 5.10  0.010      0.007     0.020 46.3  0.881 4.18
## 11   1980 4.95  0.012      0.008     0.022 44.5  0.879 4.12
## 12   1981 5.49  0.013      0.009     0.028 42.5  0.873 3.95
## 13   1982 5.12  0.015      0.011     0.030 40.5  0.872 3.89
## 14   1983 5.19  0.016      0.011     0.030 38.8  0.872 3.90
## 15   1984 5.33  0.017      0.012     0.039 37.6  0.864 3.67
## 16   1985 5.12  0.016      0.012     0.039 37.0  0.864 3.67
## 17   1986 4.95  0.020      0.015     0.042 36.8  0.861 3.60
## 18   1987 4.91  0.016      0.010     0.044 37.1  0.859 3.54
## 19   1988 4.91  0.015      0.010     0.039 38.1  0.864 3.67
## 20   1989 4.66  0.021      0.013     0.056 39.8  0.848 3.29
## 21   1990 4.98  0.022      0.014     0.057 42.5  0.847 3.27
## 22   1991 4.87  0.019      0.013     0.055 45.8  0.849 3.31
## 23   1992 4.70  0.021      0.015     0.060 49.4  0.845 3.22
## 24   1993 4.91  0.017      0.011     0.053 52.7  0.851 3.36
## 25   1994 5.21  0.019      0.014     0.049 55.3  0.854 3.43
## 26   1995 5.23  0.015      0.011     0.051 56.5  0.853 3.40
## 27   1996 5.18  0.013      0.009     0.052 56.6  0.852 3.37
## 28   1997 5.07  0.013      0.009     0.061 56.0  0.844 3.20
## 29   1998 4.86  0.016      0.010     0.068 55.5  0.838 3.08
## 30   1999 4.94  0.012      0.008     0.065 55.3  0.841 3.14
## 31   2000 4.49  0.014      0.010     0.072 55.3  0.834 3.01
ggplot(sy, aes(x=cohort, y=Ne)) + geom_line() + ylim(c(0,100))
ggplot(sy, aes(x=cohort, y=NGE)) + geom_line() + ylim(c(0,7))

Not only the native genome equivalents (column NGE), but also the native effective size (column Ne), the mean kinship (column pedIBD), the mean kinship at native alleles pedIBDatN, the diversity at native alleles corrected for the base year (column condGD), and the generation interval (column I) have been estimated for all birth cohorts.

It is possible that the diversity at native alleles increases for a short period of time, in which case the estimate of the native effective size would be NA. Choose df<4 to get a smooth estimate for the native effective size.

Change of Breed Composition

For monitoring the introgression from other breeds, the contributions of foreign breeds to all birth cohorts can be estimated with function conttac and then plotted, e.g. with function ggplot.

use  <- Pedig$Breed=="Hinterwaelder" & Pedig$Born %in% (1950:1995)
cont <- pedBreedComp(Pedig, thisBreed="Hinterwaelder")
contByYear <- conttac(cont, cohort=Pedig$Born, use=use, mincont = 0.01)
ggplot(contByYear, aes(x=Year, y=Contribution, fill=Breed)) + geom_area(color="black")

References

Cervantes, I., F. Goyache, A. Molina, M. Valera, and J. P. Guitérrez. 2011. “Estimation of Effective Population Size from the Rate of Coancestry in Pedigreed Populations.” Animal Breeding and Genetics 128.

Wellmann, R., and J. Bennewitz. 2011. “Identification and Characterization of Hierarchical Structures in Dog Breeding Schemes, a Novel Method Applied to the Norfolk Terrier.” Journal of Animal Science 89: 3846–58.

Wellmann, R., S. Hartwig, and J. Bennewitz. 2012. “Optimum Contribution Selection for Conserved Populations with Historic Migration; with Application to Rare Cattle Breeds.” Genetics Selection Evolution 44 (34).