The merlin package allows the fitting of multi outcome models and with any number of nested random effects, outcomes can be modelled jointly, or with shared random effects.

For n record to be included in the model there must be at least one observation per level in the model. For example in a joint survival and longitudinal biomarker model, if the survival time for a patient is avaliable, but all longitudinal biomarker observations are missing, the individual will be excluded from the analysis.

For each individual outcome in the model, a complete case analysis is used. The response, covariates and level indicators are required to fit the model, which may vary by outcome. For example if a model simultaneously models two seperate biomarkers, which are sampled at different time points, data in each record will only be included in the model for the appropriate outcome, potentially resulting in different sample sizes being used for each outcome.

There is a separate variance-covariance matrix for each level of random effects. The structure of the variance-covariance matrix can be varied to allow for correlation between parameters at each level. The possible structures are `identity`

, where all diagonal elements are estimated and constrained to be the same, `diagonal`

where all diagonal elements are estimated uniquely (the default), `unstructured`

where all elements of the variance-covariance matrix are estimated, and `exchangeable`

which assumes a common variance and a common covariance.

Predictions using the fixed-effects only can be found using the `predict()`

function with the argument `type = fixedonly`

. Marginal predictions, where the random effects are integrated out, can be obtained using `type = marginal`

.

In order to illustrate the potential uses of `merlin`

a number of increasingly advanced models have been fitted to the commonly used (in the area of joint modelling of longitudinal and survival data) primary biliary cirrhosis dataset.

This data set needs some re-formatting in order to fit joint models

```
library(merlin)
data("pbc")
pbc[1:11,c("id","years","status","status2","drug","serBilir","prothrombin","year")]
#> id years status status2 drug serBilir prothrombin year
#> 1 1 1.09517 dead 1 D-penicil 14.5 12.2 0.0000000
#> 2 1 1.09517 dead 1 D-penicil 21.3 11.2 0.5256817
#> 3 2 14.15234 alive 0 D-penicil 1.1 10.6 0.0000000
#> 4 2 14.15234 alive 0 D-penicil 0.8 11.0 0.4983025
#> 5 2 14.15234 alive 0 D-penicil 1.0 11.6 0.9993429
#> 6 2 14.15234 alive 0 D-penicil 1.9 10.6 2.1027270
#> 7 2 14.15234 alive 0 D-penicil 2.6 11.3 4.9008871
#> 8 2 14.15234 alive 0 D-penicil 3.6 11.5 5.8892783
#> 9 2 14.15234 alive 0 D-penicil 4.2 11.5 6.8858833
#> 10 2 14.15234 alive 0 D-penicil 3.6 11.5 7.8907020
#> 11 2 14.15234 alive 0 D-penicil 4.6 11.5 8.8325485
```

The event times are given in the `years`

variable and the event indicator in the `status`

variable which can take the values alive, dead or transplanted or the `status2`

variable which is 0 for alive and 1 for dead. Each survival observation should only appear once for each individudal, otherwise each observation will be treated as a seperate event, this allows for recurrent events to be modelled. The survival observations can be reformatted as follows:

```
pbc$stime <- pbc$years
pbc$stime[duplicated(pbc$id)] <- NA
pbc$died <- pbc$status2
pbc$died[duplicated(pbc$id)] <- NA
```

We are also going to work with the log of biomarkers prothrombin index and serum bilirubin throughout, the time of this measurments will be recorded in the variable time. We will also change the treatment variable to be numeric rather than a factor variable.

```
pbc$logb <- log(pbc$serBilir)
pbc$logp <- log(pbc$prothrombin)
pbc$time <- pbc$year
pbc$trt <- as.numeric(pbc$drug) - 1
```

The data now looks like this. Failing to set the data up in this way will lead to errors in the parameter estimates.

```
pbc[1:11,c("id","stime","died","logb","logp","time")]
#> id stime died logb logp time
#> 1 1 1.09517 1 2.67414865 2.501436 0.0000000
#> 2 1 NA NA 3.05870707 2.415914 0.5256817
#> 3 2 14.15234 0 0.09531018 2.360854 0.0000000
#> 4 2 NA NA -0.22314355 2.397895 0.4983025
#> 5 2 NA NA 0.00000000 2.451005 0.9993429
#> 6 2 NA NA 0.64185389 2.360854 2.1027270
#> 7 2 NA NA 0.95551145 2.424803 4.9008871
#> 8 2 NA NA 1.28093385 2.442347 5.8892783
#> 9 2 NA NA 1.43508453 2.442347 6.8858833
#> 10 2 NA NA 1.28093385 2.442347 7.8907020
#> 11 2 NA NA 1.52605630 2.442347 8.8325485
```

We can fit a simple linear model

```
lin.model <- merlin(logb ~ time, family = "gaussian", data = pbc)
lin.model
#> Merlin: mixed-effects model
#> Data: pbc
#> Log likelihood = -2961.414
#>
#> Results:
#> Coef. Std. Error z P>|z| [95% Conf. Interval]
#> time 0.01391608 0.0081279 1.71 0.087 -0.0020143 0.0298465
#> _cons 0.55956049 0.0358059 15.63 0.000 0.4893822 0.6297388
#> log_sd(resid.) 0.10354032 0.0160318 6.46 0.000 0.0721186 0.1349621
summary(lin.model)
#> Mixed effects regression model
#> Log likelihood = -2961.414
#>
#> Coef. Std. Error z P>|z| [95% Conf. Interval]
#> time 0.01391608 0.0081279 1.71 0.087 -0.0020143 0.0298465
#> _cons 0.55956049 0.0358059 15.63 0.000 0.4893822 0.6297388
#> log_sd(resid.) 0.10354032 0.0160318 6.46 0.000 0.0721186 0.1349621
```

We can add flexibility to the model using restricted cubic splines:

```
rcs.model <- merlin(logb ~ rcs(time, df = 3), family = "gaussian", timevar = "time", data = pbc)
summary(rcs.model)
#> Mixed effects regression model
#> Log likelihood = -2960.733
#>
#> Coef. Std. Error z P>|z| [95% Conf. Interval]
#> rcs():1 0.043259677 0.0251475 1.72 0.085 -0.0060285 0.0925479
#> rcs():2 0.028845143 0.0251475 1.15 0.250 -0.0204431 0.0781333
#> rcs():3 -0.009251053 0.0251475 -0.37 0.711 -0.0585392 0.0400371
#> _cons 0.603291290 0.0251475 23.99 0.000 0.5540031 0.6525795
#> log_sd(resid.) 0.103511139 0.0160369 6.45 0.000 0.0720794 0.1349429
```

By default, the restricted cubic splines are orthogonalised (`orthog = TRUE`

). The serum bilirubin observations are clustered within individuals, so we can add a normally-distributed random intercept term named `M1`

. The coefficient of the random-effect term will normally constrained to 1 using the `*1`

notation, unless the random effect is being shared with another outcome model:

```
r.int.model <- merlin(logb ~ rcs(time, df = 3) + M1[id]*1,
family = "gaussian",
levels = "id",
timevar = "time",
data = pbc)
summary(r.int.model)
#> Mixed effects regression model
#> Log likelihood = -1978.61
#>
#> Coef. Std. Error z P>|z| [95% Conf. Interval]
#> rcs():1 0.26636180 0.0136986 19.44 0.000 0.2395130 0.2932106
#> rcs():2 0.05859191 0.0130136 4.50 0.000 0.0330857 0.0840981
#> rcs():3 0.01669602 0.0126950 1.32 0.187 -0.0081857 0.0415778
#> _cons 0.61407632 0.0231746 26.50 0.000 0.5686549 0.6594977
#> log_sd(resid.) -0.61147721 0.0167757 -36.45 0.000 -0.6443570 -0.5785974
#> log_sd(M1) -0.12796233 0.0196162 -6.52 0.000 -0.1664094 -0.0895153
#>
#> Integration method: Non-adaptive Gauss-Hermite quadrature
#> Integration points: 7
#> Convergence: 0
```

We can also add an additional random-slope term (`M2`

) to the model by forming an interaction between the time variable and random-effect using `:`

. We can increase the number of quadrature nodes to improve estimation of the likelihood using the `ip`

option.

```
r.slope.model <- merlin(logb ~ rcs(time, df = 3) + M1[id]*1 + time:M2[id]*1,
family = "gaussian",
timevar = "time",
levels = "id",
ip = 15,
data = pbc)
summary(r.slope.model)
#> Mixed effects regression model
#> Log likelihood = -1718.679
#>
#> Coef. Std. Error z P>|z| [95% Conf. Interval]
#> rcs():1 -0.02441537 0.0165166 -1.48 0.139 -0.0567873 0.0079566
#> rcs():2 -0.02027941 0.0109475 -1.85 0.064 -0.0417361 0.0011773
#> rcs():3 -0.02530224 0.0094557 -2.68 0.007 -0.0438351 -0.0067694
#> _cons 0.23496435 0.0197921 11.87 0.000 0.1961726 0.2737562
#> log_sd(resid.) -1.01703279 0.0176869 -57.50 0.000 -1.0516985 -0.9823671
#> log_sd(M1) -0.27673413 0.0214257 -12.92 0.000 -0.3187277 -0.2347405
#> log_sd(M2) -1.43199974 0.0236142 -60.64 0.000 -1.4782827 -1.3857168
#>
#> Integration method: Non-adaptive Gauss-Hermite quadrature
#> Integration points: 15
#> Convergence: 0
```

If a model has random effects, `merlin`

will fit fixed effects models to obtain starting values and then assume an identity matrix for the variance of the random effects. Initial values can be manually set using the `from`

option.