Suppose you have longitudinal phenotypic measurements and genetic data on the same individuals. The **gwsem** package does not offer a latent growth curve model. Here we demonstrate how to build a latent growth curve model from scratch and perform a genome-wide association test on the slope. Hopefully this demonstration will inspire researchers to incorporate novel structural equation models into genome-wide association studies.

Visit the Onyx website, download Onyx, and create the pictured model.

In this model, we have four occasions of measurement for the phenotype (`t1`, `t2`, `t3`, and `t4`). There are two latent variables, `int` for the intercept and `slope` for the slope of the trajectory. The loadings from `int` to the manifests are all fixed to 1. In contrast, the loadings from `slope` to the manifests increase. In OpenMx, it is possible to use the actual dates of measurement instead of time increments of 1 unit. However, non-unit time increments would force us to estimate the model using maximum likelihood, a method much slower than weighted least squares, so we recommend unit time increments. The single nucleotide polymorphism (SNP) data will appear in the `snp` indicator. It is convenient to use exactly the name `snp` because **gwsem** identifies which column to use by its name.

We export the model. This model does not entirely suit our purpose. After copying the code to a file, we remove the bits relating to `modelData`. We will also need to make a few changes to accommodate covariates. To make these changes easier, we gather all the `mxPath` statements into a list.

```
manifests <- c("t1", "t2", "t3", "t4", "snp")
latents <- c("int", "slope")
path <- list(
  # Intercept loadings, all fixed to 1
  mxPath(from="int", to=c("t1","t2","t3","t4"), free=FALSE, values=1,
         arrows=1, labels=c("int__t1","int__t2","int__t3","int__t4")),
  # Slope loadings, fixed to unit time increments
  mxPath(from="slope", to=c("t1","t2","t3","t4"), free=FALSE, values=c(0,1,2,3),
         arrows=1, labels=c("slope__t1","slope__t2","slope__t3","slope__t4")),
  # Latent means
  mxPath(from="one", to=c("int","slope"), free=TRUE, values=0,
         arrows=1, labels=c("const__int","const__slope")),
  # SNP effects on the growth factors
  mxPath(from="snp", to=c("slope","int"), free=TRUE, values=c(1,0),
         arrows=1, labels=c("snp__slope","snp__int")),
  # Latent variances and covariance
  mxPath(from="int", to="int", free=TRUE, values=1, arrows=2, labels="VAR_int"),
  mxPath(from="slope", to=c("slope","int"), free=TRUE, values=c(1,0.1),
         arrows=2, labels=c("VAR_slope","COV_slope_int")),
  # Residual variances, equated across occasions by the shared label
  mxPath(from="t1", to="t1", free=TRUE, values=1, arrows=2, labels="VAR_err"),
  mxPath(from="t2", to="t2", free=TRUE, values=1, arrows=2, labels="VAR_err"),
  mxPath(from="t3", to="t3", free=TRUE, values=1, arrows=2, labels="VAR_err"),
  mxPath(from="t4", to="t4", free=TRUE, values=1, arrows=2, labels="VAR_err"),
  mxPath(from="snp", to="snp", free=TRUE, values=1, arrows=2, labels="snp_res"),
  # Manifest means fixed to zero
  mxPath(from="one", to=c("t1","t2","t3","t4","snp"), free=FALSE, values=0, arrows=1))

model <- mxModel("lgc",
                 type="RAM",
                 manifestVars = manifests,
                 latentVars = latents,
                 path)
```

Even if we actually had real data, it might be useful to simulate fake data to develop expertise with our model and better understand what to expect from the real analyses. When we built the model in Onyx, we were careful to set starting values that were also appropriate for data simulation. Thanks to this preparation, data are easily simulated. We simulate `N=500` people because this is the amount of fake SNP data available in the **gwsem** package (in the `extdata` subdirectory).

```
N <- 500
sim1 <- mxGenerateData(model, N)
head(sim1)
#> t1 t2 t3 t4 snp
#> 1 -2.3680995 -0.85708232 -0.5784724 -1.082179 -0.5610041
#> 2 -0.5870958 1.62991682 3.3460995 2.742590 0.3019831
#> 3 1.3515139 -0.33834852 0.8400219 3.108248 0.7708726
#> 4 0.4889563 0.02940221 1.2267593 1.246545 1.3847643
#> 5 -1.2230067 -0.52827905 -4.0254779 -2.174305 -0.4621659
#> 6 -1.5841539 -4.87968083 -3.1796620 -3.145490 -1.0577216
```

A placeholder `snp` column needs to exist in the `sim1` `data.frame`, and will be replaced with each SNP when we run the GWAS. We add some covariates too.
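
One way to create such placeholder covariates is to append five columns `pc1` through `pc5` of standard normal noise (purely for illustration; in a real analysis these would typically be ancestry principal components computed from the genotypes):

```r
# Illustrative only: simulate five covariate columns pc1..pc5
set.seed(1)
for (i in 1:5) sim1[[paste0('pc', i)]] <- rnorm(nrow(sim1))
```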

To add covariates, we need to rebuild the model. To make this easy, we saved all the `mxPath` statements in the `path` list (above). All these paths can be carried into the new model by including the `path` list in the model construction. We add `mxFitFunctionWLS` to select the weighted least squares fit function, and add our simulated data.

```
m2 <- mxModel("lgc", type="RAM",
manifestVars = model$manifestVars,
latentVars = c(model$latentVars, paste0('pc', 1:5)),
path,
mxExpectationRAM(M="M"),
mxFitFunctionWLS(allContinuousMethod="marginals"),
mxData(observed=sim1, type="raw", minVariance=0.1, warnNPDacov=FALSE))
```

At this point, `m2` almost corresponds with the output of `buildOneItem` or a similar model-building function. We just need to set up the covariates.
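
**gwsem** offers a helper for this step; a sketch, assuming `setupExogenousCovariates` accepts the model and the covariate column names (check the package reference for its exact signature):

```r
# Assumed usage: add free regression paths from each covariate
# (pc1..pc5) onto the indicators, mirroring the pc*_to_t* labels
# that appear in the summary output below.
m2 <- setupExogenousCovariates(m2, paste0('pc', 1:5))
```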

The model is ready to run. If we had ordinal indicators, we could use `setupThresholds` to perform some additional setup.

We run the model on the first SNP as a sanity check to ensure that our model is working correctly. The `GWAS` function writes parameter estimates to the given `out.log` file, but it also returns the model back to **R**. We can call `summary` on this model to see the results for the first SNP.

```
tdir <- tempdir()
dir <- system.file("extdata", package = "gwsem")
snp1 <- GWAS(m2, file.path(dir,"example.pgen"), file.path(tdir, "out.log"), SNP=1)
#> Running lgc with 29 parameters
#> Done. See '/tmp/RtmpBN6vTu/out.log' for results
summary(snp1)
#> Summary of lgc
#>
#> free parameters:
#> name matrix row col Estimate Std.Error
#> 1 snp__int A int snp -4.576966e-02 0.06621164
#> 2 snp__slope A slope snp -8.091185e-05 0.07189269
#> 3 pc1_to_t1 A t1 pc1 -2.257147e-02 0.06777695
#> 4 pc1_to_t2 A t2 pc1 -9.828099e-02 0.08976147
#> 5 pc1_to_t3 A t3 pc1 -1.562897e-01 0.14428827
#> 6 pc1_to_t4 A t4 pc1 -2.067570e-01 0.20500966
#> 7 pc2_to_t1 A t1 pc2 1.448400e-01 0.07156697
#> 8 pc2_to_t2 A t2 pc2 1.016289e-01 0.09685771
#> 9 pc2_to_t3 A t3 pc2 1.082212e-01 0.15531968
#> 10 pc2_to_t4 A t4 pc2 1.260386e-01 0.21353877
#> 11 pc3_to_t1 A t1 pc3 2.584804e-02 0.07185413
#> 12 pc3_to_t2 A t2 pc3 2.575326e-01 0.09509178
#> 13 pc3_to_t3 A t3 pc3 3.674483e-01 0.14808175
#> 14 pc3_to_t4 A t4 pc3 4.476634e-01 0.20623316
#> 15 pc4_to_t1 A t1 pc4 -1.588379e-02 0.06834869
#> 16 pc4_to_t2 A t2 pc4 -6.097714e-02 0.09759135
#> 17 pc4_to_t3 A t3 pc4 -1.106995e-01 0.14550079
#> 18 pc4_to_t4 A t4 pc4 -1.919035e-01 0.20272268
#> 19 pc5_to_t1 A t1 pc5 -7.561330e-03 0.06610286
#> 20 pc5_to_t2 A t2 pc5 -5.215356e-02 0.09132317
#> 21 pc5_to_t3 A t3 pc5 -5.108507e-02 0.13971823
#> 22 pc5_to_t4 A t4 pc5 -7.411138e-02 0.19819871
#> 23 VAR_err S t1 t1 1.156169e+00 0.06880000
#> 24 snp_res S snp snp 4.185348e-01 0.01363625
#> 25 VAR_int S int int 1.052314e+00 0.14112332
#> 26 COV_slope_int S int slope -1.378718e-01 0.09438743
#> 27 VAR_slope S slope slope 1.755748e+00 0.13740290
#> 28 const__int M 1 int 2.905704e-03 0.06213048
#> 29 const__slope M 1 slope -9.294464e-03 0.06584891
#>
#> Model Statistics:
#> | Parameters | Degrees of Freedom | Fit (r'Wr units)
#> Model: 29 2471 182.1547
#> Saturated: 20 2480 0.0000
#> Independence: 10 2490 NA
#> Number of observations/statistics: 500/2500
#>
#> chi-square: χ² ( df=16 ) = 182.1547, p = 3.114369e-30
#> CFI: NA
#> TLI: NA (also known as NNFI)
#> RMSEA: 0.1441157 [95% CI (0.1220837, 0.1668976)]
#> Prob(RMSEA <= 0.05): 2.220446e-16
#> To get additional fit indices, see help(mxRefModels)
#> timestamp: 2020-06-14 11:54:59
#> Wall clock time: 0.04677057 secs
#> OpenMx version number: 2.17.4
#> Need help? See help(mxSummary)
```

This looks alright, so we proceed to run all the SNPs and retrieve the results. The `loadResults` function calculates Z scores and P values for the `snp__slope` parameter. We might also examine the `snp__int` parameter.

```
GWAS(m2, file.path(dir,"example.pgen"), file.path(tdir, "out.log"))
#> Running lgc with 29 parameters
#> Done. See '/tmp/RtmpBN6vTu/out.log' for results
got <- loadResults(file.path(tdir, "out.log"), 'snp__slope')
```
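
The same call retrieves the intercept effects; for example:

```r
# Load Z scores and P values for the SNP -> intercept parameter too
gotInt <- loadResults(file.path(tdir, "out.log"), 'snp__int')
```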

Some of the models failed to fit. If we are interested to discover why, we can load them separately:

```
susp <- loadSuspicious(file.path(tdir, "out.log"), 'snp__slope')
head(susp)
#> MxComputeLoop1 CHR BP SNP A1 A2 statusCode
#> 1: 7 1 8000 RSID_8 G A <NA>
#> 2: 9 1 10000 RSID_10 G A <NA>
#> 3: 16 1 17000 RSID_17 G A <NA>
#> 4: 18 1 19000 RSID_19 G A <NA>
#> 5: 20 1 21000 RSID_21 G A <NA>
#> 6: 23 1 24000 RSID_24 G A <NA>
#> catch1 snp__slope snp__slopeSE
#> 1: lgc.data: 'snp' has observed variance less than 0.1 1 NaN
#> 2: lgc.data: 'snp' has observed variance less than 0.1 1 NaN
#> 3: lgc.data: 'snp' has observed variance less than 0.1 1 NaN
#> 4: lgc.data: 'snp' has observed variance less than 0.1 1 NaN
#> 5: lgc.data: 'snp' has observed variance less than 0.1 1 NaN
#> 6: lgc.data: 'snp' has observed variance less than 0.1 1 NaN
#> Z P
#> 1: NaN NaN
#> 2: NaN NaN
#> 3: NaN NaN
#> 4: NaN NaN
#> 5: NaN NaN
#> 6: NaN NaN
```

It looks like these SNPs had a variance below the threshold we specified in `mxData` (`minVariance=0.1`).

We can present a Manhattan plot consisting of the models that succeeded,
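for example with the third-party **qqman** package (an assumption; the original analysis may use a different plotting approach):

```r
# Assumes 'got' has CHR, BP, SNP, and P columns, as in the
# loadSuspicious output above; models that failed have P = NaN
# and are dropped before plotting.
library(qqman)
ok <- got[!is.na(got$P), ]
manhattan(ok, chr = "CHR", bp = "BP", snp = "SNP", p = "P")
```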