The LUCIDus R package is an integrative tool for jointly estimating latent (unknown) clusters or subgroups from multi-omics data and phenotypic traits. It implements the statistical method proposed in the paper “A Latent Unknown Clustering Integrating Multi-Omics Data (LUCID) with Phenotypic Traits”, published in *Bioinformatics*.

Cheng Peng, Jun Wang, Isaac Asante, Stan Louie, Ran Jin, Lida Chatzi, Graham Casey, Duncan C Thomas, David V Conti. A Latent Unknown Clustering Integrating Multi-Omics Data (LUCID) with Phenotypic Traits. *Bioinformatics*, btz667. https://doi.org/10.1093/bioinformatics/btz667

You can install the released version of LUCIDus from CRAN directly with:
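The call below is the standard way to install any CRAN package. It assumes the package is still published on CRAN under the name `LUCIDus`, as stated above.

```r
# Install the released version from CRAN (assumes the package name "LUCIDus")
install.packages("LUCIDus")
```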

Alternatively, it can be installed from GitHub with the following code:
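A minimal sketch of a GitHub install, assuming the `remotes` package is used. The repository owner is not stated in this document, so `<owner>` is a placeholder that must be replaced with the actual GitHub account before running.

```r
# Sketch only: "<owner>" is a placeholder, not the real repository path.
install.packages("remotes")
remotes::install_github("<owner>/LUCIDus")
```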

Three functions, `est_lucid()`, `boot_lucid()`, and `tune_lucid()`, are currently available for model fitting and selection. The model outputs can be summarized and visualized using `summary_lucid()` and `plot_lucid()`, respectively. Predictions can be made with `pred_lucid()`.

`est_lucid()`

Estimates latent clusters from multi-omics data. Missing values in the biomarker data are allowed, and information from the outcome of interest can be integrated into the estimation.

For a testing dataset with 10 genetic features (5 causal) and 4 biomarkers (2 causal)
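To make the shape of such a dataset concrete, the following base-R sketch simulates data with the same dimensions. It is not the dataset used in this README: the object names (`G_sim`, `Z_sim`, `Y_sim`, `X_sim`), the sample size, and all effect sizes are assumptions chosen for illustration.

```r
# Illustrative simulation only; names, sample size and effects are hypothetical.
set.seed(1)
n <- 500

# 10 binary genetic features; only the first 5 affect cluster membership.
G_sim <- matrix(rbinom(n * 10, size = 1, prob = 0.3), nrow = n,
                dimnames = list(NULL, paste0("G", 1:10)))
beta_G <- c(rep(0.8, 5), rep(0, 5))
p_cluster2 <- plogis(-1 + as.vector(G_sim %*% beta_G))

# Latent cluster label (1 or 2), unobserved in a real analysis.
X_sim <- rbinom(n, size = 1, prob = p_cluster2) + 1

# 4 continuous biomarkers; only the first 2 differ in mean between clusters.
mu_shift <- c(1.5, 1.5, 0, 0)
Z_sim <- matrix(rnorm(n * 4), nrow = n,
                dimnames = list(NULL, paste0("Z", 1:4)))
Z_sim <- Z_sim + outer(X_sim - 1, mu_shift)

# Binary outcome whose risk depends on the latent cluster.
Y_sim <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.0 * (X_sim - 1)))
```

Objects of this form (a feature matrix, a biomarker matrix, and an outcome vector) correspond to the `G`, `Z`, and `Y` arguments used in the calls in this README.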

`summary_lucid()`

`plot_lucid()`

`boot_lucid()`

Bootstrap method for obtaining standard errors (SEs) of the LUCID parameter estimates

```
set.seed(10)
boot_lucid(G = G1, CoG = CoG, Z = Z1, Y = Y1, CoY = CoY, useY = TRUE, family = "binary", K = 2, R=500)
```
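For readers unfamiliar with the idea, the sketch below shows a generic nonparametric bootstrap of a standard error in base R. It is not the internal implementation of `boot_lucid()`; the toy logistic model, the data, and the number of replicates are assumptions used only to illustrate resampling rows with replacement and refitting.

```r
# Generic bootstrap SE sketch in base R; not the LUCIDus implementation.
set.seed(10)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, size = 1, prob = plogis(0.5 * dat$x))

# Statistic of interest: the logistic-regression slope for x.
fit_coef <- function(d) coef(glm(y ~ x, family = binomial, data = d))[["x"]]

R <- 200  # number of bootstrap replicates (hypothetical choice)
boot_est <- replicate(R, {
  idx <- sample.int(n, size = n, replace = TRUE)  # resample rows with replacement
  fit_coef(dat[idx, ])
})

# The bootstrap SE is the standard deviation of the replicate estimates.
boot_se <- sd(boot_est)
boot_se
```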

`tune_lucid()`

Grid search for tuning parameters using parallel computing

```
# Best run on a server or HPC cluster
set.seed(10)
GridSearch <- tune_lucid(G = G1, Z = Z1, Y = Y1, K = 2, Family = "binary", USEY = TRUE,
                         LRho_g = 0.008, URho_g = 0.012, NoRho_g = 3,
                         LRho_z_invcov = 0.04, URho_z_invcov = 0.06, NoRho_z_invcov = 3,
                         LRho_z_covmu = 90, URho_z_covmu = 110, NoRho_z_covmu = 2)
GridSearch$Results
GridSearch$Optimal
```
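The size of the search grid follows from the lower bound, upper bound, and number of values given for each penalty. The base-R sketch below enumerates the combinations, assuming each range is split into evenly spaced values; how `tune_lucid()` spaces the values internally is not stated here, so treat the spacing as an assumption.

```r
# Enumerate candidate tuning-parameter combinations (even spacing is assumed).
grid <- expand.grid(
  Rho_g        = seq(0.008, 0.012, length.out = 3),
  Rho_z_invcov = seq(0.04, 0.06, length.out = 3),
  Rho_z_covmu  = seq(90, 110, length.out = 2)
)
nrow(grid)  # 3 * 3 * 2 = 18 combinations
```

Because each combination requires a separate model fit, the cost grows multiplicatively with the number of values per parameter, which is why parallel computing is useful here.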

Run LUCID with the best tuning parameters and select informative features

```
set.seed(10)
IntClusFit <- est_lucid(G = G1, Z = Z1, Y = Y1, K = 2, family = "binary", Pred = TRUE,
                        tunepar = def_tune(Select_G = TRUE, Select_Z = TRUE,
                                           Rho_G = 0.01, Rho_Z_InvCov = 0.06, Rho_Z_CovMu = 90))
# Summarize the fit once and reuse the result
FitSummary <- summary_lucid(IntClusFit)
# Identify selected features
FitSummary$No0G; FitSummary$No0Z
colnames(G1)[FitSummary$select_G]; colnames(Z1)[FitSummary$select_Z]
# Keep only the selected features; drop = FALSE preserves the matrix shape
# when a single column is selected
if (any(FitSummary$select_G)) {
  G_select <- G1[, FitSummary$select_G, drop = FALSE]
}
if (any(FitSummary$select_Z)) {
  Z_select <- Z1[, FitSummary$select_Z, drop = FALSE]
}
```

```
IntClusCoFit <- est_lucid(G = G1, CoG = CoG, Z = Z1, Y = Y1, K = 2, family = "binary", Pred = TRUE,
                          initial = def_initial(), itr_tol = def_tol(),
                          tunepar = def_tune(Select_G = TRUE, Select_Z = TRUE,
                                             Rho_G = 0.02, Rho_Z_InvCov = 0.1, Rho_Z_CovMu = 93))
summary_lucid(IntClusCoFit)
```

For more details, see the documentation for each function in the R package.

The current version is 1.0.0.

For the available versions, see the Releases page of this repository.

**Cheng Peng**

This project is licensed under the GPL-2 License.

- David V. Conti, Ph.D.
- Zhao Yang, Ph.D.
- USC IMAGE P1 Group