A manual to show the R package quickReg.
The quickReg package provides a set of functions to display and explore a dataset. More precisely, the package can produce descriptive statistics for a dataset and build univariate regression models (lm, glm and Cox regression) for specified variables. More importantly, it provides several functions to display these regressions seamlessly. Several examples are used to explain the idea.
The example data is a hypothetical dataset extracted as a subset from the package PredictABEL. It has no practical implications and is only used to demonstrate the main idea of the package.
# If you haven't installed the package, you can install it from CRAN
# install.packages("quickReg")
library(quickReg)
# Load the dataset
data(diabetes)
# Show the first 6 rows of the data
head(diabetes)
## sex age smoking education diabetes BMI systolic diastolic CFHrs1061170 LOCrs10490924 CFHrs1410996 C2rs9332739 CFBrs641153 CFHrs2230199
## 1 1 44 1 0 1 40 129 91 1 2 2 1 1 0
## 2 0 53 0 0 0 29 137 98 2 1 1 1 0 0
## 3 1 46 1 0 0 29 136 93 1 1 2 1 1 1
## 4 1 63 0 0 0 29 176 119 1 0 1 1 0 0
## 5 0 60 NA 0 1 30 148 107 1 2 1 1 0 2
## 6 0 52 0 1 1 29 133 91 1 1 1 1 1 0
We can use the function display to show descriptive statistics of the data.
show_data<-display(diabetes)
# We can show the results by index or by variable name
show_data[1:2]
## $sex
## $sex$split_line
## [1] "================================================================================"
##
## $sex$table
## 0 1
## count 572.000 428.000
## propotion 0.572 0.428
##
##
## $age
## $age$split_line
## [1] "================================================================================"
##
## $age$summary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 33.00 48.75 58.00 58.98 67.00 94.00
##
## $age$describe
## n mean sd median trimmed mad min max range skew kurtosis se
## 1000 58.98 13.27 58 58.32 13.34 33 94 61 0.38 -0.38 0.42
##
## $age$normality
## [1] "Shapiro-Wilk normality test, statistic = 0.97864, p-value = 5.976e-11"
show_data$BMI
## $split_line
## [1] "================================================================================"
##
## $summary
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 22.0 28.0 31.0 31.1 34.0 43.0 6
##
## $describe
## n mean sd median trimmed mad min max range skew kurtosis se
## 994 31.1 3.79 31 30.98 4.45 22 43 21 0.32 -0.18 0.12
##
## $normality
## [1] "Shapiro-Wilk normality test, statistic = 0.9852, p-value = 1.724e-08"
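The per-variable description above can be approximated in base R. The following is only a minimal sketch, not the implementation of display: the helper `describe_one` is hypothetical, the toy data is simulated, and the rule that treats a variable with at most two distinct values as categorical is an assumption made for illustration.

```r
# Hypothetical helper (not part of quickReg): describe a single variable.
describe_one <- function(x) {
  # Assumption: factors and variables with <= 2 distinct values are categorical
  if (is.factor(x) || length(unique(stats::na.omit(x))) <= 2) {
    counts <- table(x)
    return(rbind(count = counts, proportion = prop.table(counts)))
  }
  # Numeric variable: summary statistics plus a Shapiro-Wilk normality test
  sw <- stats::shapiro.test(x)
  list(summary = summary(x),
       normality = c(statistic = unname(sw$statistic), p.value = sw$p.value))
}

# Simulated toy data, unrelated to the diabetes dataset
set.seed(1)
toy <- data.frame(sex = rbinom(200, 1, 0.4), age = rnorm(200, 59, 13))
describe_one(toy$sex)
describe_one(toy$age)
```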
# Apply univariate regression models
reg_glm<-reg(data = diabetes, y = 5, factor = c(1, 3, 4), model = 'glm')
# reg_glm has two components: the regression models in detail and a condensed data frame
# We can show the detailed information with: reg_glm$detail, detail(reg_glm)
reg_glm$detail$BMI
## $split_line
## [1] "================================================================================"
##
## $summary
##
## Call:
## glm(formula = y ~ x_one, family = binomial(link = "logit"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6991 -0.6617 -0.6435 -0.6142 1.9037
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.81164 0.67066 -1.210 0.226
## x_one -0.02055 0.02153 -0.955 0.340
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 967.02 on 993 degrees of freedom
## Residual deviance: 966.10 on 992 degrees of freedom
## (6 observations deleted due to missingness)
## AIC: 970.1
##
## Number of Fisher Scoring iterations: 4
##
##
## $`OR(95%CI)`
## 2.5 % 97.5 %
## (Intercept) 0.4441276 0.1193405 1.658213
## x_one 0.9796557 0.9388467 1.021610
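The odds ratios in the output above are the exponentiated logistic regression coefficients; for example, exp(-0.02055) is approximately 0.9797, the value reported for x_one. The sketch below reproduces that transformation on simulated data. It uses Wald intervals from `confint.default`, which is an assumption chosen for simplicity; the package may compute its intervals differently.

```r
# Simulated data, unrelated to the diabetes dataset
set.seed(42)
n <- 500
bmi <- rnorm(n, mean = 31, sd = 4)
y <- rbinom(n, 1, plogis(-1 + 0.03 * bmi))

fit <- glm(y ~ bmi, family = binomial(link = "logit"))

# Odds ratio = exp(coefficient); Wald 95% CI = exp(estimate +/- 1.96 * SE)
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
or_table
```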
# To show the condensed data frame: reg_glm$dataframe, dataframe(reg_glm)
dataframe(reg_glm)
## term estimate std.error statistic p.value OR OR.low OR.high
## 2 sex1 -0.0995619364 0.163419266 -0.60924234 5.423638e-01 0.9052339 0.6555804 1.244991
## 4 age -0.0016515166 0.006083056 -0.27149453 7.860107e-01 0.9983498 0.9864257 1.010256
## 6 smoking1 0.2203884367 0.171356638 1.28613889 1.983946e-01 1.2465608 0.8917694 1.747266
## 8 education1 0.0072440035 0.169823173 0.04265615 9.659756e-01 1.0072703 0.7191236 1.400591
## 10 BMI -0.0205541093 0.021530295 -0.95465990 3.397497e-01 0.9796557 0.9388467 1.021610
## 12 systolic -0.0001758354 0.004399858 -0.03996388 9.681219e-01 0.9998242 0.9911130 1.008379
## 14 diastolic -0.0010196342 0.007323325 -0.13923104 8.892676e-01 0.9989809 0.9845762 1.013284
## 16 CFHrs1061170 0.1648181445 0.108731134 1.51583211 1.295618e-01 1.1791787 0.9534430 1.460814
## 18 LOCrs10490924 0.6243454613 0.112922906 5.52895320 3.221473e-08 1.8670235 1.4977946 2.332986
## 20 CFHrs1410996 0.3154310240 0.128347280 2.45763699 1.398545e-02 1.3708501 1.0705744 1.771825
## 22 C2rs9332739 1.0717936770 0.433256076 2.47381107 1.336804e-02 2.9206134 1.3543217 7.626019
## 24 CFBrs641153 0.1993582016 0.253688461 0.78583866 4.319620e-01 1.2206191 0.7567549 2.055336
## 26 CFHrs2230199 0.3402726917 0.125293121 2.71581303 6.611324e-03 1.4053308 1.0974578 1.794700
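The idea behind this table is a univariate screen: one model is fitted per predictor, and the non-intercept rows are stacked into a single data frame. A minimal base-R sketch of that idea follows. The function `univariate_glm` and the toy data are hypothetical and are not taken from the source code of quickReg.

```r
# Hypothetical sketch of a univariate logistic screen (not quickReg's code).
univariate_glm <- function(data, y, x = setdiff(names(data), y)) {
  rows <- lapply(x, function(v) {
    # Fit one model per predictor: y ~ v
    fit <- glm(reformulate(v, response = y), data = data, family = binomial)
    co <- summary(fit)$coefficients[-1, , drop = FALSE]  # drop the intercept
    data.frame(term = rownames(co), estimate = co[, 1], std.error = co[, 2],
               statistic = co[, 3], p.value = co[, 4], OR = exp(co[, 1]),
               row.names = NULL, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Simulated toy data, unrelated to the diabetes dataset
set.seed(7)
toy <- data.frame(
  outcome = rbinom(300, 1, 0.3),
  sex     = factor(rbinom(300, 1, 0.5)),
  age     = rnorm(300, 59, 13),
  bmi     = rnorm(300, 31, 4)
)
res <- univariate_glm(toy, y = "outcome")
res
```

Declaring sex as a factor is what produces the term name sex1, in the same way that the factor argument of reg marks categorical predictors.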
# Linear models and Cox regression models are also available
reg_lm<-reg(data = diabetes, x = c(1:6,8:12), y = 7, factor = c(1, 3, 4), model = 'lm')
# Use variable names
reg_coxph<-reg(data = diabetes, y = "diabetes", time = "age", factor = c("sex", "smoking", "education"), model = 'coxph')
# display can also be applied to a reg object to summarize the univariate models
display(reg_glm)
##
## Call:
## reg(data = diabetes, y = 5, factor = c(1, 3, 4), model = "glm")
##
## Number of variables: 13
## Number of terms: 13
## Number of significant terms(alpha=0.05): 4
##
## Cumulative number of terms:
##
## number of terms
## p < 0.001 1
## p < 0.01 2
## p < 0.05 4
## p < 0.1 4
## p < 1 13
##
##
## p < 0.001: LOCrs10490924
##
##
## p < 0.01: LOCrs10490924, CFHrs2230199
##
##
## p < 0.05: LOCrs10490924, CFHrs1410996, C2rs9332739, CFHrs2230199
##
##
## p < 0.1: LOCrs10490924, CFHrs1410996, C2rs9332739, CFHrs2230199
##
##
## p < 1: sex1, age, smoking1, education1, BMI, systolic, diastolic, CFHrs1061170, LOCrs10490924, CFHrs1410996, C2rs9332739, CFBrs641153, CFHrs2230199
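The cumulative counts in this summary can be checked by hand from the p.value column of dataframe(reg_glm) shown earlier. The short sketch below does so in base R; the object names `p`, `cutoffs` and `counts` are chosen here for illustration only.

```r
# p-values copied from the dataframe(reg_glm) output shown earlier
p <- c(sex1 = 5.423638e-01, age = 7.860107e-01, smoking1 = 1.983946e-01,
       education1 = 9.659756e-01, BMI = 3.397497e-01, systolic = 9.681219e-01,
       diastolic = 8.892676e-01, CFHrs1061170 = 1.295618e-01,
       LOCrs10490924 = 3.221473e-08, CFHrs1410996 = 1.398545e-02,
       C2rs9332739 = 1.336804e-02, CFBrs641153 = 4.319620e-01,
       CFHrs2230199 = 6.611324e-03)

# Count how many terms fall below each cumulative threshold
cutoffs <- c(0.001, 0.01, 0.05, 0.1, 1)
counts <- sapply(cutoffs, function(a) sum(p < a))
names(counts) <- paste("p <", cutoffs)
counts

# Terms significant at alpha = 0.05
names(p)[p < 0.05]
```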
display(reg_lm)
##
## Call:
## reg(data = diabetes, x = c(1:6, 8:12), y = 7, factor = c(1, 3, 4), model = "lm")
##
## Number of variables: 11
## Number of terms: 11
## Number of significant terms(alpha=0.05): 4
##
## Cumulative number of terms:
##
## number of terms
## p < 0.001 3
## p < 0.01 3
## p < 0.05 4
## p < 0.1 4
## p < 1 11
##
##
## p < 0.001: age, BMI, diastolic
##
##
## p < 0.01: age, BMI, diastolic
##
##
## p < 0.05: sex1, age, BMI, diastolic
##
##
## p < 0.1: sex1, age, BMI, diastolic
##
##
## p < 1: sex1, age, smoking1, education1, diabetes, BMI, diastolic, CFHrs1061170, LOCrs10490924, CFHrs1410996, C2rs9332739
display(reg_coxph)
##
## Call:
## reg(data = diabetes, y = "diabetes", factor = c("sex", "smoking", "education"), model = "coxph", time = "age")
##
## Number of variables: 12
## Number of terms: 12
## Number of significant terms(alpha=0.05): 6
##
## Cumulative number of terms:
##
## number of terms
## p < 0.001 2
## p < 0.01 4
## p < 0.05 6
## p < 0.1 6
## p < 1 12
##
##
## p < 0.001: systolic, LOCrs10490924
##
##
## p < 0.01: BMI, systolic, LOCrs10490924, CFHrs2230199
##
##
## p < 0.05: BMI, systolic, LOCrs10490924, CFHrs1410996, C2rs9332739, CFHrs2230199
##
##
## p < 0.1: BMI, systolic, LOCrs10490924, CFHrs1410996, C2rs9332739, CFHrs2230199
##
##
## p < 1: sex1, smoking1, education1, BMI, systolic, diastolic, CFHrs1061170, LOCrs10490924, CFHrs1410996, C2rs9332739, CFBrs641153, CFHrs2230199
# The `quickReg` package provides forest plots for univariate regression models
plot(reg_glm)
# One OR value is much larger than the others, so we can set the limits
plot(reg_glm,limits=c(NA,3))
plot(reg_glm,limits=c(1,2))
# Sort the variables alphabetically
plot(reg_glm,limits=c(NA,3), sort ="alphabetical")
# Similarly, we can plot lm and cox regression results
plot(reg_lm,limits=c(-2,5))
plot(reg_coxph,limits=c(0.5,2))
# Customize the result of plot.reg like any ggplot2 object, e.g. add themes from package `ggthemes`
library(ggplot2);library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.3.1
plot(reg_coxph,limits=c(0.5,2))+
labs(title = "Cox Regression Model", x = "variables")+
theme_classic() %+replace%
theme(legend.position ="none",axis.text.x=element_text(angle=45,size=rel(1.5)))
The quickReg package provides a flexible and convenient way to display data and the associations between variables. This vignette offers a glimpse of its use and features; the source code and help files give more detail. The package is under active development. Seamless subgroup analysis, more regression types and adjusted models may be available in the future. Please contact me with any comments, questions and bug reports.