This vignette explains how to perform scale linking with the PROsetta package. By way of illustration, we replicate the linking of the Center for Epidemiologic Studies Depression Scale (CES-D) to the PROMIS Depression metric as described in Choi, Schalet, Cook, and Cella (2014).
The first step is to load the input data, which comprise three tables, with loadData(). The PROMIS Depression – CES-D linking data are included in the PROsetta package directory, under the folder labeled data-raw.
fp = system.file("data-raw", package = "PROsetta")
d = loadData(
response = "dat_DeCESD_v2.csv",
itemmap = "imap_DeCESD.csv",
anchor = "anchor_DeCESD.csv",
input_dir = fp)
- response: Contains item response data from both instruments. You can supply a .csv filename or a data frame. In this example, we supply the .csv filename dat_DeCESD_v2.csv.
- itemmap: Specifies which items belong to which instruments. Can be a .csv filename or a data frame.
- anchor: Contains item parameters for anchor items (e.g., PROMIS Depression). Can be a .csv filename or a data frame.
- input_dir: (Optional) The path of the directory in which to look for the input .csv files.

The response data contains individual item responses on both instruments (i.e., 28 PROMIS Depression items followed by 20 CES-D items). The data table should include the following columns.
- prosettaid: The person ID of the respondents (N = 747). This column does not have to be named prosettaid but should not conflict with the column names in the other data tables (item map and anchor).
- The remaining columns hold the item responses; their names should match the item_id column in both the item map and anchor files.

Run the following code, for example, to open the response data in edit mode.
file.edit(system.file("data-raw", "dat_DeCESD_v2.csv", package = "PROsetta"))
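Because the response column names must agree with the item IDs used in the other tables, it can be worth checking this correspondence directly. A minimal base-R sketch using the example files shipped with the package, and assuming the item map's ID column is named item_id (as in these files):

```r
fp   <- system.file("data-raw", package = "PROsetta")
resp <- read.csv(file.path(fp, "dat_DeCESD_v2.csv"))
imap <- read.csv(file.path(fp, "imap_DeCESD.csv"))

# Item IDs listed in the item map but absent from the response columns
missing_items <- setdiff(imap$item_id, names(resp))
if (length(missing_items) > 0) {
  warning("items in the map but not in the response data: ",
          paste(missing_items, collapse = ", "))
}
```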
The item map data requires the following columns.
- item_id: Contains the unique ID of the items. The name of this column does not have to be item_id but should be consistent with the item ID column in the anchor table. The IDs in this column should match the column names in the response data.
- instrument: Numerals (1 or 2) indicating to which of the two instruments the items belong (e.g., 1 = PROMIS Depression; 2 = CES-D).
- item_order: The sequential position of the items in the combined table (e.g., 1, 2, 3, …, 28, …, 48).
- item_name: Secondary labels for the items.
- ncat: The number of response categories by item.
- min_score: The minimum score by item (0 or 1).
- reverse: Indicates whether each item has been reverse scored (0 = not reversed; 1 = reversed).
- scores: A string containing comma-separated values for all possible scores of each item (e.g., "1,2,3,4,5").

Run the following code to open the item map data in edit mode.
file.edit(system.file("data-raw", "imap_DeCESD.csv", package = "PROsetta"))
The anchor data contains the item parameters for the anchor scale (e.g., PROMIS Depression) and requires the following columns.
- item_order: The sequential position of the items in the anchor scale (e.g., 1, 2, 3, …, 28).
- item_id: The unique ID of the anchor items. The name of this column does not have to be item_id but should be consistent with the item ID column in the item map table. The IDs in this column should refer to specific column names in the response data.
- a: The slope parameter value for each anchor item.
- cb1, cb2, …: The category boundary parameter values for each anchor item.
- ncat: The number of response categories for each anchor item.

Run the following code to open the anchor data in edit mode.
file.edit(system.file("data-raw", "anchor_DeCESD.csv", package = "PROsetta"))
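To make the meaning of the a and cb columns concrete, the graded response model's category probabilities can be computed by hand. A self-contained sketch with hypothetical parameter values (not taken from the anchor file):

```r
# Graded response model: P(X >= k | theta) = 1 / (1 + exp(-a * (theta - cb_k)))
grm_prob <- function(theta, a, cb) {
  p_star <- c(1, 1 / (1 + exp(-a * (theta - cb))), 0)  # cumulative P(X >= k)
  -diff(p_star)                                        # per-category probabilities
}

# Hypothetical anchor item: slope 2.5, boundaries at 0.3, 1.0, 1.7, 2.4
p <- grm_prob(theta = 0, a = 2.5, cb = c(0.3, 1.0, 1.7, 2.4))
round(p, 3)  # probability of each of the 5 response categories at theta = 0
```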
The frequency distribution of each item in the response data is obtained by runFrequency().
freq_table = runFrequency(d)
head(freq_table)
## 1 2 3 4 5
## EDDEP04 526 112 66 29 14
## EDDEP05 488 118 91 37 12
## EDDEP06 502 119 85 30 10
## EDDEP07 420 155 107 49 16
## EDDEP09 492 132 89 25 9
## EDDEP14 445 150 101 37 14
The frequency distribution of the summed scores for the combined scale can be plotted as a histogram with plot(). The required argument is a PROsetta_data object created with loadData(). The optional scale argument specifies the scale for which the summed scores should be computed. Setting scale = 'combined' plots the summed score distribution for the combined scale.
plot(d, scale = 'combined', title = "Combined scale")
The user can also generate the summed score distribution for the first or second scale by specifying scale = 1 or scale = 2.
plot(d, scale = 1, title = "Scale 1") # not run
plot(d, scale = 2, title = "Scale 2") # not run
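What plot() summarizes can also be reproduced manually: sum each respondent's item scores over the columns belonging to a scale and draw a histogram. A base-R sketch using the shipped example files (assuming the item map's ID column is named item_id):

```r
fp   <- system.file("data-raw", package = "PROsetta")
resp <- read.csv(file.path(fp, "dat_DeCESD_v2.csv"))
imap <- read.csv(file.path(fp, "imap_DeCESD.csv"))

# Summed scores for scale 1 (PROMIS Depression), per the instrument column
scale1_items <- imap$item_id[imap$instrument == 1]
scale1_sum   <- rowSums(resp[, scale1_items])
hist(scale1_sum, main = "Scale 1", xlab = "Summed score")
```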
Basic descriptive statistics are obtained for each item by runDescriptive().
desc_table = runDescriptive(d)
head(desc_table)
## n mean sd median trimmed mad min max range skew kurtosis se
## EDDEP04 747 1.52 0.94 1 1.30 0 1 5 4 1.91 3.01 0.03
## EDDEP05 746 1.62 0.99 1 1.42 0 1 5 4 1.54 1.50 0.04
## EDDEP06 746 1.56 0.94 1 1.37 0 1 5 4 1.66 2.01 0.03
## EDDEP07 747 1.78 1.05 1 1.59 0 1 5 4 1.23 0.59 0.04
## EDDEP09 747 1.56 0.91 1 1.38 0 1 5 4 1.62 1.98 0.03
## EDDEP14 747 1.69 1.00 1 1.51 0 1 5 4 1.38 1.12 0.04
Classical reliability statistics can be obtained by runClassical(). By default, the analysis is performed for the combined scale.
classical_table = runClassical(d)
summary(classical_table$alpha$combined)
##
## Reliability analysis
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.98 0.98 0.99 0.53 54 9e-04 1.7 0.69 0.54
The user can set scalewise = TRUE to request an analysis for each scale separately in addition to the combined scale.
classical_table = runClassical(d, scalewise = TRUE)
classical_table$alpha$combined # alpha values for combined scale
classical_table$alpha$`1` # alpha values for each scale, created when scalewise = TRUE
classical_table$alpha$`2` # alpha values for each scale, created when scalewise = TRUE
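The raw alpha reported above comes from psych::alpha(); the statistic itself is simple to compute from the standard formula \(\alpha = \frac{k}{k-1}\bigl(1 - \sum_i s_i^2 / s_T^2\bigr)\). A self-contained base-R sketch on toy data (not the CES-D responses):

```r
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of summed score)
cronbach_alpha <- function(x) {
  x <- as.matrix(x)
  k <- ncol(x)
  (k / (k - 1)) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
}

# Toy data: 4 items driven by a common trait plus unit-variance noise
set.seed(1)
trait <- rnorm(500)
toy   <- sapply(1:4, function(i) trait + rnorm(500))
cronbach_alpha(toy)  # close to the population value of 0.8 for this setup
```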
Specifying omega = TRUE additionally returns McDonald's \(\omega\) coefficients.
classical_table = runClassical(d, scalewise = TRUE, omega = TRUE)
classical_table$omega$combined # omega values for combined scale
classical_table$omega$`1` # omega values for each scale, created when scalewise = TRUE
classical_table$omega$`2` # omega values for each scale, created when scalewise = TRUE
Additional arguments can be supplied to runClassical() to be passed on to psych::omega().
classical_table = runClassical(d, scalewise = TRUE, omega = TRUE, nfactors = 5) # not run
Dimensionality analysis is performed with CFA by runCFA(). Setting scalewise = TRUE performs the dimensionality analysis for each scale separately in addition to the combined scale.
out_cfa = runCFA(d, scalewise = TRUE)
runCFA() calls lavaan::cfa() internally and can pass additional arguments on to it.
out_cfa = runCFA(d, scalewise = TRUE, std.lv = TRUE) # not run
The CFA result for the combined scale is stored in the combined slot; if scalewise = TRUE, the result for each scale is also stored in the corresponding numbered slot.
out_cfa$combined
## lavaan 0.6-7 ended normally after 23 iterations
##
## Estimator DWLS
## Optimization method NLMINB
## Number of free parameters 220
##
## Used Total
## Number of observations 731 747
##
## Model Test User Model:
## Standard Robust
## Test Statistic 4227.611 4700.780
## Degrees of freedom 1080 1080
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.046
## Shift parameter 657.455
## simple second-order correction
out_cfa$`1`
## lavaan 0.6-7 ended normally after 16 iterations
##
## Estimator DWLS
## Optimization method NLMINB
## Number of free parameters 140
##
## Used Total
## Number of observations 738 747
##
## Model Test User Model:
## Standard Robust
## Test Statistic 863.527 1434.277
## Degrees of freedom 350 350
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 0.678
## Shift parameter 160.257
## simple second-order correction
out_cfa$`2`
## lavaan 0.6-7 ended normally after 20 iterations
##
## Estimator DWLS
## Optimization method NLMINB
## Number of free parameters 80
##
## Used Total
## Number of observations 740 747
##
## Model Test User Model:
## Standard Robust
## Test Statistic 1106.148 1431.797
## Degrees of freedom 170 170
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 0.812
## Shift parameter 69.205
## simple second-order correction
CFA fit indices can be obtained by using summary()
from the lavaan package. For the combined scale:
lavaan::summary(out_cfa$combined, fit.measures = TRUE, standardized = TRUE, estimates = FALSE)
## lavaan 0.6-7 ended normally after 23 iterations
##
## Estimator DWLS
## Optimization method NLMINB
## Number of free parameters 220
##
## Used Total
## Number of observations 731 747
##
## Model Test User Model:
## Standard Robust
## Test Statistic 4227.611 4700.780
## Degrees of freedom 1080 1080
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.046
## Shift parameter 657.455
## simple second-order correction
##
## Model Test Baseline Model:
##
## Test statistic 793198.133 91564.632
## Degrees of freedom 1128 1128
## P-value 0.000 0.000
## Scaling correction factor 8.758
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.996 0.960
## Tucker-Lewis Index (TLI) 0.996 0.958
##
## Robust Comparative Fit Index (CFI) NA
## Robust Tucker-Lewis Index (TLI) NA
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.063 0.068
## 90 Percent confidence interval - lower 0.061 0.066
## 90 Percent confidence interval - upper 0.065 0.070
## P-value RMSEA <= 0.05 0.000 0.000
##
## Robust RMSEA NA
## 90 Percent confidence interval - lower NA
## 90 Percent confidence interval - upper NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.051 0.051
and also for each scale separately:
lavaan::summary(out_cfa$`1` , fit.measures = TRUE, standardized = TRUE, estimates = FALSE) # not run
lavaan::summary(out_cfa$`2` , fit.measures = TRUE, standardized = TRUE, estimates = FALSE) # not run
runCalibration() performs IRT calibration without anchoring. It calls mirt::mirt() internally, and additional arguments can be passed on to mirt, e.g., to increase the number of EM cycles to 1000, as follows:
out_calib = runCalibration(d, technical = list(NCYCLES = 1000))
In case of nonconvergence, runCalibration()
explicitly raises an error and does not return its results:
out_calib = runCalibration(d, technical = list(NCYCLES = 10))
## Error in runCalibration(d, technical = list(NCYCLES = 10)): calibration did not converge: increase iteration limit by adjusting the 'technical' argument, e.g., technical = list(NCYCLES = 510)
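Because a nonconverged run raises an error instead of returning an object, the call can be wrapped in tryCatch() to fall back to a higher iteration limit. A sketch (the fallback value of 1000 cycles is an arbitrary choice):

```r
out_calib <- tryCatch(
  runCalibration(d, technical = list(NCYCLES = 10)),
  error = function(e) {
    message("Calibration did not converge; retrying with NCYCLES = 1000")
    runCalibration(d, technical = list(NCYCLES = 1000))
  }
)
```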
Also, specify fixedpar = TRUE to perform fixed-parameter calibration using the anchor data.
out_calib = runCalibration(d, fixedpar = TRUE)
The output object from runCalibration()
can be used to generate additional output with functions from the mirt package.
Use coef()
to extract item parameters:
mirt::coef(out_calib, IRTpars = TRUE, simplify = TRUE)
## $items
## a b1 b2 b3 b4
## EDDEP04 4.261 0.401 0.976 1.696 2.444
## EDDEP05 3.932 0.305 0.913 1.593 2.412
## EDDEP06 4.145 0.350 0.915 1.678 2.471
## EDDEP07 2.802 0.148 0.772 1.603 2.538
## EDDEP09 3.657 0.312 0.982 1.782 2.571
## EDDEP14 2.333 0.186 0.947 1.729 2.633
## EDDEP17 3.274 -0.498 0.406 1.413 2.375
## EDDEP19 3.241 0.460 1.034 1.834 2.515
## EDDEP21 2.736 0.072 0.810 1.803 2.673
## EDDEP22 3.970 0.204 0.795 1.649 2.295
## EDDEP23 2.564 -0.038 0.693 1.653 2.584
## EDDEP26 3.093 -0.358 0.412 1.404 2.224
## EDDEP27 2.920 0.204 0.891 1.655 2.528
## EDDEP28 2.588 -0.079 0.633 1.477 2.328
## EDDEP29 4.343 -0.117 0.598 1.428 2.272
## EDDEP30 2.613 -0.023 0.868 1.864 2.826
## EDDEP31 3.183 -0.261 0.397 1.305 2.134
## EDDEP35 3.106 0.044 0.722 1.639 2.471
## EDDEP36 3.483 -0.536 0.348 1.347 2.355
## EDDEP39 3.131 0.918 1.481 2.164 2.856
## EDDEP41 4.454 0.558 1.074 1.779 2.530
## EDDEP42 2.364 0.210 0.987 1.906 2.934
## EDDEP44 2.549 0.194 1.012 2.013 3.126
## EDDEP45 2.834 0.141 0.907 1.846 2.875
## EDDEP46 2.381 -0.458 0.478 1.546 2.632
## EDDEP48 3.185 0.198 0.782 1.526 2.324
## EDDEP50 2.018 -0.050 0.926 2.000 2.966
## EDDEP54 2.685 -0.299 0.423 1.358 2.308
## CESD1 2.074 0.876 1.921 3.064 NA
## CESD2 1.262 1.387 2.670 3.721 NA
## CESD3 3.512 0.833 1.316 1.949 NA
## CESD4 1.118 0.649 1.379 2.081 NA
## CESD5 1.605 0.429 1.526 2.724 NA
## CESD6 3.635 0.493 1.176 1.729 NA
## CESD7 1.828 0.287 1.368 2.134 NA
## CESD8 1.342 -0.067 0.823 1.620 NA
## CESD9 3.003 0.748 1.374 1.855 NA
## CESD10 2.060 1.172 2.043 3.268 NA
## CESD11 1.077 -0.463 0.947 2.160 NA
## CESD12 2.229 0.169 0.945 1.737 NA
## CESD13 1.288 0.342 1.696 2.915 NA
## CESD14 2.176 0.491 1.291 1.864 NA
## CESD15 1.397 0.965 2.321 3.604 NA
## CESD16 2.133 0.272 0.922 1.808 NA
## CESD17 1.719 1.607 2.317 3.470 NA
## CESD18 2.812 0.261 1.248 1.984 NA
## CESD19 1.834 0.784 1.875 2.639 NA
## CESD20 1.491 -0.140 1.256 2.297 NA
##
## $means
## F1
## -0.06
##
## $cov
## F1
## F1 0.95
and also other commonly used functions:
mirt::itemfit(out_calib, empirical.plot = 1)
mirt::itemplot(out_calib, item = 1, type = "info")
mirt::itemfit(out_calib, "S_X2", na.rm = TRUE)
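As an aside on the parameterization: mirt stores graded-model parameters internally in slope-intercept form (a, d1, d2, …), and IRTpars = TRUE converts the intercepts to the boundary locations shown above via \(b_k = -d_k / a\). A base-R illustration with made-up values:

```r
# Hypothetical slope-intercept parameters for one 5-category item (not from the output)
a <- 2.0
d <- c(1.0, -0.5, -2.0, -3.5)  # intercepts d1..d4

b <- -d / a                    # boundary locations on the theta metric
b                              # -0.50  0.25  1.00  1.75
```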
Scale information functions can be plotted with plotInfo(). The two required arguments are an output object from runCalibration() and a PROsetta object from loadData(). The additional arguments specify the labels, colors, and line types for each scale and for the combined scale. The last value in each of the arguments scale_label, color, and lty represents the combined scale.
plotInfo(
out_calib, d,
scale_label = c("PROMIS Depression", "CES-D", "Combined"),
color = c("blue", "red", "black"),
lty = c(1, 2, 3))