# Dexter: The Fundamentals

#### 2017-08-23

Dexter is an R (R Development Core Team 2005) package intended as a robust and fairly comprehensive system for managing and analyzing test data organized in booklets. It includes facilities for importing and managing test data, assessing and improving the quality of data through basic test-and-item analysis, fitting an IRT model, and computing various estimates of ability.

Dexter differs from other psychometric software both in terms of the tools in includes as in those it omits. The data management component is quite developed to promote data integrity and prevent SQL injections and other mishaps. Many psychometric methods not found elsewhere are provided, such as Haberman’s (2007) interaction model generalized for polytomous items, new methods for exploratory and confirmatory DIF analysis, support for the 3DC method of standard setting, and more. On the other hand, there is no support for multivariate IRT models, 2PL and 3PL models and other methods sufficiently represented in other packages. The central IRT model is a polytomous generalization of the extended marginal Rasch model: we call it the Extended NOminal Response Model (ENORM).

Dexter scales well to a wide range of situations, from tiny examples to production-size problems. Parallel to problem size, the level of complexity and flexibility offered to the user is also scalable. For example, users fluent in DBMS may wish to tweak the underlying database using either R or SQL, but it is perfectly possible to use Dexter without even realizing that a database is at work in the background. Visual utilities based on shiny are provided for some of the most common analyses, but all methods and graphs can be accessed through programming language for customized analyses and reports.

Simplicity of use is achieved by imposing some simple and clear-cut constrains:

• The package does not do much management of persons (test-takers), leaving this task mainly to the user. Data without any person identification will be happily processed, but an unlimited number of person covariates will be accepted, stored and possibly attached to estimates of ability. Users who wish to do their own person management may include their own person IDs, and person covariates may be input and used in DIF analysis. The package analyses one test, possibly given in many different test forms (booklets). It is assumed that each person takes the test only once.
• In principle, all items are assumed to be closed, i.e., to have a limited number of possible responses, each of which is assigned a test score. This does not preclude the use of open format items, but these must be graded externally.

To enter and analyse data, the user must start a new project with function start_new_project. This takes as a vital argument a data frame of scoring rules. The data frame must contain, in any order, three columns named exactly item, response, and score. It must provide an exhaustive list of all items in the test (across all booklets), all admissible responses to each items, and the score that will be given to each response.

Open items are accommodated by entering given scores in place of the original responses, and providing a trivial scoring rule (0 is scored as 0, 1 as 1, etc.) Missing value codes may be listed explicitly, but any actual responses not matched in the scoring rules will be automatically assigned a score of 0.

Another argument that may be passed to start_new_project is a name for the SQLite database file (there is a default value). The function creates an empty database under that name, and users are then able to input response data. Finally, the names of any person covariates to be entered along with the cognitive data must be declared here.

While various approaches to inputting data may be possible, we believe that the one provided with Dexter is easy and safe. As most researchers seem to keep their data in the rectangular data formats provided by software like Excel, SPSS, or even R, we propose the repeated use of function add_booklet. The user will first use functions like read.table, read.csv or packages like readxl, haven, readr, foreign, to read the data for each booklet separately. Function add_booklet will then add the data to the database and return a detailed report on its actions. It is very important to read this report carefully.

Basically, the data from all columns whose names have an exact match in the scoring rules table will be treated as responses to items. Hence, great care must be taken with column names but, apart from that, the process is fairly automatic.

In addition to the person properties provided with the booklets, users can also provide an arbitrary number of item properties. These are supplied as a data frame, one column per item property, and including an item ID column called exactly item. The data frame is then passed to function add_item_properties. The function returns a detailed report about any items not matched in the database, or items present in the database but not given item properties.

Functions like show_booklets, show_items, show_item_properties, and show_person_properties help users keep track of what has been entered in the database.

All analyses and graphs are available through R syntax, which facilitates automation and easy programming of customized reports. In addition, there are two interactive functions, iTIA and iModels

Tools for assessing the quality of items and tests include: * a very traditional table of test-and-item analysis (mean scores, p-values, correlations between the item and the sum/rest score on the test etc.), produced with function tia_tables, and supplemented with distractor plots * a much less traditional comparison between the Rasch / Partial credit model and the Interaction model proposed by Haberman (2007). This is produced with function fit_inter and equipped with print and plot methods.

The interaction model was initially developed for dichotomous items. We have generalised it for items having more than two (potentially, a largish number) of response categories. A useful upshot is that we can represent scores on subtests as super-items and analyse these as normal items (function fit_domains).

Function profile_plot produces a novel plot that provides information related to both DIF analysis and the profile analysis proposed by Verhelst (2012).

When the test contains multiple test forms (booklets), the design is determined automatically as the data is being entered and compared against the scoring rules. Function design_is_connected is useful in determining whether the design is connected.

The main function to calibrate an IRT model is fit_enorm. Examinee proficiency can be estimated by maximum likelihood (function ability) or via plausible values.

Most of the functions accept an arbitrarily complex logical predicate involving booklets, items, responses, persons, person and item covariates etc., allowing to subset the data. An obvious use is to exclude, in a flexible way, some data from analysis. As subsetting can be applied to the functions independently, we can, e.g., estimate a model on one subset of examinees and apply it to another.

As an example of first steps with Dexter, consider the verbal aggression data (Vansteelandt 2000) analysed in great detail in (Paul De Boeck 2004) and many others. 243 females and 73 males have assessed on a 3-point scale (‘yes’, ‘perhaps’, or ‘no’) how likely they are to become verbally aggressive in four different frustrating situations. Responses are further structured by type (curse, scold, or shout) and mode (want to do or actually do), resulting in 24 items.

A new project always starts with a complete enumeration of all items, all admissible responses, and the scores assigned to each distinct response. The data does not come from a cognitive test, so the scoring rules are trivial. We have supplied them in data set verbAggrRules:

library(dexter)
head(verbAggrRules)
##          item response score
## 1 S1WantCurse        0     0
## 2 S1WantCurse        1     1
## 3 S1WantCurse        2     2
## 4   S1DoCurse        0     0
## 5   S1DoCurse        1     1
## 6   S1DoCurse        2     2
db = start_new_project(verbAggrRules, "verbAggression.db", covariates=list(gender="<unknown>"))

There is a single booklet provided in data set verbAggrData; we add it to the project:

head(verbAggrData)
##   Gender S1DoCurse S1DoScold S1DoShout S1WantCurse S1WantScold S1WantShout
## 1   Male         1         0         1           0           0           0
## 2 Female         1         2         1           2           2           2
## 3 Female         0         0         0           1           0           0
## 4 Female         2         0         0           2           2           0
## 5 Female         0         0         0           0           1           0
## 6 Female         2         2         0           0           2           0
##   S2DoCurse S2DoScold S2DoShout S2WantCurse S2WantScold S2WantShout
## 1         1         0         0           0           0           0
## 2         2         2         1           2           2           1
## 3         0         0         0           1           0           0
## 4         1         1         0           1           1           0
## 5         0         0         0           0           1           0
## 6         2         2         0           0           2           0
##   S3DoCurse S3DoScold S3DoShout S3WantCurse S3WantScold S3WantShout
## 1         1         0         0           0           0           1
## 2         0         0         0           1           0           1
## 3         0         0         0           0           0           0
## 4         1         0         0           2           0           0
## 5         0         0         0           0           1           0
## 6         1         1         0           2           1           0
##   S4DoCurse S4DoScold S4DoShout S4WantCurse S4WantScold S4WantShout
## 1         2         2         2           2           0           0
## 2         1         1         1           1           1           1
## 3         1         0         0           1           0           0
## 4         1         1         0           2           2           0
## 5         2         0         0           2           0           0
## 6         2         2         0           2           2           0
add_booklet(db, verbAggrData, "agg")
## $items ## [1] "S1DoCurse" "S1DoScold" "S1DoShout" "S1WantCurse" "S1WantScold" ## [6] "S1WantShout" "S2DoCurse" "S2DoScold" "S2DoShout" "S2WantCurse" ## [11] "S2WantScold" "S2WantShout" "S3DoCurse" "S3DoScold" "S3DoShout" ## [16] "S3WantCurse" "S3WantScold" "S3WantShout" "S4DoCurse" "S4DoScold" ## [21] "S4DoShout" "S4WantCurse" "S4WantScold" "S4WantShout" ## ##$covariates
## [1] "gender"
##
## $columns_ignored ## character(0) ## ##$auto_add_unknown_rules
## [1] TRUE
##
## $zero_rules_added ## [1] item_id response ## <0 rows> (or 0-length row.names) The report looks clean. We can take a look at the booklets and items: show_booklets(db) ## booklet_id n_items n_persons ## 1 agg 24 316 show_items(db) ## item_id ## 1 S1DoCurse ## 2 S1DoScold ## 3 S1DoShout ## 4 S1WantCurse ## 5 S1WantScold ## 6 S1WantShout ## 7 S2DoCurse ## 8 S2DoScold ## 9 S2DoShout ## 10 S2WantCurse ## 11 S2WantScold ## 12 S2WantShout ## 13 S3DoCurse ## 14 S3DoScold ## 15 S3DoShout ## 16 S3WantCurse ## 17 S3WantScold ## 18 S3WantShout ## 19 S4DoCurse ## 20 S4DoScold ## 21 S4DoShout ## 22 S4WantCurse ## 23 S4WantScold ## 24 S4WantShout We can also add the item properties, i.e., the three factors defining the experimental design. Again, we have supplied them in a separate data set: data("verbAggrProperties") head(verbAggrProperties) ## item behavior mode blame situation ## 1 S1DoCurse Curse Do Other Bus ## 2 S1DoScold Scold Do Other Bus ## 3 S1DoShout Shout Do Other Bus ## 4 S1WantCurse Curse Want Other Bus ## 5 S1WantScold Scold Want Other Bus ## 6 S1WantShout Shout Want Other Bus add_item_properties(db, verbAggrProperties) show_item_properties(db) ## # A tibble: 96 x 3 ## # Groups: value [11] ## item_property value N ## <chr> <chr> <int> ## 1 behavior Curse 8 ## 2 behavior Scold 8 ## 3 behavior Shout 8 ## 4 behavior Curse 8 ## 5 behavior Scold 8 ## 6 behavior Shout 8 ## 7 behavior Curse 8 ## 8 behavior Scold 8 ## 9 behavior Shout 8 ## 10 behavior Curse 8 ## # ... with 86 more rows show_person_properties(db) ## # A tibble: 316 x 3 ## # Groups: value [2] ## person_property value N ## <chr> <chr> <int> ## 1 gender Male 73 ## 2 gender Female 243 ## 3 gender Male 73 ## 4 gender Female 243 ## 5 gender Male 73 ## 6 gender Male 73 ## 7 gender Female 243 ## 8 gender Female 243 ## 9 gender Female 243 ## 10 gender Female 243 ## # ... with 306 more rows Let us do the simplest analysis based on classical test theory: tt = tia_tables(db) knitr::kable(tt$itemStats, digits=3)
booklet_id item_id meanScore sdScore maxScore pvalue rit rir n
agg S1DoCurse 1.082 0.808 2 0.541 0.582 0.519 316
agg S1DoScold 0.832 0.817 2 0.416 0.651 0.596 316
agg S1DoShout 0.468 0.710 2 0.234 0.520 0.460 316
agg S1WantCurse 1.123 0.828 2 0.562 0.537 0.468 316
agg S1WantScold 0.930 0.852 2 0.465 0.593 0.528 316
agg S1WantShout 0.712 0.778 2 0.356 0.529 0.464 316
agg S2DoCurse 1.003 0.834 2 0.502 0.590 0.527 316
agg S2DoScold 0.684 0.781 2 0.342 0.633 0.578 316
agg S2DoShout 0.326 0.616 2 0.163 0.532 0.481 316
agg S2WantCurse 1.222 0.774 2 0.611 0.529 0.465 316
agg S2WantScold 0.959 0.840 2 0.479 0.562 0.494 316
agg S2WantShout 0.734 0.816 2 0.367 0.563 0.498 316
agg S3DoCurse 0.576 0.693 2 0.288 0.497 0.438 316
agg S3DoScold 0.294 0.557 2 0.147 0.505 0.458 316
agg S3DoShout 0.104 0.345 2 0.052 0.344 0.310 316
agg S3WantCurse 0.810 0.766 2 0.405 0.474 0.406 316
agg S3WantScold 0.462 0.654 2 0.231 0.521 0.467 316
agg S3WantShout 0.282 0.534 2 0.141 0.438 0.389 316
agg S4DoCurse 0.883 0.786 2 0.441 0.528 0.463 316
agg S4DoScold 0.566 0.725 2 0.283 0.567 0.510 316
agg S4DoShout 0.225 0.513 2 0.112 0.424 0.377 316
agg S4WantCurse 0.978 0.774 2 0.489 0.495 0.427 316
agg S4WantScold 0.589 0.744 2 0.294 0.573 0.515 316
agg S4WantShout 0.424 0.684 2 0.212 0.453 0.391 316
knitr::kable(tt$testStats, digits=3) booklet_id nItems alpha meanP meanRit meanRir maxTestScore N agg 24 0.888 0.339 0.527 0.468 48 316 The Rasch model and the Interaction models can be estimated with the fit_inter’ function. This always happens on a booklet-per-booklet basis, so the number of the booklet must always be given. If more than one booklet are specified, the function will try to find their intersection, i.e., the maximum rectangular data set containing items that are common to all booklets specified. If the intersection is empty, there will be just a warning message. Unfortunately, we cannot demonstrate this with the verbal aggression data because it only contains one booklet. m = fit_inter(db, booklet_id=='agg') print(m) ##$item_id
##  [1] "S1DoCurse"   "S1DoScold"   "S1DoShout"   "S1WantCurse" "S1WantScold"
##  [6] "S1WantShout" "S2DoCurse"   "S2DoScold"   "S2DoShout"   "S2WantCurse"
## [11] "S2WantScold" "S2WantShout" "S3DoCurse"   "S3DoScold"   "S3DoShout"
## [16] "S3WantCurse" "S3WantScold" "S3WantShout" "S4DoCurse"   "S4DoScold"
## [21] "S4DoShout"   "S4WantCurse" "S4WantScold" "S4WantShout"
##
## attr(,"class")
## [1] "tbl_df"     "tbl"        "data.frame"

The print function returns only the item IDs because the parameters of the interaction model are not very easy to interpret. The models are most useful when plotted:

plot(m, "S1DoScold", show.observed=TRUE)

By default, the plot shows the regressions of the item score on the total score. The interaction model is shown with a thicker but lighter line, and the Rasch model is shown with a thinner, darker line. The unsmoothed data is not shown by default, so we had to change the show.observed parameter. The ‘curtains’ are drawn at the 5% lowest and the 5% highest sum scores.

The Rasch, or, in this case, Partial Credit Model, tends to fit these items very well, so the two curves practically coincide. This need not be always the case. The interaction model is quite interesting for practice. It shares the conditioning properties of the Rasch model, which means that we can predict the expected item score at each observed total score. Hence, we have the possibility to represent the model and the observed data on the same plot, without tricks or circular reasoning.

Furthermore, the interaction model reproduces the mean item scores and the correlations between the item scores and the total score: the vital classical test statistics in the table above. In other words, the interaction model represents, as far as we are concerned, the data, and the two item-total regressions capture well the systematic departures of the Rasch/PCM model from the data. What remains is predominantly random noise; if we had an item where the Rasch model does not fit (not available in this data set), we would observe that the pink dots tend to fluctuate randomly around the interaction model.

We can show the response category probabilities instead of the expected item score:

plot(m, 'S1DoCurse', summate=FALSE)

Now fit and display models for the sum scores on each of the four frustrating situations:

mSit = fit_domains(db, item_property= "situation")
plot(mSit)

Finally, a profile plot comparing responses by males and females on the two modes, want to curse (scold, shout), against actually curse (scold, shout):

profile_plot(db, item_property='mode', covariate='gender')

We would not like to see such a differential effect in a cognitive test, but with this data set the result is quite interesting.

Function fit_enorm estimates the parameters of the Extended NOminal Response Model, the principal IRT model. In fact, we have seen this model above, where it was compared to the interaction model. However, similar to the methods associated with CTT, the interaction model requires complete data and can be only applied on a booklet-per-booklet basis, while the fit_enorm function also works with incomplete designs (function design_is_connected is useful in determining whether the design is connected):

parms = fit_enorm(db)
## ==

The default estimation method is conditional maximum likelihood (CML), which is usually quite fast. As an alternative, we have provided a reasonably fast Gibbs sampler:

parms_gibbs = fit_enorm(db, method='Bayes')
## ===========================================================================

The function returns a parameters object, which can be passed to the functions that estimate person parameters. Currently, there are two of them: ability, which produces maximum likelihood (MLE) and Bayes expected a posteriori (EAP) estimators, and plausible_values. Being able to choose between a frequentist or a Bayesian approach to the estimation of either the item parameters or the person parameters enables the user to consider various sources of uncertainty.

Let us draw plausible values for our subject in what is now a scale of verbal aggression, and examine their distribution.

pv = plausible_values(db, parms)
plot(density(pv$PV1)) For a direct comparison on gender: pv = merge(pv, dbReadTable(db, 'dxPersons')) boxplot(PV1~gender, data=pv) It appears that we cannot find a marked difference in verbal agression between male and female respondents overall. Compare this result to the very interesting structural differences revealed by the profile plot above. To show dexter at work with a large set of cognitive test data, we would like to analyse a PISA data set. Such a problem is probably too large for a vignette, so we will only present a possible script without actually running it.  library(dplyr) library(readr) library(SAScii) # Fetching the data from the OECD site requires a certain ... dexterity # for which we are indebted to a kind soul on stackexchange. # Download the data dictionary and read it in with SAScii temp = tempfile() download.file("https://www.oecd.org/pisa/pisaproducts/PISA2012_SAS_scored_cognitive_item.sas", temp) dict_scored = parse.SAScii(sas_ri = temp) unlink(temp) # Download the scored cognitive data temp = tempfile() download.file("https://www.oecd.org/pisa/pisaproducts/INT_COG12_S_DEC03.zip",temp, mode="wb") unzip(temp, "INT_COG12_S_DEC03.txt") scored = read_fwf(file = 'INT_COG12_S_DEC03.txt', col_positions = fwf_widths(dict_scored$width), progress = TRUE)
colnames(scored) = dict_scored$varname unlink(temp) # Keep only the maths items standard = 1:13 pisa12_M = subset(scored, (BOOKID %in% standard), select=c("CNT", "BOOKID", grep("PM",names(scored), value=TRUE))) rm(scored) # Items missing by design are coded with 7, and actual nonresponse with 8 # Will replace both with NA for simplicity. tempBOOKID = pisa12_M$BOOKID
pisa12_M[pisa12_M==7] = NA
pisa12_M[pisa12_M==8] = NA
pisa12_M$BOOKID = tempBOOKID rm(tempBOOKID) allNA = apply(pisa12_M, 2, function(x)all(is.na(x))) pisa12_M = pisa12_M[, !allNA] mathItems = grep("PM",names(pisa12_M), value=TRUE) # prepare the scoring rules responses = lapply(mathItems, function(x)unique(pisa12_M[,x])) foo = lapply(responses, function(x){ y=data.frame(item=names(x),response=x[,1],score=0L) names(y)[2]="response" y }) rules = Reduce(rbind, foo) rules$score[rules$response==1] = 1 rules$score[rules$response==2] = 2 rules=rules[order(rules$item, rules$response),] # create the new project db = start_new_project(rules, "pisa_math.db", covariates=list(cnt='<unknown country>')) # add all booklets one by one, deleting columns that may be all NA pisa12_M %>% group_by(BOOKID) %>% do({ allMissing = apply(., 2, function(x)all(is.na(x))) dat = .[,!allMissing] print(.$BOOKID[1])
add_booklet(db, dat, .$BOOKID[1]) }) rm(pisa12_M) # add some item properties -- we have supplied a data set to # make things a bit easier items = data.frame(item=show_items(db)$item)
domain = subset(
merge(items, PISA_item_class, by.x="item", by.y="ItemCode"),
select = c("item", "Content")
)
domain$isSaS[domain$Content=="Space and shape"] = "SaS"
domain$isSaS[domain$Content!="Space and shape"] = "notSaS"
show_item_properties(db)
show_person_properties(db)

# Fit the two models for booklet 1, all countries (takes about
# 10 seconds on my current machine)
system.time({m <- fit_inter2(db,booklet_id=='1')})
plot(m)

# Analyse by domain
md = fit_domains(db, 1, 'Content')
plot(md, nr=2, nc=2)

# Compare three countries on 'Space and shape' vs NOT 'Shape and space'
dev.off()
profile_plot(db, 1, "isSaS", "CNT", c("DEU","FRA","ITA"))
dbDisconnect(db)`

# References

Paul De Boeck, Mark Wilson, ed. 2004. Explanatory Item Response Models. Springer.

R Development Core Team. 2005. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org.

Vansteelandt, K. 2000. “Formal Methods for Contextualized Personality Psychology.” PhD thesis, K. U. Leuven.