# Dexter: The Fundamentals

#### 2017-02-07

Dexter is the first of three packages that jointly realize our current approach to the analysis of test data. Dexter provides basic data handling and test-and-item analysis, while the other two packages estimate item parameters and various measures of ability.

While estimation techniques are available in various psychometric libraries, we are not aware of a package that offers a systematic DBMS solution for test data. We have opted for a minimalist approach based on RSQLite that should scale for a wide range of situations, from tiny examples to production-size problems.

Similar to problem size, the level of complexity presented to the user is also scalable. Those fluent in DBMS may wish to tweak the underlying database using either R or SQL, but it is perfectly possible to use Dexter without even realizing that a database is at work in the background.

Simplicity of use is achieved by imposing some simple and clear-cut constrains:

• The package does not do any management of persons (test-takers), leaving this entirely to the user. Data without any person identification will be happily processed, but an unlimited number of person covariates will be accepted, stored and possibly attached to estimates of ability. Users who wish to do their own person management may include their own person IDs, and person covariates may be input and used in DIF analysis. The package analyses one test, possibly given in many different test forms (booklets). It is assumed that each person takes the test only once.
• In principle, all items are assumed to be closed, i.e., have a limited number of possible responses, each of which is assigned a test score. This does not preclude the use of open format items, but these must be graded externally.

To enter and analyse data, the user must start a new project with function start_new_project(). This takes as a vital argument a data frame of scoring rules. The data frame must contain, in any order, three columns named exactly item, response, and score. It must provide an exhaustive list of all items in the test (across all booklets), all admissible responses to each items, and the score that will be given to each response.

Open items are accommodated by entering given scores in place of the original responses, and providing a trivial scoring rule (0 is scored as 0, 1 as 1, etc.) Missing value codes may be listed explicitly, but any actual responses not matched in the scoring rules will be automatically assigned a score of 0.

Another argument that may be passed to start_new_project() is a name for the SQLite database file (there is a default value). The function creates an empty database under that name, and users are then able to input response data.

While various approaches to inputting data may be possible, we believe that the one provided with Dexter is easy and safe. As most researchers seem to keep their data in the rectangular data formats provided by software like Excel, SPSS, or even R, we propose the repeated use of function add_booklet(). The user will first use functions like read.table(), read.csv() or packages like readxl, haven, readr, foreign, to read the data for each booklet separately. Function add_booklet() will then add the data to the database and return a detailed report on its actions. It is very important to read this report carefully.

Basically, the data from all columns whose names have an exact match in the scoring rules table will be treated as responses to items. All other columns will be treated as person covariates and simply added to the database for further reference. Hence, great care must be taken with column names but, apart from that, the process is fairly automatic.

Non-cognitive variable supplied with the data are treated by name, so one should be careful with their names as well. If a variable is called sex in one booklet, gender in another, and Gender in a third, it will be treated as three different variables. However, a variable with the same name but positioned in different columns will be treated as the same.

In addition to the person properties provided with the booklets, users can also provide an arbitrary number of item properties. These are supplied as a data frame, one column per item property, and including an item ID column called exactly item. The data frame is then passed to function add_item_properties(). The function returns a detailed report about any items not matched in the database, or items present in the database but not given item properties.

Functions like show_booklets(), show_items(), show_item_properties(), and show_person_properties() help users keep track of what has been entered in the database.

All analyses and graphs are available through R syntax, which facilitates automation and easy programming of customized reports. In addition, there are two interactive functions, iTIA() and iModels(), which are currently found in a separate package, shinyDexter. These may be sufficient for the less demanding users.

There are two main analytic tools: * a very traditional table of test-and-item analysis (mean scores, p-values, correlations between the item and the sum/rest score on the test etc.), produced with function tia_tables(), and supplemented with distractor plots * a much less traditional comparison between the Rasch / Partial credit model and the Interaction model proposed by Haberman (2007). This is produced with function fit_models() and equipped with print and plot methods.

The interaction model was initially developed for dichotomous items. We have generalised it for items having more than two (potentially, a largish number) of response categories. A useful upshot is that we can represent scores on subtests as super-items and analyse these as normal items (function fit_domains()).

Finally, function profile_plot() produces a novel plot that provides information related to both DIF analysis and the profile analysis proposed by Verhelst (2012).

As an example, consider the verbal aggression data (Vansteelandt 2000) analysed in great detail in (Paul De Boeck 2004) and many others. 243 females and 73 males have assessed on a 3-point scale (‘yes’, ‘perhaps’, or ‘no’) how likely they are to become verbally aggressive in four different frustrating situations. Responses are further structured by type (curse, scold, or shout) and mode (want to do or actually do), resulting in 24 items.

A new project always starts with a complete enumeration of all items, all admissible responses, and the scores assigned to each distinct response. The data does not come from a cognitive test, so the scoring rules are trivial. We have supplied them in data set verbAggrRules:

library(dexter)
## Loading required package: RSQLite
## Loading required package: plyr
## Loading required package: reshape2
## Loading required package: colorspace
head(verbAggrRules)
##          item response score
## 1 S1WantCurse        0     0
## 2 S1WantCurse        1     1
## 3 S1WantCurse        2     2
## 4   S1DoCurse        0     0
## 5   S1DoCurse        1     1
## 6   S1DoCurse        2     2
db = start_new_project(verbAggrRules, "verbAggression.db")

There is a single booklet provided in data set verbAggrData; we add it to the project:

head(verbAggrData)
##   Gender S1DoCurse S1DoScold S1DoShout S1WantCurse S1WantScold S1WantShout
## 1   Male         1         0         1           0           0           0
## 2 Female         1         2         1           2           2           2
## 3 Female         0         0         0           1           0           0
## 4 Female         2         0         0           2           2           0
## 5 Female         0         0         0           0           1           0
## 6 Female         2         2         0           0           2           0
##   S2DoCurse S2DoScold S2DoShout S2WantCurse S2WantScold S2WantShout
## 1         1         0         0           0           0           0
## 2         2         2         1           2           2           1
## 3         0         0         0           1           0           0
## 4         1         1         0           1           1           0
## 5         0         0         0           0           1           0
## 6         2         2         0           0           2           0
##   S3DoCurse S3DoScold S3DoShout S3WantCurse S3WantScold S3WantShout
## 1         1         0         0           0           0           1
## 2         0         0         0           1           0           1
## 3         0         0         0           0           0           0
## 4         1         0         0           2           0           0
## 5         0         0         0           0           1           0
## 6         1         1         0           2           1           0
##   S4DoCurse S4DoScold S4DoShout S4WantCurse S4WantScold S4WantShout
## 1         2         2         2           2           0           0
## 2         1         1         1           1           1           1
## 3         1         0         0           1           0           0
## 4         1         1         0           2           2           0
## 5         2         0         0           2           0           0
## 6         2         2         0           2           2           0
add_booklet(db, verbAggrData, "data")
## $items ## [1] "S1DoCurse" "S1DoScold" "S1DoShout" "S1WantCurse" "S1WantScold" ## [6] "S1WantShout" "S2DoCurse" "S2DoScold" "S2DoShout" "S2WantCurse" ## [11] "S2WantScold" "S2WantShout" "S3DoCurse" "S3DoScold" "S3DoShout" ## [16] "S3WantCurse" "S3WantScold" "S3WantShout" "S4DoCurse" "S4DoScold" ## [21] "S4DoShout" "S4WantCurse" "S4WantScold" "S4WantShout" ## ##$covariates
## [1] "Gender"
##
## $not_on_list ## [1] booklet item response score ## <0 rows> (or 0-length row.names) ## ##$n_responses_treated_as_NA
## [1] 0

The report looks clean. We can take a look at the booklets and items:

show_booklets(db)
##   bookletName nItems nPersons
## 1        data     24      316
show_items(db)
##           item data
## 1    S1DoCurse    1
## 2    S1DoScold    2
## 3    S1DoShout    3
## 4  S1WantCurse    4
## 5  S1WantScold    5
## 6  S1WantShout    6
## 7    S2DoCurse    7
## 8    S2DoScold    8
## 9    S2DoShout    9
## 10 S2WantCurse   10
## 11 S2WantScold   11
## 12 S2WantShout   12
## 13   S3DoCurse   13
## 14   S3DoScold   14
## 15   S3DoShout   15
## 16 S3WantCurse   16
## 17 S3WantScold   17
## 18 S3WantShout   18
## 19   S4DoCurse   19
## 20   S4DoScold   20
## 21   S4DoShout   21
## 22 S4WantCurse   22
## 23 S4WantScold   23
## 24 S4WantShout   24

We can also add the item properties, i.e., the three factors defining the experimental design. Again, we have supplied them in a separate data set:

data("verbAggrProperties")
head(verbAggrProperties)
##          item behavior mode blame situation
## 1   S1DoCurse    Curse   Do Other       Bus
## 2   S1DoScold    Scold   Do Other       Bus
## 3   S1DoShout    Shout   Do Other       Bus
## 4 S1WantCurse    Curse Want Other       Bus
## 5 S1WantScold    Scold Want Other       Bus
## 6 S1WantShout    Shout Want Other       Bus
add_item_properties(db, verbAggrProperties)
## $unknown_items ## NULL ## ##$items_unaccounted_for
## character(0)
show_item_properties(db)
## 'data.frame':    24 obs. of  5 variables:
##  $item : chr "S1DoCurse" "S1DoScold" "S1DoShout" "S1WantCurse" ... ##$ behavior : chr  "Curse" "Scold" "Shout" "Curse" ...
##  $mode : chr "Do" "Do" "Do" "Want" ... ##$ blame    : chr  "Other" "Other" "Other" "Other" ...
##  $situation: chr "Bus" "Bus" "Bus" "Bus" ... show_person_properties(db) ## variable ## 1 Gender Let us do the simplest analysis based on classical test theory: tt = tia_tables(db) knitr::kable(tt$itemStats, digits=3)
item nBooklets Pval Rit Rir N mnScore sdScore
S1DoCurse 1 0.541 0.582 0.519 316 1.082 0.808
S1DoScold 1 0.416 0.651 0.596 316 0.832 0.817
S1DoShout 1 0.234 0.520 0.460 316 0.468 0.710
S1WantCurse 1 0.562 0.537 0.468 316 1.123 0.828
S1WantScold 1 0.465 0.593 0.528 316 0.930 0.852
S1WantShout 1 0.356 0.529 0.464 316 0.712 0.778
S2DoCurse 1 0.502 0.590 0.527 316 1.003 0.834
S2DoScold 1 0.342 0.633 0.578 316 0.684 0.781
S2DoShout 1 0.163 0.532 0.481 316 0.326 0.616
S2WantCurse 1 0.611 0.529 0.465 316 1.222 0.774
S2WantScold 1 0.479 0.562 0.494 316 0.959 0.840
S2WantShout 1 0.367 0.563 0.498 316 0.734 0.816
S3DoCurse 1 0.288 0.497 0.438 316 0.576 0.693
S3DoScold 1 0.147 0.505 0.458 316 0.294 0.557
S3DoShout 1 0.052 0.344 0.310 316 0.104 0.345
S3WantCurse 1 0.405 0.474 0.406 316 0.810 0.766
S3WantScold 1 0.231 0.521 0.467 316 0.462 0.654
S3WantShout 1 0.141 0.438 0.389 316 0.282 0.534
S4DoCurse 1 0.441 0.528 0.463 316 0.883 0.786
S4DoScold 1 0.283 0.567 0.510 316 0.566 0.725
S4DoShout 1 0.112 0.424 0.377 316 0.225 0.513
S4WantCurse 1 0.489 0.495 0.427 316 0.978 0.774
S4WantScold 1 0.294 0.573 0.515 316 0.589 0.744
S4WantShout 1 0.212 0.453 0.391 316 0.424 0.684
knitr::kable(tt$testStats, digits=3) booklet nItems alpha avgPval avgRit avgRir maxScore N 1 24 0.888 0.339 0.527 0.468 48 316 The Rasch model and the Interaction models can be estimated with the fit_models()’ function. This always happens on a booklet-per-booklet basis, so the number of the booklet must always be given. If more than one booklet are specified, the function will try to find their intersection, i.e., the maximum rectangular data set containing items that are common to all booklets specified. If the intersection is empty, there will be just a warning message. Unfortunately, we cannot demonstrate this with the verbal aggression data because it only contains one booklet. m = fit_models(db, 1) print(m) ## [1] "S1DoCurse" "S1DoScold" "S1DoShout" "S1WantCurse" "S1WantScold" ## [6] "S1WantShout" "S2DoCurse" "S2DoScold" "S2DoShout" "S2WantCurse" ## [11] "S2WantScold" "S2WantShout" "S3DoCurse" "S3DoScold" "S3DoShout" ## [16] "S3WantCurse" "S3WantScold" "S3WantShout" "S4DoCurse" "S4DoScold" ## [21] "S4DoShout" "S4WantCurse" "S4WantScold" "S4WantShout" The print function returns only the item IDs because the parameters of the interaction model are not very easy to interpret. The models are most useful when plotted: plot(m, 1, show.observed=TRUE) By default, the plot shows the regressions of the item score on the total score. The interaction model is shown with a thicker but lighter line, and the Rasch model is shown with a thinner, darker line. The unsmoothed data is not shown by default, so we had to change the show.observed parameter. The ‘curtains’ are drawn at the 5% lowest and the 5% highest sum scores. The Rasch, or, in this case, Partial Credit Model, tends to fit these items very well, so the two curves practically coincide. This need not be always the case. The interaction model is quite interesting for practice. It shares the conditioning properties of the Rasch model, which means that we can predict the expected item score at each observed total score. Hence, we have the possibility to represent the model and the observed data on the same plot, without tricks or circular reasoning. Furthermore, the interaction model reproduces the mean item scores and the correlations between the item scores and the total score: the vital classical test statistics in the table above. In other words, the interaction model represents, as far as we are concerned, the data, and the two item-total regressions capture well the systematic departures of the Rasch/PCM model from the data. What remains is predominantly random noise; if we had an item where the Rasch model does not fit (not available in this data set), we would observe that the pink dots tend to fluctuate randomly around the interaction model. We can show the response category probabilities instead of the expected item score: plot(m, 1, summate=FALSE) Now fit and display models for the sum scores on each of the four frustrating situations: mSit = fit_domains(db, 1, "situation") plot(mSit) Finally, a profile plot comparing responses by males and females on the two modes, want to curse (scold, shout), against actually curse (scold, shout): profile_plot(db, 1, 'mode', 'Gender') We would not like to see such a differential effect in a cognitive test, but with this data set the result is quite interesting. To show dexter at work with a large set of cognitive test data, we would like to analyse a PISA data set. Such a problem is probably too large for a vignette, so we will only present a possible script without actually running it. library(readr) library(SAScii) # Fetching the data from the OECD site requires a certain ... dexterity # that I picked up from a kind soul on stackexchange. I don't even know # how to credit you but you are the greatest! # Download the data dictionaly and read it in with SAScii temp = tempfile() download.file("https://www.oecd.org/pisa/pisaproducts/PISA2012_SAS_scored_cognitive_item.sas", temp) dict_scored = parse.SAScii(sas_ri = temp) unlink(temp) # Download the scored cognitive data temp = tempfile() download.file("https://www.oecd.org/pisa/pisaproducts/INT_COG12_S_DEC03.zip",temp, mode="wb") unzip(temp, "INT_COG12_S_DEC03.txt") scored = read_fwf(file = 'INT_COG12_S_DEC03.txt', col_positions = fwf_widths(dict_scored$width), progress = TRUE)
colnames(scored) = dict_scored$varname unlink(temp) # Keep only the maths items standard = 1:13 pisa12_M = subset(scored, (BOOKID %in% standard), select=c("CNT", "BOOKID", grep("PM",names(scored), value=TRUE))) rm(scored) # Items missing by design are coded with 7, and actual nonresponse with 8 # Will replace both with NA for simplicity. tempBOOKID = pisa12_M$BOOKID
pisa12_M[pisa12_M==7] = NA
pisa12_M[pisa12_M==8] = NA
pisa12_M$BOOKID = tempBOOKID rm(tempBOOKID) allNA = apply(pisa12_M, 2, function(x)all(is.na(x))) pisa12_M = pisa12_M[, !allNA] mathItems = grep("PM",names(pisa12_M), value=TRUE) # prepare the scoring rules responses = lapply(mathItems, function(x)unique(pisa12_M[,x])) foo = lapply(responses, function(x){ y=data.frame(item=names(x),response=x[,1],score=0L) names(y)[2]="response" y }) rules = Reduce(rbind, foo) rules$score[rules$response==1] = 1 rules$score[rules$response==2] = 2 rules=rules[order(rules$item, rules$response),] # create the new project db = start_new_project(rules, "PISA12.db") # add all booklets one by one, deleting columns that may be all NA d_ply(pisa12_M, ~BOOKID, function(x){ allMissing = apply(x, 2, function(x)all(is.na(x))) dat = x[,!allMissing] add_booklet(db, dat, x$BOOKID[1])
})
rm(pisa12_M)

# add some item properties -- we have supplied a data set to
# make things a bit easier
items = data.frame(item=show_items(db)$item) domain = subset( merge(items, PISA_item_class, by.x="item", by.y="ItemCode"), select = c("item", "Content") ) domain$isSaS[domain$Content=="Space and shape"] = "SaS" domain$isSaS[domain\$Content!="Space and shape"] = "notSaS"
show_item_properties(db)
show_person_properties(db)

# Fit the two models bor booklet 1, all countries (takes about
# 10 seconds on my current machine)
system.time({m <- fit_models(db, 1)})
plot(m)

# Analyse by domain
md = fit_domains(db, 1, 'Content')
plot(md, nr=2, nc=2)

# Compare three countries on 'Space and shape' vs NOT 'Shape and space'
dev.off()
profile_plot(db, 1, "isSaS", "CNT", c("DEU","FRA","ITA"))`

# References

Paul De Boeck, Mark Wilson, ed. 2004. Explanatory Item Response Models. Springer.

Vansteelandt, K. 2000. “Formal Methods for Contextualized Personality Psychology.” PhD thesis, K. U. Leuven.