Plausible Values in Dexter

Timo Bechger and Ivailo Partchev


Plausible values are random samples from the posterior distribution of ability. Dexter produces plausible values using a straightforward rejection algorithm which was suggested by Rubin (1984), described and improved by Maarten Marsman et al. (2017), and applied in the SurveyLang project (

To see how the algorithm works, one should note that the posterior of ability can be defined as the collection of all values of ability that might have generated the observations. This simple observation suggests the following algorithm: sample a tentative ability from some distribution, then simulate response data from the tentative ability and the item parameters. Since the posterior depends only on the test score, we repeat this until we hit an ability that produces the same expected test score as the one observed for the person. This becomes the plausible value for the person.

In the simple case where the test consists of 20 Rasch items, the following algorithm would produce one plausible value for a respondent with a test score of 10 correct answers out of 20.

x_plus = 10
delta = runif(20, -2, 2)
  abil = rnorm(1, 0, 1)
  y_plus = 0
  for (i in 1:20) y_plus = y_plus + 1*(rlogis(1, 0, 1) <= (abil-delta[i]))
  if (x_plus == y_plus) break
pv = abil

The algorithm used by dexter is only slightly more complicated: it is written in C for speed, and adapted to handle polytomous items. The user-level function is called plausible_values, and takes as arguments the data source (data base connection or data frame), the items parameters created by function fit_enorm, and an optional predicate to select a subset of the data. By judicious use of the predicate the user can do bizarre things, such as estimate item parameters from one subset of the data and apply them to compute plausible values for another subset.

Plausible values have been around for quite some time, and have been made popular by many large scale educational surveys. Still, their full potential is possibly underestimated. Having a reasonably fast program such as dexter allows us to play around and demonstrate both the power and the simplicity of the method.

As a start, here is a simple function to simulate a matrix of response data conforming to the Rasch model, in the long shape expected by dexter:

sim_Rasch = function(theta, delta) {
  n = length(theta)
  m = length(delta)
    person_id = rep(paste0('p',1:n), m),
    item_id = rep(paste0('i',1:m), each=n),
    item_score = as.integer(rlogis(n*m, outer(theta, delta, "-")) > 0)

##    person_id item_id item_score
## 1         p1      i1          0
## 2         p2      i1          0
## 3         p3      i1          0
## 4         p4      i1          1
## 5         p5      i1          1
## 6         p1      i2          0
## 7         p2      i2          1
## 8         p3      i2          0
## 9         p4      i2          1
## 10        p5      i2          0
## 11        p1      i3          0
## 12        p2      i3          1
## 13        p3      i3          1
## 14        p4      i3          1
## 15        p5      i3          0

Next, a function to simulate a data set, estimate the item parameters, draw 10 plausible values for each ‘person’, and compare their empirical cumulative distribution function (ECDF, shown in gray) with that of the true abilities (the black line):

sim_PV = function(theta, delta) {
  simulated_data = sim_Rasch(theta, delta) 
  parms = fit_enorm(simulated_data)
  plot(ecdf(theta-mean(theta)), main=paste(length(delta), "items"))
  pv = plausible_values(simulated_data, parms, nPV = 10)
  for (i in 1:10) lines(ecdf(pv[[i+3]]-mean(pv[[i+3]])), col=rgb(.7,.7,.7,.5))

Let us try it out:

theta = rnorm(300)
delta = runif(50, -3, 1)
sim_PV(theta, delta)

So far, so good. The true abilities were sampled from the standard normal distribution, which was also used as the default prior in the plausible_values function. To make things a bit more exciting, we will generate true abilities from a markedly bi-modal distribution whilst keeping the standard normal prior, which will be obviously misspecified in this case. We will then examine the plausible value from tests using the first 10, the first 20, and all 50 items simulated above.

grp = sample(2, 300, replace = TRUE, prob = c(.5,.5))
theta = rnorm(300, mean = c(-2,2)[grp], sd = c(1,1)[grp])

sim_PV(theta, delta[1:10])
sim_PV(theta, delta[1:20])
sim_PV(theta, delta)


The remarkable thing about plausible values is that they will eventually reproduce the true distribution of ability: perhaps sooner, if the prior is specified correctly, or later, if it is not. The price that we had to pay in this example was merely administer a longer test. For a more detailed discussion of this valuable property we refer to M. Marsman et al. (2016).

Observe that the gray bands are not wide, suggesting little uncertainty in determining the ECDF of ability. It may be more realistic to take the uncertainty in the item parameters into account. To do this, we will use a Gibbs sampler to draw 50 samples from the posterior distribution of the item parameters, and we will pick one of these at random each time we produce a plausible value.

First, a little function to accomplish this; note the use of the use_draw option.

sim_PV2 = function(theta, delta) {
  simulated_data = sim_Rasch(theta, delta) 
  parms = fit_enorm(simulated_data, method="Bayes", nIterations = 50)
  plot(ecdf(theta-mean(theta)), main=paste(length(delta), "items"))
  for (iter in 1:10) {
    indx = sample(1:50, 1)
    pv = plausible_values(simulated_data, parms, use_draw=indx)
    lines(ecdf(pv$PV1-mean(pv$PV1)), col=rgb(.7,.7,.7,.5))

Now, try it out:

sim_PV2(theta, delta[1:10])
sim_PV2(theta, delta[1:20])
sim_PV2(theta, delta)


The results show a small increase in variance but, with 300 persons, the item parameters have been estimated very precisely.


Marsman, M., G. Maris, T. M. Bechger, and C. A. W. Glas. 2016. “What can we learn from plausible values?” Psychometrika, 1–16.

Marsman, Maarten, Gunter Maris, Timo Bechger, and Cees Glas. 2017. “Turning Simulation into Estimation: Generalized Exchange Algorithms for Exponential Family Models.” PloS One 12 (1). Public Library of Science: e0169787.

Rubin, Donald B. 1984. “Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician.” The Annals of Statistics 12 (4). Institute of Mathematical Statistics: 1151–72.