This script creates and analyses hyper2 object “skating” which is a likelihood function for the strengths of the competitors in the Ladies’ Figure Skating at the 2002 Winter Olympics. Dataframe skating_table
is copied from Lock and Lock (see skating.Rd for more details).
library("hyper2",quietly=TRUE)
data("skating")
skating_table
## J1 J2 J3 J4 J5 J6 J7 J8 J9
## hughes 1 4 3 4 1 2 1 1 1
## slutskaya 3 1 1 1 4 1 2 3 2
## kwan 2 3 2 2 2 3 3 2 3
## cohen 5 2 4 3 3 4 4 4 4
## suguri 4 8 5 5 5 7 5 5 5
## butyrskaya 6 5 8 7 12 5 8 7 6
## robinson 7 7 7 9 6 8 10 6 7
## sebestyen 8 10 12 8 7 6 12 8 8
## kettunen 9 9 13 6 12 10 7 11 14
## volchkova 10 6 14 11 10 12 6 9 15
## maniachenko 13 12 11 12 16 11 11 10 9
## fontana 14 11 18 16 9 15 9 12 10
## liashenko 15 13 6 10 8 14 13 14 16
## onda 11 14 10 15 15 13 15 13 11
## hubert 12 17 17 13 11 16 14 15 13
## meier 16 16 9 14 14 9 16 16 12
## gusmeroli 17 15 15 17 17 18 17 17 17
## soldatova 19 18 22 20 21 17 18 18 19
## hegel 20 21 16 22 18 19 21 19 18
## giunchi 18 19 20 21 19 20 20 20 20
## babiakova 22 20 19 19 20 21 19 22 22
## kopac 21 22 23 18 22 22 22 21 21
## luca 23 23 21 23 23 23 23 23 23
Object skating_table
is structured so that each competitor is a row, and each judge is a column. Function order_likelihood()
considers each row to be a race [or a judge], so we need to take a transpose. The following R idiom is plausible but incorrect:
skating_incorrect <- order_likelihood(t(skating_table))
The above is incorrect because we want the ranks, and we have the order. Correct analysis follows.
skating_rank <- apply(skating_table,2,order)
head(skating_rank)
## J1 J2 J3 J4 J5 J6 J7 J8 J9
## [1,] 1 2 2 2 1 2 1 1 1
## [2,] 3 4 3 3 3 1 2 3 2
## [3,] 2 3 1 4 4 3 3 2 3
## [4,] 5 1 4 1 2 4 4 4 4
## [5,] 4 6 5 5 5 6 5 5 5
## [6,] 6 10 13 9 7 8 10 7 6
Column 1 of skating_table
and column 1 of skating_rank
are confusingly simular. So consider column 2 instead, corresponding to judge number 2. If a=skating_table[,2]
and b=skating_rank[,2]
we have a=c(4,1,3,2, ...)
and b=c(2,4,3,1, ...)
. Thus a
means that hughes was judged to be fourth best, slutskaya was judged to be the best, kwan third best, cohen second best, and so on. Conversely, b
means that the best competitor was number two [slutskaya], the second best number was number four [cohen], the third best was number three [kwan], the fourth best number 1 [hughes], and so on. Note that this means the rows of skating_rank
cannot be named.
skating <- order_likelihood(t(skating_rank))
pnames(skating) <- rownames(skating_table)
head(skating)
## hughes^9 * (hughes + slutskaya + kwan + cohen + suguri +
## butyrskaya + robinson + sebestyen + kettunen + volchkova +
## maniachenko + fontana + liashenko + onda + hubert + meier +
## gusmeroli + soldatova + hegel + giunchi + babiakova + kopac +
## luca)^-9 * (hughes + kwan + cohen + suguri + butyrskaya + robinson
## + sebestyen + kettunen + volchkova + maniachenko + fontana +
## liashenko + onda + hubert + meier + gusmeroli + soldatova + hegel
## + giunchi + babiakova + kopac + luca)^-4 * (hughes + kwan + suguri
## + butyrskaya + robinson + sebestyen + kettunen + volchkova +
## maniachenko + fontana + liashenko + onda + hubert + meier +
## gusmeroli + soldatova + hegel + giunchi + babiakova + kopac +
## luca)^-1 * (hughes + cohen + suguri + butyrskaya + robinson +
## sebestyen + kettunen + volchkova + maniachenko + fontana +
## liashenko + onda + hubert + meier + gusmeroli + soldatova + hegel
## + giunchi + babiakova + kopac + luca)^-2 * (hughes + suguri +
## butyrskaya + robinson + sebestyen + kettunen + volchkova +
## maniachenko + fontana + liashenko + onda + hubert + meier +
## gusmeroli + soldatova + hegel + giunchi + babiakova + kopac +
## luca)^-2
Now some data visualization. First the MLE for the strengths:
dotchart(maxp(skating))
dotchart(log(maxp(skating)))
Now compare my rating with the official point-tallying method:
par(pty='s')
lik <- order(maxp(skating),decreasing=TRUE)
plot(1:23,lik,asp=1,pch=16,xlab='official order',ylab='likelihood order')
text(1:23,lik,asp=1,pch=16,pos=c(rep(4,19),rep(2,4)),rownames(skating_table))
abline(0,1)
Looking at the figure, it seems that the medallists—Hughes (gold), Slutskya (silver), Kwan (bronze)—were considerably higher in strength than the rest of the field. Here I will test the hypothesis that the medallists were in fact the strongest three competitors. Technically you need to optimize over the union of the possibilities that one of the three medallists did not come in the top three; but this is hard. We will do something much easier but numerically equivalent: optimize over the union of outcomes where either Cohen or Suguri (who placed fourth and fifth respectively) had higher strength than any of the medallists.
jj <-
matrix(c(
-1, 0, 0, 1,0, # Hughes < Cohen
0,-1, 0, 1,0, # Slutskya < Cohen
0, 0,-1, 1,0, # Kwan < Cohen
-1, 0, 0, 0,1, # Hughes < Suguri
0,-1, 0, 0,1, # Slutskya < Suguri
0, 0,-1, 0,1), # Kwan < Suguri
byrow=TRUE,ncol=5)
problem_constraints <- # fill with zeros for other competitors
cbind(jj,matrix(0,nrow(jj),size(skating)-ncol(jj)-1))
small <-1e-3 # need a sensible start value satisfying the constraints
start <- c(rep(2*small,3),rep(3*small,2),rep(small,17))
out <- rep(0,nrow(problem_constraints))
fullout <- list()
for(i in seq_len(nrow(problem_constraints))){
jj <- maxp(skating, startp=start, give=TRUE,fcm=problem_constraints[i,], fcv=0)
fullout[[i]] <- jj
out[i] <- jj$value
}
out
## [1] -237.8304 -235.5829 -231.3575 -240.6571 -241.8991 -243.1146
maxp(skating,give=TRUE)$value - max(out)
## [1] 1.026484
that is, about 1 unit of support, falling short of the two units suggested by Edwards. Observe that the maximum likelihood among the six alternative hypotheses is that of number 3, in which the maximization was constrained to obey Kwan < Cohen
.
If, instead, we ask whether there is evidence that Suguri should not have been a medallist, we find
maxp(skating,give=TRUE)$value - max(out[4:5])
## [1] 10.32606
There is something odd going on:
plot(out,ylab="log likelihood",axes=FALSE)
axis(2)
axis(side=1,at=1:6,srt=45,labels=c(
"Hug<Coh", "Slu<Coh", "Kwa<Coh",
"Hug<Sug", "Slu<Sug", "Kwa<Sug"
))
In the plot above, the vertical axis shows the support. The six points on the x-axis correspond to the six rows of problem_constraints
; names have been abbreviated to the first three letters. Thus the first three points are maximum likelihoods for Hughes < Cohen
, Slutskya < Cohen
, and Kwan < Cohen
respectively. See how the likelihoods increase from the first to the third point; this indicates that Hughes < Cohen
, is the least likely option, and Kwan < Cohen
the most likely option. This makes sense because Cohen is the weakest of the three medallists. Points 4-6 are in the reverse order, and I have no explanation for this. The likelihoods indicate that, if Suguri is more likely to be stronger than Hughes than to be stronger than Kwan, which is difficult to understand because Hughes has a higher estimated strength than Kwan.