Ladies’ figure skating in the 2002 Winter Olympics

This script creates and analyses hyper2 object “skating” which is a likelihood function for the strengths of the competitors in the Ladies’ Figure Skating at the 2002 Winter Olympics. Dataframe skating_table is copied from Lock and Lock (see skating.Rd for more details).

library("hyper2",quietly=TRUE)
data("skating")
skating_table
##             J1 J2 J3 J4 J5 J6 J7 J8 J9
## hughes       1  4  3  4  1  2  1  1  1
## slutskaya    3  1  1  1  4  1  2  3  2
## kwan         2  3  2  2  2  3  3  2  3
## cohen        5  2  4  3  3  4  4  4  4
## suguri       4  8  5  5  5  7  5  5  5
## butyrskaya   6  5  8  7 12  5  8  7  6
## robinson     7  7  7  9  6  8 10  6  7
## sebestyen    8 10 12  8  7  6 12  8  8
## kettunen     9  9 13  6 12 10  7 11 14
## volchkova   10  6 14 11 10 12  6  9 15
## maniachenko 13 12 11 12 16 11 11 10  9
## fontana     14 11 18 16  9 15  9 12 10
## liashenko   15 13  6 10  8 14 13 14 16
## onda        11 14 10 15 15 13 15 13 11
## hubert      12 17 17 13 11 16 14 15 13
## meier       16 16  9 14 14  9 16 16 12
## gusmeroli   17 15 15 17 17 18 17 17 17
## soldatova   19 18 22 20 21 17 18 18 19
## hegel       20 21 16 22 18 19 21 19 18
## giunchi     18 19 20 21 19 20 20 20 20
## babiakova   22 20 19 19 20 21 19 22 22
## kopac       21 22 23 18 22 22 22 21 21
## luca        23 23 21 23 23 23 23 23 23

Object skating_table is structured so that each competitor is a row, and each judge is a column. Function order_likelihood() considers each row to be a race [or a judge], so we need to take a transpose. The following R idiom is plausible but incorrect:

skating_incorrect <- order_likelihood(t(skating_table))

The above is incorrect because we want the ranks, and we have the order. Correct analysis follows.

skating_rank <- apply(skating_table,2,order)
head(skating_rank)
##      J1 J2 J3 J4 J5 J6 J7 J8 J9
## [1,]  1  2  2  2  1  2  1  1  1
## [2,]  3  4  3  3  3  1  2  3  2
## [3,]  2  3  1  4  4  3  3  2  3
## [4,]  5  1  4  1  2  4  4  4  4
## [5,]  4  6  5  5  5  6  5  5  5
## [6,]  6 10 13  9  7  8 10  7  6

Column 1 of skating_table and column 1 of skating_rank are confusingly simular. So consider column 2 instead, corresponding to judge number 2. If a=skating_table[,2] and b=skating_rank[,2] we have a=c(4,1,3,2, ...) and b=c(2,4,3,1, ...). Thus a means that hughes was judged to be fourth best, slutskaya was judged to be the best, kwan third best, cohen second best, and so on. Conversely, b means that the best competitor was number two [slutskaya], the second best number was number four [cohen], the third best was number three [kwan], the fourth best number 1 [hughes], and so on. Note that this means the rows of skating_rank cannot be named.

skating <- order_likelihood(t(skating_rank))
pnames(skating) <- rownames(skating_table)
head(skating)
## hughes^9 * (hughes + slutskaya + kwan + cohen + suguri +
## butyrskaya + robinson + sebestyen + kettunen + volchkova +
## maniachenko + fontana + liashenko + onda + hubert + meier +
## gusmeroli + soldatova + hegel + giunchi + babiakova + kopac +
## luca)^-9 * (hughes + kwan + cohen + suguri + butyrskaya + robinson
## + sebestyen + kettunen + volchkova + maniachenko + fontana +
## liashenko + onda + hubert + meier + gusmeroli + soldatova + hegel
## + giunchi + babiakova + kopac + luca)^-4 * (hughes + kwan + suguri
## + butyrskaya + robinson + sebestyen + kettunen + volchkova +
## maniachenko + fontana + liashenko + onda + hubert + meier +
## gusmeroli + soldatova + hegel + giunchi + babiakova + kopac +
## luca)^-1 * (hughes + cohen + suguri + butyrskaya + robinson +
## sebestyen + kettunen + volchkova + maniachenko + fontana +
## liashenko + onda + hubert + meier + gusmeroli + soldatova + hegel
## + giunchi + babiakova + kopac + luca)^-2 * (hughes + suguri +
## butyrskaya + robinson + sebestyen + kettunen + volchkova +
## maniachenko + fontana + liashenko + onda + hubert + meier +
## gusmeroli + soldatova + hegel + giunchi + babiakova + kopac +
## luca)^-2

Now some data visualization. First the MLE for the strengths:

dotchart(maxp(skating))

dotchart(log(maxp(skating)))

Now compare my rating with the official point-tallying method:

par(pty='s')
lik <- order(maxp(skating),decreasing=TRUE)
plot(1:23,lik,asp=1,pch=16,xlab='official order',ylab='likelihood order')
text(1:23,lik,asp=1,pch=16,pos=c(rep(4,19),rep(2,4)),rownames(skating_table))
abline(0,1)

Evidence for medal positions

Looking at the figure, it seems that the medallists—Hughes (gold), Slutskya (silver), Kwan (bronze)—were considerably higher in strength than the rest of the field. Here I will test the hypothesis that the medallists were in fact the strongest three competitors. Technically you need to optimize over the union of the possibilities that one of the three medallists did not come in the top three; but this is hard. We will do something much easier but numerically equivalent: optimize over the union of outcomes where either Cohen or Suguri (who placed fourth and fifth respectively) had higher strength than any of the medallists.

jj <-
    matrix(c(
        -1, 0, 0,   1,0,  # Hughes   < Cohen
         0,-1, 0,   1,0,  # Slutskya < Cohen
         0, 0,-1,   1,0,  # Kwan     < Cohen
        
        -1, 0, 0,   0,1,  # Hughes   < Suguri
         0,-1, 0,   0,1,  # Slutskya < Suguri
         0, 0,-1,   0,1), # Kwan     < Suguri
        byrow=TRUE,ncol=5)

problem_constraints <-  # fill with zeros for other competitors
    cbind(jj,matrix(0,nrow(jj),size(skating)-ncol(jj)-1))

small <-1e-3  # need a sensible start value satisfying the constraints
start <- c(rep(2*small,3),rep(3*small,2),rep(small,17))

out <- rep(0,nrow(problem_constraints))
fullout <- list()

for(i in seq_len(nrow(problem_constraints))){
    jj <- maxp(skating, startp=start, give=TRUE,fcm=problem_constraints[i,], fcv=0)
   fullout[[i]] <- jj
   out[i] <- jj$value
}
out
## [1] -237.8304 -235.5829 -231.3575 -240.6571 -241.8991 -243.1146
maxp(skating,give=TRUE)$value  - max(out)
## [1] 1.026484

that is, about 1 unit of support, falling short of the two units suggested by Edwards. Observe that the maximum likelihood among the six alternative hypotheses is that of number 3, in which the maximization was constrained to obey Kwan < Cohen.

If, instead, we ask whether there is evidence that Suguri should not have been a medallist, we find

maxp(skating,give=TRUE)$value  - max(out[4:5])
## [1] 10.32606

There is something odd going on:

plot(out,ylab="log likelihood",axes=FALSE)
axis(2)
axis(side=1,at=1:6,srt=45,labels=c(
"Hug<Coh", "Slu<Coh", "Kwa<Coh",
"Hug<Sug", "Slu<Sug", "Kwa<Sug"
))

In the plot above, the vertical axis shows the support. The six points on the x-axis correspond to the six rows of problem_constraints; names have been abbreviated to the first three letters. Thus the first three points are maximum likelihoods for Hughes < Cohen, Slutskya < Cohen, and Kwan < Cohen respectively. See how the likelihoods increase from the first to the third point; this indicates that Hughes < Cohen, is the least likely option, and Kwan < Cohen the most likely option. This makes sense because Cohen is the weakest of the three medallists. Points 4-6 are in the reverse order, and I have no explanation for this. The likelihoods indicate that, if Suguri is more likely to be stronger than Hughes than to be stronger than Kwan, which is difficult to understand because Hughes has a higher estimated strength than Kwan.