Head-to-Head Importance (May 13, 2003)

A topic that frequently surfaces when teams are compared is head-to-head results. Humans tend to put a lot more emphasis on head-to-head results than computers do, and it is worth investigating which is right.

First, a few words about exactly what I am testing. Sports polls are largely based on subjective judgments, and in some sense it's true that the prevailing opinion is by definition the correct one. I'm not disputing that, but I do want to examine the effects of head-to-head competition on the statistics.

I will assume that you have read the description of my Win/loss ratings, so I only summarize here. For every team, there exists a likelihood of it having a rating of "r", which I denote "P(r)". P(r) is a function of the wins, losses, opponent strengths, game venues, and the prior. The goal of a statistically based win-loss rating is to find some measure of that probability distribution that can be meaningfully compared with other teams. The best such measure I can find is the probability that team A (rating=a) is superior to team B (rating=b); in other words, the fraction of the area under the joint probability distribution P(a,b) for which a>b.

As noted in the Win/loss rating description, at some point we have to make assumptions about strengths of opponents in order to reduce the calculation time for Division I football to less than a trillion years. This is what I want to test here: do these approximations somehow cause statistical computer ratings to be biased against winners of head-to-head games? The quick answer is no; in fact, the loser of a head-to-head game is, if anything, probably better than the winner, assuming that the teams played the same schedule and had the same record. Let's take a look at why this is the case.

Test #1: 3 Teams

Here's a simple problem. Suppose you have three teams that all played each other at a neutral site, each team winning once. What are the odds that a team is better than the team it beat? At first glance, there seem to be two plausible answers: (a) since all three teams had one win and one loss, all are statistically equal and thus should be ranked equally; or (b) it is most likely that two of the three games were won by the better team and there was one major upset.

I'll end the suspense first, and explain below. The answer is that neither answer is correct. In fact, it is most likely that only one of the three games was won by the better team and that there were two minor upsets.

Confused? Let's look at the math. It's clear that the probability distribution P(a,b,c) is pretty simple:

   P(a,b,c) = CP(a-b) * CP(b-c) * CP(c-a),

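where "a" is team A's rating, "b" is team B's rating, "c" is team C's rating, and the games happened such that A beat B, B beat C, and C beat A. Here CP(x) is the game-outcome probability from the Win/loss rating description: the probability that a team beats an opponent whose rating is lower by x.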
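To make the formula concrete, here is a minimal Python sketch that evaluates P(a,b,c) at a few sample ratings. It assumes CP(x) is the standard normal cumulative distribution; that choice, and the names cp and likelihood, are illustrative stand-ins rather than the exact function used for the ratings.

```python
import math

def cp(x: float) -> float:
    """Assumed stand-in for CP(x): the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def likelihood(a: float, b: float, c: float) -> float:
    """Unnormalized P(a,b,c) for the cycle A beat B, B beat C, C beat A."""
    return cp(a - b) * cp(b - c) * cp(c - a)

# Three sample rating sets (values chosen only for illustration).
equal_teams = likelihood(0.0, 0.0, 0.0)        # all three teams identical
one_major_upset = likelihood(1.0, 0.0, -1.0)   # true order A > B > C
two_minor_upsets = likelihood(-1.0, 0.0, 1.0)  # true order C > B > A

print(f"equal ratings:    {equal_teams:.4f}")
print(f"one major upset:  {one_major_upset:.4f}")
print(f"two minor upsets: {two_minor_upsets:.4f}")
```

Only rating differences enter the product, so shifting all three ratings by the same amount leaves the result unchanged. That is why fixing the average rating loses no information.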

To keep this sane, let's require that the average of the three ratings is zero (a+b+c=0), which means:

   c = -a - b
   P(a,b) = CP(a-b) * CP(a+2b) * CP(-2a-b)

Of course, we don't really care about the values of a and b, just the difference between the two. Defining D=a-b and S=a+b, the equation becomes:

   P(S,D) = CP(D) * CP(3S/2-D/2) * CP(-3S/2-D/2)

One can integrate over S to calculate P(D). I show that plot below:

It is readily apparent that this curve is not symmetric around a-b=0; rather the probabilities are higher for a-b<0 (a<b). In other words, in this situation, the loser is likely better than the winner, since the two terms related to -D/2 have more influence than does the one term related to D.
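The asymmetry can be checked numerically. The sketch below integrates P(S,D) over S on a simple grid and then compares the total probability on each side of D=0. It assumes CP(x) is the standard normal cumulative distribution; the grid limits and spacing are arbitrary choices, and the function names are illustrative.

```python
import math

def cp(x: float) -> float:
    """Assumed stand-in for CP(x): the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_sd(s: float, d: float) -> float:
    """Unnormalized P(S,D) with S = a+b and D = a-b."""
    return cp(d) * cp(1.5 * s - 0.5 * d) * cp(-1.5 * s - 0.5 * d)

STEP = 0.05   # grid spacing in both S and D
HALF = 200    # both grids run from -10 to +10

def marginal(d: float) -> float:
    """P(D): Riemann sum of P(S,D) over S."""
    return STEP * sum(p_sd(k * STEP, d) for k in range(-HALF, HALF + 1))

below = above = at_zero = 0.0
for i in range(-HALF, HALF + 1):
    p = marginal(i * STEP)
    if i < 0:
        below += p      # D < 0, i.e. a < b
    elif i > 0:
        above += p      # D > 0, i.e. a > b
    else:
        at_zero = p     # split evenly between the two sides

total = below + above + at_zero
frac_a_worse = (below + 0.5 * at_zero) / total
print(f"fraction of probability with a < b: {frac_a_worse:.3f}")
```

Under this assumed CP the fraction should come out slightly above one half; a different shape for CP would shift the exact value.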

While the math is indisputable, this doesn't answer the question of why this is true. To explore that, I present another plot showing P(a,b) as a function of a (on the x axis) and b (on the y axis).

In this plot, the line where a=b runs from the lower-left to the upper-right. Locations below or right of that line are where a>b, and those above or left are where a<b.

First off, let's verify what should be a common-sense feature: the most likely scenario (i.e. where P(a,b) is maximized) comes where a and b are zero (as is c). This should be the case since the set of games is symmetric and thus this is the only plausible place where such a maximum could conceivably happen. Likewise, although it isn't as obvious from the plot, the expectation value of a-b (the probability-weighted average) is zero, which again is necessary since the average case must be that all three teams have the same rating.
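Both common-sense features can be checked on a grid. The sketch below evaluates P(a,b) over a square of (a,b) values, records where the maximum falls, and computes the probability-weighted average of a-b. As before, it assumes CP(x) is the standard normal cumulative distribution, and the grid settings are arbitrary.

```python
import math

def cp(x: float) -> float:
    """Assumed stand-in for CP(x): the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_ab(a: float, b: float) -> float:
    """Unnormalized P(a,b) after substituting c = -a - b."""
    return cp(a - b) * cp(a + 2.0 * b) * cp(-2.0 * a - b)

STEP = 0.05   # grid spacing for a and b
HALF = 120    # a and b each run from -6 to +6

total = 0.0
weighted_diff = 0.0
best_value = -1.0
best_point = None
for i in range(-HALF, HALF + 1):
    a = i * STEP
    for j in range(-HALF, HALF + 1):
        b = j * STEP
        w = p_ab(a, b)
        total += w
        weighted_diff += w * (a - b)
        if w > best_value:
            best_value, best_point = w, (a, b)

mean_diff = weighted_diff / total
print("maximum of P(a,b) at (a, b) =", best_point)
print(f"expectation value of a-b: {mean_diff:.6f}")
```

The maximum lands at the origin and the mean of a-b is zero to within numerical error, which follows from the cyclic symmetry of the three games regardless of the exact shape of CP.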

Now for the details. Where a is greater than b, we find that the probability distribution becomes very narrow, meaning that c (which equals -(a+b)) is very tightly constrained. This happens because of the shape of the error function CP(x): the probability is maximized when c is midway between a and b in this case, since having c much lower than a would make A's loss to C extremely unlikely, or likewise having c much higher than b would make C's loss to B extremely unlikely.

However, when A is worse than B, we are in a different scenario. In this case, A's win over B is the only upset, and the overall probability is pretty good whenever c is anywhere between a and b (so that B's win over C and C's win over A are both expected results). Thus you see a much broader probability distribution in the upper-left part of the figure.

As it turns out, the amount of probability above the diagonal line exceeds that below it, and thus we conclude that team B is more likely the better team (the odds are 51.2% that B is better) despite its loss to A. This is consistent with the expectation value of a-b being zero because a-b is more likely to be very high if positive, as shown on the plot by the fact that the lower-right extension goes further from the diagonal line than does the upper-left extension.

OK, now let's step back a minute and think about what this implies. Obviously I'm not saying that B is better than A, C is better than B, and A is better than C, since that would be ludicrous. The key to this dilemma is that it isn't a two-team problem, but rather a 3-team problem. The odds of the ranking order being ABC are 15.4%, which equals the odds of the order being BCA or CAB. In contrast, the odds of the order being CBA, ACB, or BAC are 17.9% each. So the important point isn't really specifically whether A or B is the better team; it is that it is more likely that this 3-way tie was created by the better team winning only one of the three games than by the better team winning two of the three. The reason for this is that two minor upsets are less unlikely than one major upset.
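The odds for the six possible orderings can be estimated the same way: weight every grid point by P(a,b,c) and add that weight to whichever ordering the ratings imply. This is a sketch under the assumption that CP(x) is the standard normal cumulative distribution; the grid settings and the tie-splitting rule are illustrative choices.

```python
import math
from itertools import permutations

def cp(x: float) -> float:
    """Assumed stand-in for CP(x): the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

STEP = 0.05   # grid spacing for a and b
HALF = 120    # a and b each run from -6 to +6; c = -a - b

ORDERS = ["".join(p) for p in permutations("ABC")]   # best team first
odds = {order: 0.0 for order in ORDERS}

for i in range(-HALF, HALF + 1):
    for j in range(-HALF, HALF + 1):
        idx = {"A": i, "B": j, "C": -i - j}   # ratings in units of STEP
        a, b, c = (idx[t] * STEP for t in "ABC")
        weight = cp(a - b) * cp(b - c) * cp(c - a)
        # Orderings compatible with these ratings; exact ties on the
        # grid share the weight equally.
        fits = [o for o in ORDERS if idx[o[0]] >= idx[o[1]] >= idx[o[2]]]
        for o in fits:
            odds[o] += weight / len(fits)

total = sum(odds.values())
odds = {o: w / total for o, w in odds.items()}
for o in ORDERS:
    print(f"{o}: {odds[o]:.3f}")
```

With this assumed CP, the three orderings that imply one major upset (ABC, BCA, CAB) should come out equal to each other and lower than the three that imply two minor upsets (CBA, ACB, BAC), in line with the figures quoted above.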

Test #2: 4 Teams

For a somewhat more complex (and meaningful) test, I have run the identical tests for a 4-team case in which teams A and B went 2-1, while C and D went 1-2. The sequence of games was A beats B, A beats C, B beats C, B beats D, C beats D, and D beats A. As before, the question is whether A is likely better or worse than B. The same plots shown for the 3-team case are shown for this 4-team case. What is interesting about this example is that it is very much like a case that comes up in practice: two teams played a variety of inferior teams; the one that beat all of the inferior teams lost to the team that lost to one of the inferior teams.

As before, it is clear that this curve is not symmetric, but that the probabilities are higher for a-b<0 (a<b). However, this case isn't exactly like the one above, and is somewhat disturbing in that sense. In the 3-team case, the probability for a-b>1 was higher than that for a-b<-1, which at least caused the expectation value to equal zero. In the 4-team case, even this doesn't happen: the expectation value of a-b is -0.067. Likewise, the area under the curve indicates that the probability of B being better than A is 53.5%. In fact, there is no statistical measure by which one would conclude that team A was more likely the better team.

An interesting aside: using exactly the same calculations for the 1-2 teams, we find that team D is probably better than team C. This means that the most likely scenario is that B is the best team, followed by A, D, and C. In short, the better teams won only 3 of the 6 games in this case, showing again that three minor upsets are less unlikely than one major upset. (Ranking the teams in order ABCD would have implied only one upset.)
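The 4-team case can be sketched the same way, now with a three-dimensional grid over a, b, and c and with d fixed by requiring the ratings to average zero. This again assumes CP(x) is the standard normal cumulative distribution and a zero-average constraint like the one used for three teams; the coarse grid keeps the pure-Python loop short.

```python
import math

def cp(x: float) -> float:
    """Assumed stand-in for CP(x): the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# (winner, loser) for the six games listed in the text.
GAMES = [("A", "B"), ("A", "C"), ("B", "C"),
         ("B", "D"), ("C", "D"), ("D", "A")]

STEP = 0.25   # grid spacing for a, b, and c
HALF = 20     # a, b, c each run from -5 to +5; d = -(a + b + c)

def share(hi: int, lo: int) -> float:
    """1 if hi outranks lo, 0 if not, 0.5 for an exact tie on the grid."""
    return 1.0 if hi > lo else (0.5 if hi == lo else 0.0)

total = b_over_a = d_over_c = diff_sum = 0.0
for i in range(-HALF, HALF + 1):
    for j in range(-HALF, HALF + 1):
        for k in range(-HALF, HALF + 1):
            idx = {"A": i, "B": j, "C": k, "D": -(i + j + k)}
            w = 1.0
            for winner, loser in GAMES:
                w *= cp((idx[winner] - idx[loser]) * STEP)
            total += w
            diff_sum += w * (i - j) * STEP
            b_over_a += w * share(j, i)
            d_over_c += w * share(idx["D"], k)

prob_b_over_a = b_over_a / total
prob_d_over_c = d_over_c / total
mean_a_minus_b = diff_sum / total
print(f"P(B better than A): {prob_b_over_a:.3f}")
print(f"P(D better than C): {prob_d_over_c:.3f}")
print(f"expectation of a-b: {mean_a_minus_b:.3f}")
```

The size of the expectation value depends on the scale chosen for CP, so it need not reproduce the figure quoted above; the probabilities and the sign of the expectation do not depend on that scale.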

Turning to the plot of P(a,b), I should first point out that since teams A and B both went 2-1 while the other two went 1-2, it is expected that both a and b should be greater than zero while c and d are probably less than zero. As with the 3-team case, we again see that the probabilities aren't necessarily higher above the diagonal line, but the area covered by the probability distribution is greater because of weaker constraints on a+b (or equivalently on c+d). Thus the similar result.

This is the last test I will show; you will have to take my word for the fact that this is representative of most cases in which there is a question between two head-to-head teams. (If team A had beaten all of its opponents while team B went 2-1, the whole discussion would be moot since A would be by far the best team. So I'm only looking at situations in which the teams had the same record against an equally-difficult schedule.)

Summary and Discussion

There are several ways to phrase the main conclusions from this:

A bad loss hurts more than a quality win helps.
Whom you beat doesn't matter; all that is important is how good they were.
Minor upsets are much more understandable than major upsets.

However, I think the most succinct summary is simply "head-to-head isn't important".
This runs counter to most people's logic, as typical voters will consider head-to-head as one of the most important pieces of data. There are a few ways to look at it: (a) voters have a soft spot for seeing things settled "on the field"; (b) voters underestimate random effects in games and thus erroneously conclude that the winner must be the better team; or (c) the human brain cannot possibly handle all the data required to make an accurate rating and thus looks for shortcuts such as head-to-head. I won't make judgments as to which it is, although the answer to this question is very important in determining how much polls and committees can be trusted to rank teams in the best order. If it is (a), one would conclude that human ratings are superior to computer ratings because they take into account important details overlooked by the computer. If it is (b) or (c), the computer ratings are superior because the human ranking is based on irrational means.

Note: if you use any of the facts, equations, or mathematical principles introduced here, you must give me credit.