My statistical (predictive) rating answers the following question: based on available data, how likely is team A to beat team B? A rating is calculated based on these probabilities and pure statistics, and teams are ranked.

Humans, however, evaluate teams quite differently. My ratings provide two "human" ranking systems for college sports -- a pseudopoll and a pairwise ranking. The pseudopoll attempts to mimic poll voting tendencies; the pairwise ranking (for baseball and basketball) attempts to mimic tournament selection and seeding tendencies. The two are similar in many ways but differ in others. Both systems, however, deviate strongly from computer rating systems. Generally this is cause for suspicion: the voters think the programmers are wrong, while the programmers say the voters don't know how to rank teams. There is truth on both sides -- there are random factors (from human indecision) and systematic factors (such as poll inertia) that make polls unfair; on the other hand, a measurement of the "best" season is inherently subjective, and thus one must examine how people (on average) make such judgments. This page describes how human evaluations systematically differ from computer evaluations.

Note that all information on this page was obtained through exhaustive (and exhausting) research and experimentation; any use of this information for any purpose must give me credit.

### Initial Feelings

If an average individual is asked to rank teams in some order, he will go through a two-step process. In some cases this process is intuitive and "seems" like only one step to the individual; in other cases the person will actually do both steps. Step one is to make a provisional ranking based on one's "gut instinct" feelings about each team. Step two is to adjust the ranking as necessary, based on specific data. The information presented on this page is based on attempts to mimic poll and selection results by mimicking this two-step process. This section deals with the first step.

Poll voters seem to come up with their initial feelings based on the team's record, soundness of wins, and the quality of the better teams it has beaten.

The record is straightforward to quantify; I use the percentage of games won. While this seems to give 11-1 football teams an advantage over those at 10-1, my comparisons of simulated and real polls indicate that 11-1 teams do tend to be ranked higher (all other things being equal), and thus the winning percentage is indeed the correct measurement.

For the soundness of wins (the average margin of victory), I use the game probability factors described in the predictive rating description. In short, the value produced for each game is the probability that the team outplayed its opponent in that game. A tie game gives a probability of 0.5; a blowout win gives a value of nearly 1, and a blowout loss gives a value of nearly 0.
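The page doesn't give the actual formula for these game probability factors, but a logistic curve of the scoring margin reproduces the described behavior (0.5 at a tie, near 1 for a blowout win, near 0 for a blowout loss). The logistic shape and the 8-point scale below are my illustrative assumptions, not the author's formula:

```python
import math

def game_probability(margin, scale=8.0):
    """Probability that the team outplayed its opponent in one game,
    sketched as a logistic curve of the scoring margin.  The logistic
    shape and the 8-point scale are illustrative assumptions."""
    return 1.0 / (1.0 + math.exp(-margin / scale))

print(game_probability(0))          # a tie gives exactly 0.5
print(game_probability(35) > 0.95)  # blowout win: value near 1
print(game_probability(-35) < 0.05) # blowout loss: value near 0
```

Any sigmoid with these limiting behaviors would serve; the scale parameter controls how quickly a comfortable win saturates toward 1.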

The final element of the voter's initial feeling is the quality of the teams a team has beaten. Interestingly, a team that goes 9-2 with two losses to top-10 teams seems to be ranked no better than one whose two losses are to lousy teams; only the wins really matter here. I quantify an opponent's quality as its probability of outplaying the 25th-best team in the same road/home/neutral situation as the game in question. The value that best replicates polls is not the simple average of such probabilities, but rather one weighted toward a team's stronger opponents. Thus a game against a very weak opponent does not appear to lessen a voter's impression of a team's schedule, as long as there are a couple of wins over ranked opponents.
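One way to realize "weighted toward a team's stronger opponents" is to weight each opponent-quality probability exponentially by its own value, so weak opponents barely pull the average down. The exponential form and the skew parameter `k` are my assumptions about the shape of the weighting, not the author's actual scheme:

```python
import math

def schedule_quality(opponent_probs, k=4.0):
    """Average opponent quality, weighted toward a team's stronger
    opponents.  Each value is an opponent's probability of outplaying
    the 25th-best team.  The exponential weighting and k=4 are
    illustrative assumptions about the degree of skew."""
    weights = [math.exp(k * q) for q in opponent_probs]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, opponent_probs)) / total

strong = [0.9, 0.8]              # two wins over ranked opponents
with_cupcake = strong + [0.05]   # plus a very weak opponent
# The cupcake barely dents the weighted quality, while it would
# drag a simple average below 0.6:
print(round(schedule_quality(strong), 3))
print(round(schedule_quality(with_cupcake), 3))
```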

Although the three components come out of the predictive rating system, they are not combined in a similar fashion at all (no surprise, given that voters are not computers). Instead, the voter appears to start with the team's record and make adjustments according to the "convincingness" of its wins and whether or not it has beaten solid opponents.
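The record-first, adjust-second combination might be sketched as follows. The additive form and both weights are illustrative assumptions of mine; the text says only that the record dominates and the other two components act as adjustments:

```python
def initial_feeling(win_pct, avg_game_prob, quality_wins_factor,
                    w_convince=0.15, w_quality=0.10):
    """Sketch of a voter's gut-instinct score: start from the record
    and nudge it by how convincing the wins were and whether the team
    has beaten solid opponents.  The additive form and both weights
    (0.15, 0.10) are hypothetical."""
    score = win_pct
    score += w_convince * (avg_game_prob - 0.5)   # convincingness of wins
    score += w_quality * quality_wins_factor      # credit for quality wins
    return score

# An 11-1 team with sound wins edges a 10-1 team with narrow ones:
print(initial_feeling(11/12, 0.85, 0.6) > initial_feeling(10/11, 0.55, 0.6))
```

Because the adjustment weights are small relative to the record term, large differences in schedule quality move a team only modestly, which matches the voting tendencies described below.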

A comparison of typical voting patterns with statistics shows two very interesting points:

1. Voters give much more importance to a team's record than to the quality of its wins and losses. This should not be a surprise, given that computer ratings that do not consider margin of victory mimic polls much more closely than those that do.

2. Voters don't attach "enough" weight to the quality of a team's opponents. Again, no surprise, given that football polls nearly always overrank undefeated minor-conference teams (compared to computer polls). This finding is rather disturbing -- a team that racks up wins against weak opponents will be ranked higher than an identical team that collects wins and losses against a tougher schedule. Football people have instinctively known this for a long time; there is a reason no sane athletic director will schedule three top-notch non-conference opponents. Instead, athletic directors attempt to play the minimum number of quality opponents needed to prove their team in the eyes of voters, giving their team a chance to come out on top of other teams with the same record.

Obviously no voter puts his voting tendencies in a computer and follows these steps to determine his instinct. However, his impression of a team is based on his impression of the team's performances in various games, with some games seen as more important than others. So, in terms of understanding polls, this is a fair process.

In NCAA selection committees, the "gut feeling" ranking is provided for the committee in the form of RPI rankings. While many committee members downplay the importance of RPIs in the selection process, it is clear from my research that the RPI alone provides the initial ordering of teams being considered. The exception is the men's basketball committee, which also receives daily Sagarin ratings; that committee appears to give equal weight to the Sagarin and RPI rankings.

The initial ordering of teams from the RPI appears to have a couple of significant modifications. A team finishing below 0.500 in conference play is hurt (so that a basketball team going 7-9 would effectively lose 0.05 RPI points, and one going 6-10 would lose 0.10 and essentially be eliminated from consideration). A team with a solid record in a tough conference is helped (a 20-10 team in a power conference would effectively gain 0.06 RPI points). Sadly, the team's non-conference schedule strength doesn't appear to matter, so a team that went 12-0 against lousy non-conference opponents and 9-9 in conference would be helped as much as one that went 6-6 against top-notch non-conference opponents and 14-6 in conference play.
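These modifications can be expressed as an effective RPI. The quoted examples (7-9 loses 0.05, 6-10 loses 0.10, a solid power-conference record gains 0.06) pin down only a few points; the linear 0.025-per-game penalty below .500 is my interpolation between the two quoted cases, and the flat bonus matches only the quoted example:

```python
def adjusted_rpi(rpi, conf_wins, conf_losses, power_conference=False,
                 solid_record=False):
    """Effective RPI for the committee's initial ordering, per the
    modifications described above.  The linear penalty schedule is an
    interpolation between the quoted 7-9 and 6-10 examples, and the
    flat 0.06 bonus matches only the quoted 20-10 power-conference
    case -- both are my assumptions."""
    adj = rpi
    games_below_500 = conf_losses - conf_wins
    if games_below_500 > 0:                 # sub-.500 in conference hurts
        adj -= 0.025 * games_below_500
    if power_conference and solid_record:   # e.g. 20-10 in a power league
        adj += 0.06
    return adj

print(round(adjusted_rpi(0.600, 7, 9), 3))   # 7-9 loses 0.05 -> 0.55
print(round(adjusted_rpi(0.600, 6, 10), 3))  # 6-10 loses 0.10 -> 0.5
```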

The NCAA championship handbook lists quite a few other factors that are supposedly considered by the committees:

• conference and non-conference RPI
• record in non-conference, road, and last 10 games
• record against automatic qualifiers

It isn't clear that these are used at all. Most likely, they serve as tiebreakers should two teams be seen virtually identically.

### Adjustments

Once an individual ranks teams according to his initial feelings (as described in the previous section), he will make a few "common sense" adjustments. While most individuals seem to have very similar initial feelings, the adjustments vary wildly from person to person. I have found four types of adjustments that tend to be made. All four are present in voted polls; only two appear in the selection results.

1. Head-to-head play (both). An important consideration in all poll voting is the desire to rank teams ahead of the teams they beat. A team that goes 2-0 against another is more likely to have the rankings altered to accommodate it than one that went 1-0, 2-1, or 4-2. This is not consistently applied in selections, however, which tend to adopt the "big picture" view seen in computer ratings (no game more important than any other).
2. Common opponents (polls only). Poll voters, especially football voters, consider performance against common opponents (conference and non-conference opponents). Conference records naturally fall into this, since common opponents of teams in the same conference are almost entirely other teams in that conference.
3. Conference records (selection). A feature of selections is the tendency to select and seed same-conference teams in order of conference record. Selection committees tend to do this blindly -- a weakness in the selection process when dealing with conferences that don't play a straight round-robin, such as the large basketball conferences.
4. Inertia (polls only). It's clear from watching the polls that teams tend to keep the same relative ranking until they lose, in which case they will be re-ranked. Again, this is a factor that will vary widely from voter to voter; some will be more or less likely than others to reorder teams that both won if the lower-ranked team performed better. This is the only feature I'd consider an outright negative; it's obvious that the polls would be fairer if this factor were eliminated.

Note that all of these factors signify leanings, not absolute rules. Each voter also has confidence in his initial ballot, meaning that the pull of any of these adjustments must be sufficiently strong to overcome his initial feeling before he reorders teams.
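The "leaning versus initial feeling" tension can be sketched with the head-to-head adjustment: a team is moved ahead of a team it beat only when the gap in initial-feeling scores is small enough to overcome. The single adjacent-pair pass and the 0.03 threshold below are illustrative assumptions about how strong the leaning is, not the author's mechanism:

```python
def apply_head_to_head(ranking, scores, beaten, threshold=0.03):
    """One pass of the head-to-head adjustment: swap adjacent teams
    when the lower-ranked team beat the higher-ranked one and the gap
    in initial-feeling scores is below the voter's threshold.
    `beaten[a]` is the set of teams that a beat; the single-pass swap
    and the 0.03 threshold are hypothetical."""
    order = list(ranking)
    for i in range(len(order) - 1):
        higher, lower = order[i], order[i + 1]
        gap = scores[higher] - scores[lower]
        if higher in beaten.get(lower, set()) and gap < threshold:
            order[i], order[i + 1] = lower, higher
    return order

scores = {"A": 0.92, "B": 0.91, "C": 0.80}
beaten = {"B": {"A"}}          # B beat A head-to-head
# The small 0.01 gap is overcome, so B moves ahead of A:
print(apply_head_to_head(["A", "B", "C"], scores, beaten))  # ['B', 'A', 'C']
```

A voter-specific threshold captures why some voters reorder on head-to-head results while others let their initial feeling stand.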

### Pseudopoll and Pairwise Ranking

The pseudopoll is a simulated poll, with 50 "virtual" voters selected. The average voter profile is as described above, but the 50 voters themselves each have their own personal leanings on each issue, which they use to produce their own ballot. Note that ballots are not saved from poll to poll, meaning that there is no "inertia" in the pseudopoll (aside from the fact that whatever caused a team to be ranked high in the previous poll will cause them to be ranked high in the current one). In most sports, each ballot consists of 25 teams, with a first place vote worth 25 points, second worth 24, and so on. In hockey, the ballot only includes 10 teams, with points 10-9-8-...
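The tallying described above can be sketched directly. Modeling each voter's "personal leanings" as Gaussian noise on the average profile's scores is my assumption; the 50-voter count and the 25-24-...-1 point schedule are from the text:

```python
import random

def pseudopoll(team_scores, n_voters=50, n_ranked=25, noise=0.02, seed=0):
    """Tally a simulated poll: each virtual voter perturbs the average
    profile's scores with personal leanings (modeled here as Gaussian
    noise, an assumption), ranks the teams, and awards
    25-24-...-1 points to his top 25."""
    rng = random.Random(seed)
    points = {t: 0 for t in team_scores}
    for _ in range(n_voters):
        ballot = sorted(team_scores,
                        key=lambda t: team_scores[t] + rng.gauss(0, noise),
                        reverse=True)
        for place, team in enumerate(ballot[:n_ranked]):
            points[team] += n_ranked - place
    return sorted(points.items(), key=lambda kv: kv[1], reverse=True)

teams = {f"Team{i:02d}": 1.0 - 0.03 * i for i in range(30)}
poll = pseudopoll(teams)
print(poll[0])  # the strongest team, with close to the maximum 50*25 points
```

For hockey, the same tally would use `n_ranked=10` with a 10-9-8-... point schedule.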

As mentioned in the predictive rating description, the pseudopoll includes priors to improve early season stability. This is a factor based on preseason guesses of team strength that is incorporated into the team performance calculation but not into the head-to-head or common opponent modifiers.

The pairwise ranking is based on a single "ballot" using the average selection profile for the sport in question.