Motivation
After the first weekend of the 2005 NCAA tournament, I asked myself: "Who could have predicted Bucknell over Kansas?" Kansas had the highest RPI (Rating Percentage Index) and SOS (strength of schedule); however, it was 5-5 over its last ten games and had a coach in his 2nd year. Bucknell was ranked 64th in the RPI and had just won its conference tournament. What determines early exits and late runs in the tournament?
The idea
Using facts and statistics like the ones mentioned above, an equation can be found that predicts how successful a basketball team will be in the NCAA tournament.
Does it work?
Shown below is a ranking of the 64 teams in the 2006 NCAA tournament, produced by the equation. Each team is listed with its score; teams with a higher score are expected to advance further.
167.72 Connecticut
165.69 Kansas
154.71 Texas
153.41 Memphis
148.03 Duke
147.20 Gonzaga
144.24 North Carolina
144.15 Nevada
142.42 Villanova
139.11 Louisiana St.
138.14 Ohio St.
136.80 UCLA
135.89 Florida
134.84 Bucknell
134.27 Iowa
133.21 Pittsburgh
131.41 George Mason
130.54 Boston College
127.49 Arkansas
126.00 Illinois
124.43 George Washington
124.02 NC Wilmington
123.79 Washington
122.22 Texas A&M
121.25 Wichita St.
118.91 Pacific
116.26 Bradley
115.89 Syracuse
113.87 UAB
113.83 Alabama
111.78 Georgetown
110.78 Utah St.
109.95 San Diego St.
109.51 Xavier
105.53 Arizona
104.90 Tennessee
104.79 California
103.50 Kentucky
102.39 Marquette
102.36 Oklahoma
100.74 Air Force
100.53 Wisconsin
99.79 North Carolina St.
99.54 Kent St.
99.13 Winthrop
96.04 Southern Illinois
89.64 West Virginia
89.09 South Alabama
88.74 Michigan St.
86.09 Seton Hall
85.67 Indiana
84.04 Northwestern St.
83.26 Murray St.
82.27 Montana
81.76 Monmouth
79.26 Pennsylvania
78.32 Iona
76.29 Southern
76.05 Wisconsin Milwaukee
75.62 Northern Iowa
69.07 Davidson
68.89 Belmont
63.58 Albany
62.23 Oral Roberts
Man versus machine
The question soon arises: can a computer program make better picks than your average sports fan? After watching months of games and poring over stats for days, I made my predictions. A second bracket was filled out using the ranking system. The number of correct picks in each round is summarized below.
Round    Man      Machine
Rnd 1    21/32    24/32
Rnd 2    9/16     10/16
Rnd 3    2/8      2/8
Rnd 4    0/4      0/4
Rnd 5    0/2      0/2
Rnd 6    0/1      0/1
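Filling out a bracket from the ranking can be sketched in a few lines of Python. This is only an illustration of the idea, not the program that produced the picks above: the function names `pick_bracket` and `count_correct` are made up for this example, and the four-team field is an illustrative pairing that simply reuses scores from the ranking list.

```python
def pick_bracket(field, rating):
    """Advance the higher-rated team in every game.

    field:  team names in bracket order (length must be a power of two).
    rating: dict mapping team name to its score from the ranking.
    Returns a list of rounds, each a list of predicted winners.
    """
    rounds = []
    current = list(field)
    while len(current) > 1:
        # Neighbors in the list play each other; keep the higher score.
        winners = [
            a if rating[a] >= rating[b] else b
            for a, b in zip(current[0::2], current[1::2])
        ]
        rounds.append(winners)
        current = winners
    return rounds


def count_correct(predicted_rounds, actual_rounds):
    """Per round, count predicted winners that really did win that round."""
    return [len(set(p) & set(a)) for p, a in zip(predicted_rounds, actual_rounds)]


# Scores taken from the ranking list; the pairing is illustrative.
rating = {"Connecticut": 167.72, "Albany": 63.58, "Kentucky": 103.50, "UAB": 113.87}
picks = pick_bracket(["Connecticut", "Albany", "Kentucky", "UAB"], rating)
print(picks)  # [['Connecticut', 'UAB'], ['Connecticut']]
```

Comparing each round of picks with the real winners via `count_correct` gives per-round tallies of the kind shown in the summary.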
How does it work?
The idea is actually quite simple:
- Gather past statistics and facts on each of the teams.
  - Data from 2004 and 2005 tournament teams was collected.
  - 26 pieces of info were used to describe each team.
- Fit an equation to the outcome of past tournament results.
  - Using the stats from '04 and '05, an equation was formulated that related the statistical parameters to the '04 and '05 tourney results.
  - Because the problem is rather large and complex, a genetic algorithm was used (traditional optimization methods/packages failed).
- Use the equation to predict the outcome of the current tournament.
  - By applying 2006 data to this equation, a single value for each team was found.
  - Teams with a higher value should advance further than those with a lower value.
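To give a feel for the fitting step, here is a minimal sketch of a genetic algorithm in Python. It rests on several assumptions that are not taken from the actual program: the equation is a plain weighted sum of the stats, the target is the number of tournament wins, the error is a sum of squares, and the selection, crossover and mutation settings are arbitrary. The data in the example is made up.

```python
import random


def score(weights, stats):
    # Assumed form of the equation: a weighted sum of a team's stats.
    return sum(w * x for w, x in zip(weights, stats))


def error(weights, teams):
    # teams is a list of (stats, wins); squared error against past results.
    return sum((score(weights, s) - wins) ** 2 for s, wins in teams)


def evolve(teams, n_weights, pop_size=60, generations=200,
           mutation_sd=0.1, seed=0):
    rng = random.Random(seed)
    # Start from a random population of candidate weight vectors.
    pop = [[rng.uniform(-1, 1) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the best quarter unchanged (elitism).
        pop.sort(key=lambda w: error(w, teams))
        parents = pop[: pop_size // 4]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each weight comes from one parent.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: small Gaussian nudge on every weight.
            child = [w + rng.gauss(0, mutation_sd) for w in child]
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda w: error(w, teams))


# Made-up data: three stats per team and tournament wins (0 to 6).
teams = [([0.9, 0.8, 0.5], 6), ([0.7, 0.6, 0.9], 3),
         ([0.4, 0.5, 0.2], 1), ([0.2, 0.1, 0.6], 0)]
best = evolve(teams, n_weights=3)
print(best, error(best, teams))
```

Because the best candidates are carried over unchanged each generation, the best error never gets worse as the search runs. With 26 stats per team the same loop applies; only `n_weights` and the data change.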
Final thoughts
There is obviously some promise in the current formulation. The equation correctly predicted several major upsets:
- (12) Texas A&M over (5) Syracuse
- (11) George Mason over (6) Michigan State
- (7) Wichita State over (2) Tennessee
Unfortunately, it also missed badly on a few teams:
- (4) Kansas, picked to reach the Final Four, was upset by (13) Bradley
- (5) Nevada, picked to reach the Final Four, was upset by (12) Montana
Future work
There is plenty of work to do, mainly adding more stats and exploring ways of maximizing the accuracy and/or minimizing the error. When time permits, I'll try to provide more detail on the methodology of the current program.
Acknowledgments
Many of the statistics used to formulate the ranking system were acquired from Ken Pomeroy.
Disclaimer
The NCAA opposes all sports wagering. The tournament should not be used for sweepstakes, contests, office pools, or other gambling activities. The ranking system shown on this website is purely for entertainment purposes.