## Betting on Baseball


Model correctly predicts Cy Young Award winners

Bartolo Colon and Chris Carpenter are not the best pitchers in baseball. You know it, your dad knows it and the more than 1,300 diehard fans who vote in the Internet Baseball Awards know it. But during the week of November 7, Colon and Carpenter won the Cy Young Awards in the American League and National League, respectively, naming them the best pitchers of 2005.

Two Rhode Island College mathematicians knew it would happen.

In the April 2005 issue of Math Horizons, husband-and-wife team Rebecca Sparks and David Abrahamson describe a mathematical model they created to predict the winners of the Cy Young Award; their algorithm put Colon and Carpenter at the top.

The researchers used linear programming to analyze data on the winners and the first two runners-up for the award from 1993 to 2002. Sparks and Abrahamson aimed to turn the data into an equation that would correctly reproduce the first-, second- and third-place finishers from those years.

“We took information from starting pitchers: strikeouts, earned run average, team winning percentage, wins and losses,” Sparks said. “We wanted to find a way we could do a weighted average of those statistics and find what those weights are.”

Sparks said they found out early in the process that they would be able to analyze data only for starting pitchers, because a “starting or relief” factor did not fit into their method. That method was a fairly basic linear system. The values of each of the five factors were mapped linearly onto a scale from 1 to 10, and software for solving linear programs then found coefficients for these values that gave the first-place finisher the highest score and the third-place finisher the lowest score in each year studied.
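As a rough illustration, the fitting problem can be written as a linear program. The article does not give the researchers’ exact formulation, so everything here is an assumption: $s_{r,p}$ is the vector of five rescaled statistics for the pitcher who finished in place $p$ in award race $r$, $w$ is the vector of weights, and $\delta$ is a hypothetical margin that the program tries to make as large as possible.

$$
\begin{aligned}
\max_{w,\,\delta}\quad & \delta \\
\text{subject to}\quad & w^{\top} s_{r,1} - w^{\top} s_{r,2} \ge \delta \quad \text{for every race } r,\\
& w^{\top} s_{r,2} - w^{\top} s_{r,3} \ge \delta \quad \text{for every race } r,\\
& \sum_{j=1}^{5} w_j = 1, \qquad w_j \ge 0 .
\end{aligned}
$$

If the optimal $\delta$ is positive, some weighted average ranks every race’s top three in the order the voters chose, and because the weights sum to 1, each $w_j$ reads as the share of the decision carried by statistic $j$.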

“Relief pitchers showed up so rarely in the voting and their statistics are so much different from starting pitchers’ that it was just throwing everything off,” Sparks said. “Once we took them out, that’s when we were seeing the model start to work.”

By finding weighting coefficients for each of those five factors, Sparks and Abrahamson effectively discerned how important each component is to the Baseball Writers Association of America, the body that determines the winner. The factor that mattered most to the voters, accounting for 60% of the decision, was wins: the number of games in the season for which the pitcher was credited with his team’s victory.
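The search for such weights can be sketched with Python’s standard library. Rather than calling a real linear-programming solver, this sketch swaps in a brute-force grid search over weight vectors that are nonnegative and sum to 1, keeping the one that maximizes the smallest score gap (first over second, second over third) across all races; it approximates a linear program and is not the researchers’ code. The column order, the choice to flip losses and ERA so that 10 is always best, rescaling across the whole data set and every function name are assumptions.

```python
from itertools import product

# Assumed (hypothetical) column order for each pitcher's raw line:
# (wins, losses, ERA, strikeouts, team winning percentage)
# Assumption: fewer losses and a lower ERA are better, so those columns are
# flipped when rescaled, making 10 the best value in every column.
LOWER_IS_BETTER = (False, True, True, False, False)


def scale_1_to_10(values, invert=False):
    """Map raw values linearly onto [1, 10]; if invert, the smallest raw value gets 10."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [5.5] * len(values)  # no spread: put everyone mid-scale
    scaled = [1 + 9 * (v - lo) / (hi - lo) for v in values]
    return [11 - s for s in scaled] if invert else scaled


def scale_races(raw_races):
    """Rescale each statistic across every pitcher in every race.

    raw_races is a list of races, each [first, second, third] raw stat lines.
    """
    rows = [row for race in raw_races for row in race]
    scaled_cols = [
        scale_1_to_10(list(col), invert)
        for col, invert in zip(zip(*rows), LOWER_IS_BETTER)
    ]
    scaled_rows = list(zip(*scaled_cols))
    # Regroup into races of three finishers each.
    return [scaled_rows[i:i + 3] for i in range(0, len(scaled_rows), 3)]


def score(weights, row):
    """Weighted sum of one pitcher's rescaled statistics."""
    return sum(w * x for w, x in zip(weights, row))


def simplex_grid(n, steps):
    """Yield every n-vector of nonnegative multiples of 1/steps that sums to 1."""
    for head in product(range(steps + 1), repeat=n - 1):
        if sum(head) <= steps:
            yield tuple(h / steps for h in head) + ((steps - sum(head)) / steps,)


def best_weights(races, steps=20):
    """Grid-search stand-in for the LP: maximize the smallest score gap.

    Returns (weights, margin); weights is None if no grid point ranks
    every race's top three in the actual order.
    """
    best, best_margin = None, 0.0
    for w in simplex_grid(len(races[0][0]), steps):
        margin = min(
            min(score(w, a) - score(w, b), score(w, b) - score(w, c))
            for a, b, c in races
        )
        if margin > best_margin:
            best, best_margin = w, margin
    return best, best_margin


if __name__ == "__main__":
    # Made-up numbers for illustration only, not real Cy Young voting data.
    races = [
        [(21, 8, 3.10, 180, 0.580), (18, 9, 2.90, 210, 0.550), (16, 10, 3.40, 170, 0.520)],
        [(22, 6, 2.80, 200, 0.600), (20, 8, 3.00, 190, 0.570), (17, 9, 3.20, 220, 0.530)],
    ]
    weights, margin = best_weights(scale_races(races))
    print(weights, margin)
```

A finer grid (larger `steps`) gets closer to what an exact solver would return but grows quickly in cost; with real data, a proper LP solver would be the better tool.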

Sparks and Abrahamson’s system was borne out by the selection of their two top-ranked pitchers, Colon and Carpenter. But are wins the best indicator of pitching performance? And how good is the Baseball Writers Association of America at pinning down the best pitchers in the league?

“They’re absolutely horrendously bad,” said Andy Andres, a natural science professor at Boston University. Andres and Tufts Public Health Professor David Tybor teach a course at Tufts on sabermetrics, the objective analysis of baseball through statistics.

“Wins are so far down the list of anything that predicts their ability,” said Andres.

Tybor agrees with his colleague. “For a pitcher to get a win, a lot has to go right: his team scores runs for him, he pitches well, his bullpen throws well, the manager manages well and the defense defends well,” he said. In an email, he added, “Baseball history is littered with pitchers whose win-loss records belie their pitching prowess.”

However, Andres doesn’t doubt Sparks and Abrahamson’s analysis.

“Everybody knows that wins are paramount for Cy Young winners,” he said. “So I agree with their analysis that it’s probably about 60 percent; voters do absolutely overemphasize wins.”

“ERA is actually a pretty good measure of pitching quality,” Tybor said. ERA, the average number of earned runs a pitcher allows per nine innings, emerged as the second most important factor in the voters’ decisions, accounting for about 20% of the verdict.
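ERA itself is simple arithmetic: nine times earned runs, divided by innings pitched. The small sketch below uses hypothetical function names and assumes the usual box-score convention in which an innings total such as 200.1 means 200 1/3 innings (one out into the next inning), which is why it works in exact fractions.

```python
from fractions import Fraction


def innings_from_notation(ip: str) -> Fraction:
    """Convert box-score notation such as '200.1' (200 1/3 innings) to a Fraction."""
    whole, _, thirds = ip.partition(".")
    outs = int(thirds or 0)
    if outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) + Fraction(outs, 3)


def era(earned_runs: int, innings: Fraction) -> float:
    """Earned run average: earned runs allowed per nine innings pitched."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return float(9 * earned_runs / Fraction(innings))


if __name__ == "__main__":
    # Illustrative inputs only: 20 earned runs over 90 innings is a 2.00 ERA.
    print(era(20, innings_from_notation("90")))
```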

“We have our students go a step further and place ERA in context of the pitcher’s league, park, and era,” said Tybor. “With some basic statistical tools, our students show that Pedro Martinez’s 1.74 ERA in 2000 was much more impressive than Bob Gibson’s infamous 1.12 in 1968.”
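One common way to put ERA in league context is an index in the spirit of ERA+, which divides the league ERA by the pitcher’s ERA so that 100 is league average and higher is better. The sketch below uses that simple ratio with an optional multiplicative park factor. It is an assumption about the kind of “basic statistical tools” Tybor means, not the course’s actual method, and published versions of ERA+ differ in exactly how they apply park adjustments; the function name and `park_factor` parameter are hypothetical.

```python
def league_adjusted_era(era: float, league_era: float, park_factor: float = 1.0) -> float:
    """ERA+-style index: 100 * league ERA / pitcher ERA, scaled by a park factor.

    100 means league average and higher is better. park_factor is a
    hypothetical input: values above 1.0 stand for a hitter-friendly home
    park and raise the index; 1.0 means no park adjustment.
    """
    if era <= 0 or league_era <= 0:
        raise ValueError("ERA and league ERA must be positive")
    return 100.0 * league_era * park_factor / era
```

To compare seasons like the two Tybor mentions, each pitcher’s ERA would be passed in alongside his own league’s ERA for that year, so that a low ERA in a low-scoring era counts for less than the same ERA in a high-scoring one.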

“Just don’t tell Gibson.”

Originally published November 10, 2005
