
Implicitly Defined Baseball Statistics

December 9, 2012
Joe Scott

Introduction

Major League Baseball uses statistics to determine awards every season. The batting championship is awarded to the player with the highest batting average. The Cy Young Award is given to the top pitcher, who is determined by many different statistics, including earned run average (ERA). Batting average and ERA have been used for many years and are major statistics in baseball. Neither batting average nor ERA considers the skill of the opposing pitcher or batter, so every pitcher and batter is implicitly treated as having the same skill level. We develop an implicitly defined statistic that determines the skill or value of a player: the value of a batter is based on the skill of the opposing pitchers, and the value of a pitcher is based on the skill of the opposing batters. We use linear algebra to find eigenvector solutions to the eigenvalue problem $A\mathbf{x} = \lambda\mathbf{x}$, which generate each player's statistical value.

Idea

Consider a baseball league in which there are $N_b$ players who bat, represented by $b_i$ for $1 \le i \le N_b$. We represent the pitchers in the league as $p_j$, $1 \le j \le N_p$, where $N_p$ is the number of pitchers. $N_b$ is defined as the number of players who record an at bat during a specific season, and $N_p$ is the number of players who record a pitching appearance during a season. The total number of players in the league, $N_{tp}$, satisfies the inequality $N_{tp} \le N_b + N_p$. The inequality accounts for players who both hit and pitch: in the National League pitchers bat as well as pitch, and in interleague play (when American League teams face National League teams in the regular season) American League pitchers bat when the National League team is home. For each batter, a batting average, $ba_i$, is given by

$$ba_i = \frac{h_i}{ab_i} = \frac{1}{ab_i}\sum_{j=1}^{N_p} h_{i,j},$$

where $h_i$ is the number of hits recorded by batter $i$, $ab_i$ is the number of at bats recorded by batter $i$, and $h_{i,j}$ is the number of hits batter $i$ records against pitcher $j$. A similar statistic for pitchers is the opponents' batting average, $oba$: the number of hits against pitcher $j$ divided by the number of at bats against pitcher $j$. So

$$oba_j = \frac{oh_j}{oab_j},$$

where $oh_j$ is the number of hits against pitcher $j$ and $oab_j$ is the number of at bats against pitcher $j$. Opponents' batting average is not sufficient because it places emphasis on the success of the hitter, so we define the pitcher effectiveness of pitcher $j$ as

$$pe_j = \frac{oab_j - oh_j}{oab_j}.$$

Thus, we consider how successful the pitcher is at recording outs. However, pitching is not defined only by at bats and hits: a pitcher could also walk or hit a batter. We therefore consider the plate appearances against pitcher $j$, $pa_j$, and the number of times batters reach base against pitcher $j$, $ob_j$, and redefine pitcher effectiveness as

$$pe_j = \frac{pa_j - ob_j}{pa_j} = \frac{1}{pa_j}\sum_{i=1}^{N_b}\left(pa_{i,j} - ob_{i,j}\right).$$

That is, $pe_j$ subtracts the number of batters who reach base from the number of plate appearances and divides the difference by the number of plate appearances. There are many situations in which batting average and pitcher effectiveness are not good indications of a player's skill. In a division where the average pitcher effectiveness is low, a batter could obtain more hits, thus raising his batting average. This creates a problem when that batter is compared to a batter who plays in a division where pitcher effectiveness is high. The same argument applies to pitcher effectiveness. We seek a metric that levels the playing field.
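The unweighted statistics above can be computed directly from per-matchup counts. The following is a minimal sketch, assuming the counts are stored as Python lists of lists indexed `[batter i][pitcher j]`; the function names and the two-batter, two-pitcher league are hypothetical and serve only to illustrate the formulas for $ba_i$ and $pe_j$.

```python
# Minimal sketch: unweighted batting average and pitcher effectiveness.
# Assumed layout (hypothetical): count[i][j] = events for batter i vs pitcher j.

def batting_averages(h, ab):
    """ba_i = (sum_j h[i][j]) / (sum_j ab[i][j]) for each batter i."""
    return [sum(h_row) / sum(ab_row) for h_row, ab_row in zip(h, ab)]


def pitcher_effectiveness(pa, ob):
    """pe_j = (pa_j - ob_j) / pa_j, with pa_j and ob_j summed over batters."""
    n_pitchers = len(pa[0])
    result = []
    for j in range(n_pitchers):
        pa_j = sum(row[j] for row in pa)  # plate appearances against pitcher j
        ob_j = sum(row[j] for row in ob)  # times batters reached base against j
        result.append((pa_j - ob_j) / pa_j)
    return result


# Hypothetical league: 2 batters (rows) x 2 pitchers (columns).
H = [[1, 2], [0, 3]]
AB = [[4, 6], [2, 6]]
PA = [[5, 7], [3, 9]]
OB = [[2, 3], [0, 4]]

print(batting_averages(H, AB))        # [0.3, 0.375]
print(pitcher_effectiveness(PA, OB))  # [0.75, 0.5625]
```

Storing each statistic as a batter-by-pitcher matrix keeps the per-matchup detail that the weighted statistic needs, since the row and column totals used here are just sums over that matrix.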

The Statistic

We assign weights to each batter and pitcher, $wba$ and $wpe$ respectively. For batters, we define the weighted batting average

$$wba_i = \frac{1}{ab_i}\sum_{j=1}^{N_p} wpe_j\, h_{i,j},$$

where $wpe_j$ is the weighted pitching effectiveness of pitcher $j$. Similarly,

$$wpe_j = \frac{1}{pa_j}\sum_{i=1}^{N_b} wba_i\,(pa_{i,j} - ob_{i,j}).$$

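That is, we define $N_b$ weights, $wba_i$ for $1 \le i \le N_b$, and $N_p$ weights, $wpe_j$ for $1 \le j \le N_p$, as

$$
\begin{aligned}
wba_1 &= \frac{1}{ab_1}\sum_{j=1}^{N_p} wpe_j\, h_{1,j}, \\
wba_2 &= \frac{1}{ab_2}\sum_{j=1}^{N_p} wpe_j\, h_{2,j}, \\
&\;\;\vdots \\
wba_{N_b} &= \frac{1}{ab_{N_b}}\sum_{j=1}^{N_p} wpe_j\, h_{N_b,j}, \\
wpe_1 &= \frac{1}{pa_1}\sum_{i=1}^{N_b} wba_i\,(pa_{i,1} - ob_{i,1}), \\
wpe_2 &= \frac{1}{pa_2}\sum_{i=1}^{N_b} wba_i\,(pa_{i,2} - ob_{i,2}), \\
&\;\;\vdots \\
wpe_{N_p} &= \frac{1}{pa_{N_p}}\sum_{i=1}^{N_b} wba_i\,(pa_{i,N_p} - ob_{i,N_p}).
\end{aligned}
\tag{1}
$$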
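System (1) defines each weight in terms of the opposing players' weights, so evaluating its right-hand sides for a given set of weights is a useful building block. The sketch below is illustrative only: `apply_system_1` is a hypothetical name, and the counts are the same kind of hypothetical two-by-two league used for the unweighted statistics, indexed `[batter i][pitcher j]`.

```python
def apply_system_1(wba, wpe, h, ab, pa, ob):
    """Evaluate the right-hand sides of system (1) for the given weights.

    h, ab, pa, ob are per-matchup counts indexed [batter i][pitcher j].
    Returns (new_wba, new_wpe); a solution of system (1) is a fixed point.
    """
    n_b, n_p = len(h), len(h[0])
    new_wba = []
    for i in range(n_b):
        ab_i = sum(ab[i])  # total at bats of batter i
        new_wba.append(sum(wpe[j] * h[i][j] for j in range(n_p)) / ab_i)
    new_wpe = []
    for j in range(n_p):
        pa_j = sum(pa[i][j] for i in range(n_b))  # plate appearances against pitcher j
        # Opponents' weights times the plate appearances that did not reach base.
        not_on_base = sum(wba[i] * (pa[i][j] - ob[i][j]) for i in range(n_b))
        new_wpe.append(not_on_base / pa_j)
    return new_wba, new_wpe


# Hypothetical counts: 2 batters (rows) x 2 pitchers (columns).
H = [[1, 2], [0, 3]]
AB = [[4, 6], [2, 6]]
PA = [[5, 7], [3, 9]]
OB = [[2, 3], [0, 4]]

# With every weight equal to 1, system (1) reduces to the unweighted ba_i and pe_j.
new_wba, new_wpe = apply_system_1([1.0, 1.0], [1.0, 1.0], H, AB, PA, OB)
print(new_wba, new_wpe)  # [0.3, 0.375] [0.75, 0.5625]
```

The check with unit weights shows how the weighted statistic generalizes the plain one: every opposing player counts equally only when all weights are equal.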

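We define the weighted batting average vector, $wba$, and the weighted pitching effectiveness vector, $wpe$, as

$$wba = \begin{pmatrix} wba_1 \\ wba_2 \\ \vdots \\ wba_{N_b} \end{pmatrix}, \qquad wpe = \begin{pmatrix} wpe_1 \\ wpe_2 \\ \vdots \\ wpe_{N_p} \end{pmatrix}.$$

Combining these vectors, we obtain the weight vector

$$w = \begin{pmatrix} wba \\ wpe \end{pmatrix}.$$

Let $(AB)_{i,j} = ab_{i,j}$, $(H)_{i,j} = h_{i,j}$, $(PA)_{i,j} = pa_{i,j}$, and $(OB)_{i,j} = ob_{i,j}$ be the $N_b \times N_p$ matrices of at bats, hits, plate appearances, and times on base, respectively. So, system (1) can be written as

$$\begin{pmatrix} wba \\ wpe \end{pmatrix} = \begin{pmatrix} 0 & MH \\ N(PA - OB)^T & 0 \end{pmatrix} \begin{pmatrix} wba \\ wpe \end{pmatrix}, \tag{2}$$

where $ab_i = \sum_{j=1}^{N_p} ab_{i,j}$, $pa_j = \sum_{i=1}^{N_b} pa_{i,j}$, and

$$M = \begin{pmatrix} \frac{1}{ab_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{ab_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{ab_{N_b}} \end{pmatrix}, \qquad N = \begin{pmatrix} \frac{1}{pa_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{pa_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{pa_{N_p}} \end{pmatrix}.$$

If we define

$$C = \begin{pmatrix} 0 & MH \\ N(PA - OB)^T & 0 \end{pmatrix}, \tag{3}$$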

then system (2) can be written compactly as $w = Cw$. However, this system may have no solution other than $w = 0$. So we consider the eigenvalue problem $\lambda w = Cw$. In general, there could be up to $N_b + N_p$ distinct values of $\lambda$. To find a unique $\lambda$, we require $\lambda$ to be real and positive, with a corresponding eigenvector that is either entirely non-negative or entirely non-positive. In the past ten years of baseball statistics, exactly one $\lambda$ has fit these criteria each year. The following theorem gives conditions under which $C$ meets all of these requirements.

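Theorem 1 (Perron-Frobenius Theorem). Let $A$ be an irreducible non-negative $n \times n$ matrix. Then $A$ has a real eigenvalue $\lambda_1$ with the following properties:
a) $\lambda_1 > 0$;
b) $\lambda_1$ has a corresponding positive eigenvector;
c) $|\lambda| \le \lambda_1$ for every eigenvalue $\lambda$ of $A$, and $\lambda_1$ is the only eigenvalue of $A$ with a non-negative eigenvector.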
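Theorem 1 also suggests a way to compute $\lambda_1$ numerically. The Python sketch below is our own illustration, not necessarily the solver used for the results in this paper; the names `perron_pair`, `tol`, and `max_iter` are hypothetical. Because $C$ has the block form $\begin{pmatrix} 0 & X \\ Y & 0 \end{pmatrix}$, $-\lambda_1$ is also an eigenvalue of $C$, so plain power iteration on $C$ can oscillate. The sketch therefore iterates on $C + I$: for an irreducible non-negative matrix, $\lambda_1 + 1$ is then the unique eigenvalue of largest modulus.

```python
import math


def perron_pair(c, tol=1e-12, max_iter=10_000):
    """Estimate the Perron eigenvalue and a unit-norm positive eigenvector of c.

    Runs power iteration on c + I. For an irreducible non-negative matrix the
    shifted matrix has lambda_1 + 1 as its unique eigenvalue of largest modulus,
    so the iterates converge even though c may also have -lambda_1 as an eigenvalue.
    """
    n = len(c)
    w = [1.0 / math.sqrt(n)] * n  # positive starting vector
    for _ in range(max_iter):
        # v = (c + I) w, rescaled to unit Euclidean length
        v = [sum(c[r][k] * w[k] for k in range(n)) + w[r] for r in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        w = v
        if converged:
            break
    # Rayleigh quotient w^T c w (w has unit length) recovers lambda_1 of c itself.
    cw = [sum(c[r][k] * w[k] for k in range(n)) for r in range(n)]
    lam = sum(a * b for a, b in zip(cw, w))
    return lam, w


# Small irreducible block-form example with eigenvalues +2 and -2.
lam, w = perron_pair([[0.0, 1.0], [4.0, 0.0]])
print(round(lam, 6), [round(x, 6) for x in w])  # 2.0 [0.447214, 0.894427]
```

For this test matrix, plain power iteration on $C$ starting from $(1, 1)$ alternates between the directions $(1, 1)$ and $(1, 4)$ and never settles; the shift removes that oscillation without changing the eigenvector.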

All of our matrices are non-negative because at bats, hits, plate appearances, and times on base are counts, that is, non-negative integers; moreover, $pa_{i,j} - ob_{i,j} \ge 0$, since a batter cannot reach base more often than he comes to the plate. Thus, $C$ is a non-negative $(N_b + N_p) \times (N_b + N_p)$ matrix. We must now consider irreducibility. An $n \times n$ matrix $A$ is irreducible if and only if there does not exist a permutation matrix $P$ such that

$$P^{-1}AP = \begin{pmatrix} A_1 & A_2 \\ 0 & A_3 \end{pmatrix},$$

where $A_1$ and $A_3$ are square matrices of size at least $1 \times 1$. In each of the past ten seasons, the baseball data have satisfied this condition, but the conditions under which irreducibility is guaranteed are unknown.
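Irreducibility can be checked for a given season without searching over permutations: a non-negative matrix is irreducible exactly when its directed graph (an edge from $r$ to $k$ whenever $C_{r,k} \ne 0$) is strongly connected. Since $1/ab_i$ and $1/pa_j$ are positive, batter $i$ links to pitcher $j$ when $h_{i,j} > 0$, and pitcher $j$ links to batter $i$ when $pa_{i,j} - ob_{i,j} > 0$. The sketch below, with hypothetical function names and hypothetical counts, tests this with two breadth-first searches.

```python
from collections import deque


def _reaches_all(adj):
    """Breadth-first search from node 0; True if every node is reached."""
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


def interactions_strongly_connected(h, pa, ob):
    """True if the zero pattern of C gives a strongly connected digraph.

    Nodes 0..n_b-1 are batters and n_b..n_b+n_p-1 are pitchers.
    """
    n_b, n_p = len(h), len(h[0])
    n = n_b + n_p
    forward = [[] for _ in range(n)]
    backward = [[] for _ in range(n)]
    for i in range(n_b):
        for j in range(n_p):
            p = n_b + j  # node index of pitcher j
            if h[i][j] > 0:  # C[i][p] = h_ij / ab_i is nonzero
                forward[i].append(p)
                backward[p].append(i)
            if pa[i][j] - ob[i][j] > 0:  # C[p][i] = (pa_ij - ob_ij) / pa_j is nonzero
                forward[p].append(i)
                backward[i].append(p)
    # Strongly connected iff node 0 reaches every node and every node reaches node 0.
    return _reaches_all(forward) and _reaches_all(backward)


# Hypothetical counts: 2 batters (rows) x 2 pitchers (columns).
H = [[1, 2], [0, 3]]
PA = [[5, 7], [3, 9]]
OB = [[2, 3], [0, 4]]
print(interactions_strongly_connected(H, PA, OB))  # True
```

A batter with no hits against any pitcher has no outgoing edges, so the same league with `H = [[1, 2], [0, 0]]` fails the test; this is one concrete way a season's data could make $C$ reducible.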

Example

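Consider Joe's Baseball League, with six batters (Brian, Tommy, Cody, Derek, James, and Bob) and five pitchers (Jim, Greg, Rich, Mike, and Evan). Each entry of the first table is a batter's hits/at bats against a pitcher, and each entry of the second is his on base/plate appearances against that pitcher.

| hits/at bats | Jim | Greg | Rich | Mike | Evan |
|---|---|---|---|---|---|
| Brian | 1/4 | 1/3 | 2/8 | 0/1 | 1/2 |
| Tommy | 2/5 | 2/8 | 5/8 | 1/5 | 0/0 |
| Cody | 5/6 | 2/5 | 0/2 | 2/4 | 0/1 |
| Derek | 3/7 | 3/6 | 0/9 | 5/7 | 4/9 |
| James | 1/4 | 0/0 | 1/8 | 3/10 | 1/3 |
| Bob | 2/7 | 0/6 | 0/3 | 1/5 | 3/4 |

| on base/plate appearances | Jim | Greg | Rich | Mike | Evan |
|---|---|---|---|---|---|
| Brian | 2/5 | 1/7 | 2/8 | 1/3 | 1/5 |
| Tommy | 2/5 | 2/8 | 5/8 | 1/5 | 0/4 |
| Cody | 6/8 | 5/9 | 0/2 | 2/5 | 0/4 |
| Derek | 5/9 | 3/8 | 0/9 | 5/7 | 4/12 |
| James | 1/8 | 2/2 | 1/8 | 6/13 | 2/4 |
| Bob | 3/9 | 0/8 | 0/3 | 2/6 | 5/7 |

The corresponding matrices are

$$
AB = \begin{pmatrix}
4 & 3 & 8 & 1 & 2 \\
5 & 8 & 8 & 5 & 0 \\
6 & 5 & 2 & 4 & 1 \\
7 & 6 & 9 & 7 & 9 \\
4 & 0 & 8 & 10 & 3 \\
7 & 6 & 3 & 5 & 4
\end{pmatrix}, \qquad
H = \begin{pmatrix}
1 & 1 & 2 & 0 & 1 \\
2 & 2 & 5 & 1 & 0 \\
5 & 2 & 0 & 2 & 0 \\
3 & 3 & 0 & 5 & 4 \\
1 & 0 & 1 & 3 & 1 \\
2 & 0 & 0 & 1 & 3
\end{pmatrix},
$$

$$
PA = \begin{pmatrix}
5 & 7 & 8 & 3 & 5 \\
5 & 8 & 8 & 5 & 4 \\
8 & 9 & 2 & 5 & 4 \\
9 & 8 & 9 & 7 & 12 \\
8 & 2 & 8 & 13 & 4 \\
9 & 8 & 3 & 6 & 7
\end{pmatrix}, \qquad
OB = \begin{pmatrix}
2 & 1 & 2 & 1 & 1 \\
2 & 2 & 5 & 1 & 0 \\
6 & 5 & 0 & 2 & 0 \\
5 & 3 & 0 & 5 & 4 \\
1 & 2 & 1 & 6 & 2 \\
3 & 0 & 0 & 2 & 5
\end{pmatrix}.
$$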

Using these matrices and (3), we solve the eigenvalue/eigenvector problem $\lambda w = Cw$. This yields $\lambda = 0.4592$ and the following solutions.

| Batter | BA (Rank) | WBA (Rank) |
|---|---|---|
| Brian | 0.278 (4) | 0.224 (4) |
| Tommy | 0.385 (3) | 0.304 (2) |
| Cody | 0.500 (1) | 0.327 (1) |
| Derek | 0.395 (2) | 0.281 (3) |
| James | 0.240 (5) | 0.167 (6) |
| Bob | 0.240 (5) | 0.173 (5) |

| Pitcher | PE (Rank) | WPE (Rank) |
|---|---|---|
| Jim | 0.432 (2) | 0.276 (5) |
| Greg | 0.310 (4) | 0.377 (3) |
| Rich | 0.211 (5) | 0.409 (1) |
| Mike | 0.436 (1) | 0.283 (4) |
| Evan | 0.333 (3) | 0.384 (2) |
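To make the example reproducible, the following sketch recomputes $\lambda$ and the weights from the $AB$, $H$, $PA$, and $OB$ matrices above. It is our own illustration: `build_c` and `perron_pair` are hypothetical names, the solver (power iteration on $C + I$) is a choice we made rather than the method behind the reported table, and we assume the eigenvector is scaled to unit Euclidean length, since an eigenvector is only determined up to scale (the squares of the reported weights sum to approximately 1). Up to rounding, the output should agree with the $\lambda$, WBA, and WPE values reported above.

```python
import math

# Joe's Baseball League: rows are batters, columns are pitchers.
BATTERS = ["Brian", "Tommy", "Cody", "Derek", "James", "Bob"]
PITCHERS = ["Jim", "Greg", "Rich", "Mike", "Evan"]
AB = [[4, 3, 8, 1, 2], [5, 8, 8, 5, 0], [6, 5, 2, 4, 1],
      [7, 6, 9, 7, 9], [4, 0, 8, 10, 3], [7, 6, 3, 5, 4]]
H = [[1, 1, 2, 0, 1], [2, 2, 5, 1, 0], [5, 2, 0, 2, 0],
     [3, 3, 0, 5, 4], [1, 0, 1, 3, 1], [2, 0, 0, 1, 3]]
PA = [[5, 7, 8, 3, 5], [5, 8, 8, 5, 4], [8, 9, 2, 5, 4],
      [9, 8, 9, 7, 12], [8, 2, 8, 13, 4], [9, 8, 3, 6, 7]]
OB = [[2, 1, 2, 1, 1], [2, 2, 5, 1, 0], [6, 5, 0, 2, 0],
      [5, 3, 0, 5, 4], [1, 2, 1, 6, 2], [3, 0, 0, 2, 5]]


def build_c(h, ab, pa, ob):
    """C = [[0, M H], [N (PA - OB)^T, 0]]; batters first, then pitchers."""
    n_b, n_p = len(h), len(h[0])
    n = n_b + n_p
    c = [[0.0] * n for _ in range(n)]
    for i in range(n_b):
        ab_i = sum(ab[i])  # total at bats of batter i
        for j in range(n_p):
            c[i][n_b + j] = h[i][j] / ab_i
    for j in range(n_p):
        pa_j = sum(pa[i][j] for i in range(n_b))  # total PA against pitcher j
        for i in range(n_b):
            c[n_b + j][i] = (pa[i][j] - ob[i][j]) / pa_j
    return c


def perron_pair(c, tol=1e-12, max_iter=10_000):
    """Shifted power iteration on C + I; returns (lambda, unit-norm w)."""
    n = len(c)
    w = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        v = [sum(c[r][k] * w[k] for k in range(n)) + w[r] for r in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        w = v
        if converged:
            break
    cw = [sum(c[r][k] * w[k] for k in range(n)) for r in range(n)]
    return sum(a * b for a, b in zip(cw, w)), w


lam, w = perron_pair(build_c(H, AB, PA, OB))
wba, wpe = w[:len(BATTERS)], w[len(BATTERS):]
print(f"lambda = {lam:.4f}")
for name, value in sorted(zip(BATTERS, wba), key=lambda t: -t[1]):
    print(f"{name:6s} WBA = {value:.3f}")
for name, value in sorted(zip(PITCHERS, wpe), key=lambda t: -t[1]):
    print(f"{name:6s} WPE = {value:.3f}")
```

Sorting by weight reproduces the rank columns directly, which is a convenient cross-check on hand-assigned ranks.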

2012 Results

Using play-by-play files, we were able to calculate all weights for the 692 batters and 715 pitchers. The results for 2012 are as follows:

| Batter | WBA (Rank) | BA (Rank) |
|---|---|---|
| Miguel Cabrera | 0.59785 (1) | 0.330 (2) |
| Adam Jones | 0.56685 (2) | 0.287 (44) |
| Albert Pujols | 0.56255 (3) | 0.285 (50) |
| Alex Gordon | 0.55757 (4) | 0.294 (30) |
| Starlin Castro | 0.55172 (5) | 0.283 (53) |
| Mike Trout | 0.48024 (N/A) | 0.326 (4) |

| Pitcher | OBA (Rank) | WPE (Rank) |
|---|---|---|
| Felix Hernandez | 0.241 (26) | 1.5708 (1) |
| James Shields | 0.239 (23) | 1.5472 (2) |
| Hiroki Kuroda | 0.249 (39) | 1.5062 (3) |
| Clayton Kershaw | 0.210 (2) | 1.3641 (4) |
| Mat Latos | 0.230 (9) | 1.2680 (5) |
| Justin Verlander | 0.237 (4) | 1.2301 (6) |

According to our metric, Miguel Cabrera would be the MLB batting champion for this data set. Interestingly, Cabrera won the American League Triple Crown, which is awarded to a player who leads his league in batting average, runs batted in, and home runs. Mike Trout is included because he was a front-runner for the MVP award but finished runner-up to Cabrera. Trout was not in the top five in wba, but we note that this metric determines the top hitter and pitcher, not the best all-around player. This is because the metric does not consider the game as a whole, which is the reason Trout was considered for the MVP. It is interesting that the top five pitchers in wpe are all starting pitchers, because starters face the most batters. This supports using wpe to determine the best pitcher, since the Cy Young Award is normally given to a starter.

Future Studies

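We need to prove that yearly baseball data always produce an irreducible matrix $C$, by showing that the directed graph of batter-pitcher interactions (the graph whose adjacency matrix has the same zero pattern as $C$) is strongly connected. Next, consider making a small adjustment to create a weighted on-base percentage by changing $C$ to

$$C = \begin{pmatrix} 0 & J\,OB \\ N(PA - OB)^T & 0 \end{pmatrix}, \qquad J = \begin{pmatrix} \frac{1}{pa_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{pa_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{pa_{N_b}} \end{pmatrix},$$

where $pa_i$ in $J$ denotes the number of plate appearances of batter $i$.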
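The proposed variant changes only the batter block of $C$: $MH$ becomes $J\,OB$, so a batter's weight is driven by how often he reaches base rather than by hits alone. The sketch below, with the hypothetical name `build_c_obp` and hypothetical counts, assembles this matrix under the reading that $J$ uses each batter's total plate appearances (row sums of $PA$) and $N$ uses each pitcher's (column sums of $PA$).

```python
def build_c_obp(pa, ob):
    """Variant C = [[0, J OB], [N (PA - OB)^T, 0]] for a weighted on-base percentage.

    J = diag(1/pa_i) uses batter i's total plate appearances (row sums of PA);
    N = diag(1/pa_j) uses pitcher j's total plate appearances (column sums of PA).
    """
    n_b, n_p = len(pa), len(pa[0])
    n = n_b + n_p
    c = [[0.0] * n for _ in range(n)]
    for i in range(n_b):
        pa_i = sum(pa[i])
        for j in range(n_p):
            c[i][n_b + j] = ob[i][j] / pa_i  # (J OB)_{i,j}
    for j in range(n_p):
        pa_j = sum(pa[i][j] for i in range(n_b))
        for i in range(n_b):
            c[n_b + j][i] = (pa[i][j] - ob[i][j]) / pa_j  # (N (PA - OB)^T)_{j,i}
    return c


# Hypothetical counts: 2 batters (rows) x 2 pitchers (columns).
PA = [[5, 7], [3, 9]]
OB = [[2, 3], [0, 4]]
C_obp = build_c_obp(PA, OB)

# With every pitcher weight equal to 1, a batter's row sum is his on-base percentage.
obp = [sum(C_obp[i][2:]) for i in range(2)]
print(obp)
```

The resulting matrix has the same block structure as $C$, so the same eigenvalue computation and irreducibility check apply to it unchanged.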

We may also consider slugging percentage and relate wpe to the slugging percentage of a batter against a pitcher. This would require looking at the number of extra-base hits a pitcher gives up and, more specifically, the numbers of doubles, triples, and home runs. The data would determine how difficult it is to get a given type of extra-base hit off a pitcher. The metric might also be used to predict the wba or wpe of a player who changes teams. An example is Mike Napoli, who batted 0.238 in 2010 with the Los Angeles Angels, yet was among the top 30 players in wba. He was then traded to the Texas Rangers, where his batting average was 0.320 the following year. Was this a result of facing worse pitching on Texas' schedule, or was it just chance? Finally, are there other sports in which this implicitly defined statistic can be used?

