
Background: The Indian Premier League (IPL), founded in 2008, has changed the way cricket is looked at.

Initiated by the Board of Control for Cricket in India (BCCI), the IPL has significantly increased both the compensation that players receive and the entertainment quotient of the game. It is rated by Sportsintelligence.com as the second-highest-paid league after the NBA. In 2009, Forbes magazine stated that the IPL was the fastest-growing sports business in the world. IPL teams are owned by a range of people, from industrialists like Mukesh Ambani and Vijay Mallya to actors like Shah Rukh Khan, Preity Zinta and Shilpa Shetty.

The pricing of IPL players is a complex issue on many counts. Firstly, the number of factors driving the prices is high. Secondly, many of the factors are intangible. The aim of this project is to look at how pricing can be linked to the tangible parameters. The quantitative parameters for judging the price of a player include variables like:

- Age of the player
- Skill of the player
- Number of runs in T- matches
- Number of wickets taken in T- matches
- Batting strike rate in One Day Internationals
- Bowling rate in One Day Internationals
- Runs scored in One Day Internationals
- Wickets taken in One Day Internationals
- Average runs scored by the batsman in the IPL
- Bowling average in the IPL
- Highest score by the batsman in the IPL
- Number of runs scored by the player
- Number of runs conceded by the player

Apart from these quantitative variables, certain intangible factors also affect the price of players. One of them is marketability: some players are more marketable than others and hence command a higher price. T20 also needs a different set of competencies compared to normal matches. The dexterity to bat at any position in the order, as well as bowling and captaincy skills, is valued. Strike rate is an especially important part of T20. Excellent performance in Test matches does not necessarily imply good performance in T20. The players are observed very closely.

The significant variables were identified for use in the model. We arrived at this set of independent variables on the basis of a step-by-step analysis in SPSS. We have not considered the Icon players because their pricing involves many intangible factors that cannot be explained quantitatively. We addressed questions such as the impact of the ability to score sixes on a player's price, the impact of batting strike rate and bowling strike rate on a player's price, which players are underpaid or overpaid, and the impact of a player's country of origin on price.

STUDY METHODOLOGY

Careful analysis of the case led us to modify some variables. Some qualitative variables were significant for the model, so we accounted for them by introducing relevant dummy variables. For the variable Country, we substituted 1 for IND and 0 for all others, as the case hinted that Indian players are overpriced. The Playing role variable was also modified to concentrate on runs for batsmen and wickets for bowlers. We split the variable Age to take its ordinal nature into account. A similar approach was adopted for Auction year.

We used SPSS to build the model, applying the Stepwise method for multiple regression. SPSS Help describes the method as follows: in Stepwise selection, if there are independent variables already in the equation, the variable with the largest probability of F is removed if the value is larger than POUT. The equation is recomputed without the variable and the process is repeated until no more independent variables can be removed. Then, the independent variable not in the equation with the smallest probability of F is entered if the value is smaller than PIN. All variables in the equation are again examined for removal. This process continues until no variables in the equation can be removed and no variables not in the equation are eligible for entry, or until the maximum number of steps has been reached.

The purpose of this methodology was to enable us to:

- Arrive at the best model using only significant variables
- Minimize the impact of relationships between the variables

REGRESSION MODEL

The model retains the coefficients that are significant at the 5% level: the logarithm of base price, the number of sixes, the auction year, the runs conceded by a bowler, and whether the player is of Indian origin. The computed model is shown below:

ln(Sale Price) = -214.595 + 0.995 * ln(Base Price) + 0.10 * Sixers + 0.0002 * Runs Conceded by Bowler + 0.107 * Auction Year + 0.594 * Indian Player
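As a minimal sketch, the fitted equation can be evaluated directly. The response is taken to be the natural log of the sale price, as stated in the answers section. The function name and the example player below are hypothetical and are not taken from the case data.

```python
import math

# Coefficients as reported in the fitted model.
INTERCEPT = -214.595
B_LN_BASE_PRICE = 0.995
B_SIXERS = 0.10
B_RUNS_CONCEDED = 0.0002
B_AUCTION_YEAR = 0.107
B_INDIAN = 0.594


def predict_ln_price(base_price, sixers, runs_conceded, auction_year, is_indian):
    """Return the predicted natural log of the sale price."""
    return (
        INTERCEPT
        + B_LN_BASE_PRICE * math.log(base_price)
        + B_SIXERS * sixers
        + B_RUNS_CONCEDED * runs_conceded
        + B_AUCTION_YEAR * auction_year
        + B_INDIAN * (1 if is_indian else 0)
    )


# Hypothetical player (illustrative inputs only).
ln_price = predict_ln_price(200_000, 5, 300, 2009, True)
price_in_dollars = math.exp(ln_price)  # back-transform from the log scale
```

Because the model is fitted on the log scale, the prediction has to be exponentiated to obtain a price in dollars.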

ANALYSIS OF RESULTS

Using the variables number of sixes, auction year, runs conceded by a bowler, whether the player is of Indian origin, and the logarithm of the player's base price, we arrived at a value of 0.800 for R and 0.640 for R square. At every step while narrowing down the relevant variables, we ensured that the adjusted R square value did not decrease. The final adjusted R square value is 0.625. The standard error of the estimate was found to be 0.5543.

Overall Significance of Model

The overall significance of the model was tested with an F test through ANOVA. The F value is 42.019 and the significance is 0.000. Hence we conclude that the model as a whole is statistically significant and the estimates are acceptable.

ANOVA

Model 1       Sum of Squares   df    Mean Square   F        Sig.
Regression    64.565           5     12.913        42.019   .000
Residual      36.263           118   .307
Total         100.828          123
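The summary statistics quoted above follow from the sums of squares and degrees of freedom in the ANOVA table. The short check below recomputes them; the variable names are illustrative.

```python
import math

# Values taken from the ANOVA table.
ss_regression = 64.565
ss_residual = 36.263
df_regression = 5
df_residual = 118

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_statistic = ms_regression / ms_residual            # about 42.02
r_squared = ss_regression / ss_total                 # about 0.640
adjusted_r_squared = 1 - ms_residual / (ss_total / df_total)  # about 0.625
std_error_of_estimate = math.sqrt(ms_residual)       # about 0.554
```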

Testing for Collinearity of Variables

High correlation among independent variables would suggest that the explanatory ability of any such variable in the presence of the others is limited. Including such variables in the model can lead to unstable regression coefficients. The coefficients in this model display low correlations.

Looking at correlations only among pairs of predictors, however, is limiting. It is possible that the pairwise correlations are small and yet a linear dependence exists among three or more variables. Hence we have also computed the variance inflation factors (VIF) to help detect multicollinearity:

Collinearity Statistics

Model            Tolerance   VIF
ln(baseprice)    .822        1.217
Country_Ind      .831        1.203
Runs_c*bowler    .893        1.119
AUCTION YEAR     .854        1.171
SIXERS           .865        1.156

A VIF close to 1 means that the kth predictor is essentially uncorrelated with the remaining predictor variables. That is the case for every predictor here, so there is no multicollinearity requiring correction.
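Tolerance and VIF are reciprocals, and the VIF of a predictor equals 1 / (1 - R_k^2), where R_k^2 comes from regressing that predictor on the others. The sketch below applies these standard identities to the reported VIF values; the function name is illustrative.

```python
# VIF values as reported in the collinearity table.
vif_by_predictor = {
    "ln(baseprice)": 1.217,
    "Country_Ind": 1.203,
    "Runs_c*bowler": 1.119,
    "AUCTION YEAR": 1.171,
    "SIXERS": 1.156,
}


def tolerance_and_r_squared(vif):
    """Return (tolerance, implied R_k^2) for a given VIF."""
    tolerance = 1.0 / vif
    implied_r_squared = 1.0 - tolerance
    return tolerance, implied_r_squared


diagnostics = {
    name: tolerance_and_r_squared(vif) for name, vif in vif_by_predictor.items()
}
# Largest share of any predictor's variance explained by the other predictors.
max_implied_r_squared = max(r2 for _, r2 in diagnostics.values())
```

Even the most collinear predictor, the log of base price, has less than a fifth of its variance explained by the other predictors.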

Residual Analysis

Mahalanobis distance is a measure of how much a case's values on the independent variables differ from the average of all observations. A large Mahalanobis distance identifies a case as having extreme values on one or more of the independent variables. The Mahalanobis distance should not exceed the critical chi-squared value with degrees of freedom equal to the number of predictors and alpha = .001.

Statistic                           Minimum   Maximum   Mean      Std. Deviation   N
Predicted Value                     10.4756   14.7127   12.8183   0.7480           130
Std. Predicted Value                -3.132    2.533     .000      1.000            130
Standard Error of Predicted Value   .074      .263      .117      .031             130
Adjusted Predicted Value            10.5371   14.7466   12.8217   0.7508           130
Residual                            -1.2666   1.6270    0.0000    0.5510           130
Std. Residual                       -2.254    2.895     .000      .980             130
Stud. Residual                      -2.298    2.967     -.003     1.003            130
Deleted Residual                    -1.3166   1.7085    -0.0034   0.5769           130
Stud. Deleted Residual              -2.339    3.065     .000      1.012            130
Mahal. Distance                     1.219     27.233    4.962     3.569            130
Cook's Distance                     .000      .073      .008      .013             130
Centered Leverage Value             .009      .211      .038      .028             130

As a rule of thumb, for n >= 100 a case is considered high-leverage if its Mahalanobis distance exceeds 15. Here the minimum and mean Mahalanobis distances, approximately 1.22 and 4.96, are below that threshold; only the maximum value, 27.23, exceeds it.

Cook's distance, on the other hand, measures how much the residuals of all observations would change if a given case were excluded from the calculation of the regression coefficients. A Cook's distance greater than 1 indicates a strong possibility of an influential outlier. Since the maximum Cook's distance here is 0.073, no case meets that criterion.
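The screening rules above can be sketched as a small helper. The thresholds 15 and 1 come from the text. The chi-squared critical value of 20.515 (5 degrees of freedom, alpha = .001) is a standard table value added here as an assumption; it does not appear in the SPSS output. The function name is hypothetical.

```python
MAHALANOBIS_RULE_OF_THUMB = 15.0  # rule of thumb from the text, for n >= 100
CHI2_CRITICAL_DF5_A001 = 20.515   # assumption: standard table value, df = 5, alpha = .001
COOKS_DISTANCE_THRESHOLD = 1.0    # from the text


def screen_case(mahalanobis_distance, cooks_distance):
    """Flag a single case against the leverage and influence thresholds."""
    return {
        "high_leverage": mahalanobis_distance > MAHALANOBIS_RULE_OF_THUMB,
        "exceeds_chi2_critical": mahalanobis_distance > CHI2_CRITICAL_DF5_A001,
        "influential": cooks_distance > COOKS_DISTANCE_THRESHOLD,
    }


# The reported maxima, used together as a worst-case bound. They need not
# belong to the same player.
worst_case = screen_case(27.233, 0.073)
typical_case = screen_case(4.962, 0.008)  # the reported means
```

Under these thresholds the largest Mahalanobis distance is flagged as an extreme point in predictor space, while no case comes close to the Cook's distance cutoff.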

Plot Analysis

We examined the residual plots (residuals against each independent variable X_i and against the fitted values of Y). The charts indicate that the dispersion of the residuals is random and shows no pattern (refer to Appendix 1). A systematic pattern would indicate a violation of one or more regression assumptions, so the absence of a pattern suggests that the assumptions hold.

ANSWERS TO QUESTIONS

Develop a multiple regression model and identify statistically significant predictors that influence the price of players in the IPL.

The following regression model has been developed to determine the price of IPL players:

ln(Sale Price) = -214.595 + 0.995 * ln(Base Price) + 0.10 * Sixers + 0.0002 * Runs Conceded by Bowler + 0.107 * Auction Year + 0.594 * Indian Player

The logarithm of base price, the number of sixes, the auction year, the runs conceded by a bowler, and whether the player is of Indian origin are the statistically significant predictors influencing the price of IPL players.

Cricket in the T20 format is considered a young man's sport. Is there evidence that the player's price is influenced by age?

On rerunning the regression model with the age variable included (F critical = 3.82), the variable is found to be significant.

What is the impact of the predictors batting strike rate and bowling strike rate on pricing? Identify the predictor that has the highest impact on the price of players.

The predictors batting strike rate and bowling strike rate do not have a significant impact on the model: the calculated F is less than 0.5 in both cases, which is much lower than F critical (3.82). The variable with the most impact on the model is country of origin (F calculated = 21.642 > F critical = 3.82). That is to say, a player from India is priced much higher than a non-Indian player.

Identify the player who is highly overpaid and the player who is highly underpaid.

Chris Gayle is highly underpaid at $800,000. The natural logarithm of the predicted value, 25.65, is significantly far from the actual value.

AS Yadav is highly overpaid at $650,000. The natural logarithm of the predicted value, 11.93, is significantly lower than the actual value (ln 650,000 is about 13.38).

Are players of Indian origin paid more than players from other countries?

The Country_IND variable indicates whether the player is Indian or not. Its coefficient, β = 0.594, is significant in this model with a t-statistic of 5.243. Thus players of Indian origin are paid more than players from other countries.
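Because the response is on a log scale, the dummy coefficient translates into a multiplicative effect on price. The sketch below applies the standard back-transformation exp(β) - 1. This interpretation is an addition here and assumes all other predictors are held constant.

```python
import math

beta_indian = 0.594  # reported coefficient on the Indian-origin dummy

# On the log scale the dummy adds beta; on the price scale it multiplies.
price_multiplier = math.exp(beta_indian)
premium_percent = (price_multiplier - 1) * 100
```

Under this reading, an Indian player is predicted to fetch roughly 81% more than an otherwise identical non-Indian player.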

If a regression model was built after removing Virat Kohli from the sample, what would be the impact on the co-efficient for the predictors, INDIA and L25? How would you interpret this impact?

To calculate the impact on β of removing Virat Kohli, we construct the model again after removing the Virat Kohli case. The difference between the new and the old β is the Dif β. The table below summarizes the values:

Parameter   Before removing case   After removing case
R2          0.640                  0.654
β (IND)     0.594                  0.534
β (L25)     0.17                   0.23

Thus, with the removal of Virat Kohli, β (IND) dropped and β (L25) increased. The removal of Virat Kohli therefore shifted the regression plane.

How much should Mumbai Indians offer Sachin Tendulkar if they would like to retain him? Is the model sufficient to predict the price of Icon players?

The model is not sufficient to predict the price of Icon players. These players were excluded when arriving at the model. Our model therefore treats them as outliers, as they are paid 15% more than the highest-paid member of the team. Assuming the highest-paid player, Peterson, joins Mumbai Indians, Sachin's salary would be 15% higher, i.e. $4,198,394.

APPENDIX Charts
