EXECUTIVE SUMMARY
This research addresses several questions: whether charges differ among physicians, whether the type of insurance affects what patients are charged, and whether charges are related to payor and physician. The data used to find an appropriate model for the relationship between charges and payor, physician, and number of days of hospital stay come from the Bryant/Smith Case 42: Hospital Charges (Appendix 1). As described in the case, the hospital's revenues are determined largely by the patients' insurance coverage. The data are for normal deliveries of babies. The null hypothesis states that there is no difference in mean charges between patients with managed care insurance and patients with commercial insurance. The alternative hypothesis states that patients with managed care insurance pay more than patients with commercial insurance. A 95% confidence level (significance level 0.05) was used for the hypothesis tests and the regression analysis. Based on the hypothesis test comparing the two insurance types, patients with managed care insurance pay higher mean charges than patients with commercial insurance. A hypothesis test was also conducted to check whether charges differ among physicians; it showed that physician #2's mean charge is higher than that of the other physicians combined, and physician #2 has the highest average charge among the physicians compared. Linear regression analysis was conducted to see whether CHRGS is related to DAYS, PHYS, and/or PAYOR. After running a separate model for each independent variable, I conclude that there is a linear relationship between DAYS and CHRGS.
INTRODUCTION
The purpose of this report is to build a data model that answers several questions: whether patients are charged more depending on their insurance, whether physicians' charges to their patients differ, and whether charges (CHRGS) are related to days (DAYS), physician (PHYS), and/or payor (PAYOR). The data contain four variables: DAYS, the length of the hospital stay in days; CHRGS, the total charges to the patient; PHYS, a code identifying the physician who treated the patient; and PAYOR, the type of insurance the patient carries. The data were given in the Bryant/Smith Case 42: Hospital Charges.
Bryant/Smith Case 42: Hospital Charges
As described in the textbook, descriptive statistics is the science of describing the important aspects of a set of measurements (Bowerman). Measures of central tendency include the mean, median, and mode; measures of variation include the range, variance, and standard deviation. The arithmetic mean is found by adding the numbers and dividing the sum by the number of numbers in the list; this is what is most often meant by an average (Farflex, Inc). The median is the middle value in a list ordered from smallest to largest, and the mode is the most frequently occurring value in the list (Farflex, Inc).
The hospital's doctors are not its employees, but they control certain functions, such as prescribing medicines and deciding how long a patient stays in the hospital for further treatment (Bryant-Smith, 2003). I will start the analysis by comparing two types of insurance: managed care insurance and commercial insurance. The question asked in the case study is, "Do charges incurred by a patient depend on the type of insurance the patient has?" (Bryant-Smith, 2003). To answer this question, I began by computing descriptive statistics, separating DAYS and CHRGS by insurance type. Of the 289 patients studied, 193 carry managed care insurance and 96 carry commercial insurance.
Comparison between Managed Care Insurance and Commercial Insurance
Let us first look at managed care insurance. The mean charge of $2,966.44 is the average total charge per patient stay (CHRGS is the total expense charged to the patient) when the patient has managed care insurance. Similarly, the mean of 2.30 days is the average length of stay for patients carrying managed care insurance. For managed care insurance, the median is $2,789.00 for charges and 2 days for length of stay; these are the middle values when the managed care patients' charges and stays are sorted from smallest to largest. The mode is $2,840 for charges and 2 days for length of stay.
Descriptive statistics
Statistic                        DAYS    CHRGS
Count                            193     193
Mean                             2.30    2,966.44
Sample variance                  1.18    1,429,843.96
Sample standard deviation        1.09    1,195.76
Minimum                          1       929
Maximum                          14      14,898
Range                            13      13,969
Population variance              1.17    1,422,435.44
Population standard deviation    1.08    1,192.66
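The definitions of the mean, median, and mode given above can be illustrated with Python's standard `statistics` module. This is a minimal sketch that assumes a short hypothetical list of charges, not the case data. `multimode` also shows why a data set can lack a single mode: when no value repeats, every value ties and all of them are returned.

```python
from statistics import mean, median, multimode

# Hypothetical total charges (dollars) for seven patients; NOT the case data.
charges = [2400, 2850, 2700, 2850, 3300, 1950, 4100]

# Arithmetic mean: sum of the values divided by how many values there are.
avg = mean(charges)

# Median: middle value after sorting from smallest to largest.
mid = median(charges)

# Mode(s): most frequently occurring value(s); multimode returns every tie,
# so it returns all values when nothing repeats.
modes = multimode(charges)

print(f"mean={avg:.2f} median={mid} modes={modes}")
# prints: mean=2878.57 median=2850 modes=[2850]
```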
Since measures of variation include the range, variance, and standard deviation, we now look at these values for managed care insurance. The range is the largest measurement minus the smallest measurement. By definition, the population variance is the average of the squared deviations of the individual population measurements from the population mean, and the population standard deviation is the positive square root of the population variance. For managed care insurance, the range is $13,969 for charges and 13 days for length of stay. The population variance is 1,422,435.44 (in squared dollars) for charges and 1.17 for days, and the population standard deviation is $1,192.66 for charges and 1.08 days for length of stay. The range is of limited use here: because it depends only on the maximum and minimum values, a single extreme observation, such as the maximum charge of $14,898, determines it, so it does not represent the data set as a whole. We are 95% confident that the average charge for patients carrying managed care insurance is between $2,796.67 and $3,136.21. Similarly, at the 95% confidence level, the average stay of these patients is between 2.15 and 2.45 days.
Now let us look at commercial insurance. The mean charge of $2,714.28 is the average total charge per patient stay when the patient has commercial insurance. Similarly, the mean of 2.02 days is the average length of stay for patients carrying commercial insurance. For commercial insurance, the median is $2,673.50 for charges and 2 days for length of stay; these are the middle values when the commercial insurance patients' charges and stays are sorted from smallest to largest. There is no mode for commercial insurance charges, because no charge value occurs more than once, but the mode for length of stay is 2 days.
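The range, population variance, population standard deviation, and 95% confidence interval discussed here can be sketched in Python as follows. The list of stays is hypothetical. The confidence-interval call uses the managed care summary values from Appendix 2 and assumes a critical value of 1.9724, the two-sided 95% t value for 192 degrees of freedom, which the case output does not report; `mean_ci` is an illustrative helper name.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical lengths of stay (days) for eight patients; NOT the case data.
days = [1, 2, 2, 2, 3, 2, 1, 3]

value_range = max(days) - min(days)  # largest minus smallest measurement
pop_var = pvariance(days)            # mean squared deviation from the mean (divides by N)
pop_sd = pstdev(days)                # positive square root of the population variance
samp_var = variance(days)            # sample variance (divides by n - 1)
samp_sd = stdev(days)                # sample standard deviation


def mean_ci(mean, sd, n, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * sd / sqrt(n)."""
    se = sd / math.sqrt(n)           # standard error of the mean
    half_width = t_crit * se
    return mean - half_width, mean + half_width


# Managed care charges from Appendix 2: mean, sample sd, n.
# ASSUMPTION: t_crit = 1.9724 is the two-sided 95% t value for 192 df
# (for example scipy.stats.t.ppf(0.975, 192)); the case output does not list it.
low, high = mean_ci(2966.44, 1195.76, 193, 1.9724)

print(f"range={value_range} pop_var={pop_var:.4f} CI=({low:.2f}, {high:.2f})")
# prints: range=2 pop_var=0.5000 CI=(2796.67, 3136.21)
```

The population functions divide by N and the sample functions by n - 1, which is why the appendix tables list slightly different population and sample variances for the same data.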
Turning to the variation for commercial insurance, the range for charges is $4,031, the difference between the maximum charge of $4,933 and the minimum charge of $902. Similarly, the range for length of stay is 2 days, the difference between the maximum stay of 3 days and the minimum stay of 1 day. The population variance is 444,669.76 (in squared dollars) for charges and 0.33 for days, and the population standard deviation is $666.84 for charges and 0.58 days for length of stay. Based on the sample, we are 95% confident that the average charge for patients carrying commercial insurance is between $2,578.46 and $2,850.10, and that their average stay is between 1.90 and 2.14 days.
Comparing the two insurance groups, patients with managed care insurance have a higher average charge ($2,966.44) than patients with commercial insurance ($2,714.28). However, a difference in sample means alone does not show that managed care patients pay more on average, because such a difference could arise from sampling variation. I therefore ran a hypothesis test to check this conclusion.
Hypothesis Test
A hypothesis test is a statistical procedure that provides evidence for or against a hypothesized statement or claim. Here we compare two samples: charges for managed care insurance and charges for commercial insurance. I claim that managed care patients pay higher charges, and I test this claim as follows.
Let $\mu_1$ = mean charge to patients with managed care insurance and $\mu_2$ = mean charge to patients with commercial insurance.
$H_0\colon \mu_1 = \mu_2$ versus $H_a\colon \mu_1 > \mu_2$
The null hypothesis states that there is no difference between the mean charges to patients carrying managed care insurance and to patients carrying commercial insurance. The alternative hypothesis states that patients who carry managed care insurance pay higher mean charges than patients who carry commercial insurance.
I used a two-sample t-test with unknown, unequal variances (Welch's test), which compares the means of two independent samples without assuming their variances are equal. That choice fits these data: the F-test for equality of variance in Appendix 4 gives F = 3.20 with a p-value of 1.97E-09, so managed care charges are considerably more variable than commercial charges. I chose a significance level of alpha = 0.05, corresponding to the 95% confidence level used throughout the report.
The test statistic is t = 2.30 with 283 degrees of freedom, and the one-tailed p-value is .0110 (Appendix 4). The critical value t.05 for 283 degrees of freedom is approximately 1.65 (close to the large-sample normal value of 1.645). Since t = 2.30 > t.05, we reject H0 at the 0.05 level of significance. In conclusion, there is evidence at alpha = 0.05 that patients with managed care insurance pay higher mean charges than patients with commercial insurance; the estimated difference is $252.16, with a 95% confidence interval of $36.51 to $467.81.
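The test statistic, degrees of freedom, and standard error in Appendix 4 can be reproduced from the group summaries with Welch's formulas for two independent samples with unequal variances. The sketch below is a minimal standard-library version under that assumption; `welch_t` is a hypothetical helper name, and it returns the statistic rather than a p-value, which would need a t-distribution function such as the one in SciPy.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch two-sample t statistic for H0: mu1 = mu2 from summary statistics.

    Returns (t, df, se_diff), where df is the Welch-Satterthwaite approximation.
    """
    v1 = sd1 ** 2 / n1               # squared standard error of group 1's mean
    v2 = sd2 ** 2 / n2               # squared standard error of group 2's mean
    se_diff = math.sqrt(v1 + v2)     # standard error of the difference in means
    t = (mean1 - mean2) / se_diff
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se_diff


# Insurance comparison, values as entered in Appendix 4 (MC first, CI second).
t_ins, df_ins, se_ins = welch_t(2966.44, 1192.66, 193, 2714.28, 666.84, 96)

# F statistic for equality of variance: ratio of the two variances.
f_ratio = 1192.66 ** 2 / 666.84 ** 2

# Physician comparison, values as entered in Appendix 7.
t_phys, df_phys, se_phys = welch_t(3489.49, 2.70, 37, 2793.58, 771.84, 252)

print(f"t={t_ins:.2f} df={df_ins:.0f} SE={se_ins:.2f} F={f_ratio:.2f}")
print(f"t={t_phys:.2f} df={df_phys:.0f} SE={se_phys:.2f}")
```

The second call, using the physician summaries as entered in Appendix 7, reproduces t ≈ 14.31 with 251 degrees of freedom and a standard error of about 48.62.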
Another question in the case study is whether charges differ among physicians. The physician averages in Appendix 6 suggest that one physician charges considerably more than the others and one charges the least: physician #2 has the highest average charge among the physicians listed ($3,489.49) and physician #7 the lowest ($2,559.30), compared with $2,793.58 for all physicians other than #2 combined. To check whether physician #2's charges are significantly higher, I conducted a two-sample t-test with unknown variances.
Let $\mu_1$ = mean charge of physician #2 and $\mu_2$ = mean charge of all other physicians.
$H_0\colon \mu_1 = \mu_2$ versus $H_a\colon \mu_1 > \mu_2$
The null hypothesis states that there is no difference between the mean charge of physician #2 and that of the other physicians. The alternative hypothesis states that physician #2's mean charge is higher. The significance level chosen is alpha = 0.05. Since t = 14.31 > t.05 ≈ 1.65 (p-value 1.12E-34, Appendix 7), we reject H0 at alpha = 0.05. We therefore have strong evidence that physician #2's mean charge is higher than the mean charge of the other physicians combined; together with the averages in Appendix 6, this supports the conclusion that physician #2 has the highest charges. One caution: the standard deviation entered for physician #2 in Appendix 7 is 2.70, which is implausibly small for charges measured in dollars (the other physicians' standard deviation is $771.84), so this value should be verified before relying on the size of t.
Linear Regression Analysis
Linear regression analysis was conducted to check whether charges depend on days, physician, or payor. I chose CHRGS (charges) as the dependent variable and ran three simple regression models, each with one independent variable: DAYS, PHYS, or PAYOR (Appendices 8, 10, and 11). The best model is CHRGS vs. DAYS, with correlation coefficient r = 0.801 and coefficient of determination R² = 0.641. In the other two models these values are close to 0 (r = -0.092 and R² = 0.009 for PHYS; r = 0.113 and R² = 0.013 for PAYOR), and neither slope is significant at the 0.05 level (p-values .1169 and .0558). Because PHYS is a code identifying the physician rather than a measured quantity, its slope has no numerical interpretation in any case.
The regression equation is:
CHRGS = 930.7042 + 884.2014 × DAYS
The slope is b1 = 884.2014. This means that each additional day of stay is associated with an expected increase in CHRGS (total charges) of about $884.20.
The coefficient of determination is R² = 0.641. This means that 64.1% of the variation in CHRGS can be explained by the variation in DAYS. Because a large share of the variation in charges is explained by length of stay alone, the model fits the data reasonably well. The p-value for the F test is p = 7.50E-66.
This means that we have extremely strong evidence of a linear relationship between DAYS and CHRGS (charges). The hypotheses for the F test are $H_0\colon \beta_1 = 0$ versus $H_a\colon \beta_1 \neq 0$, where $\beta_1$ is the population slope of DAYS. Rejecting H0 gives evidence of a linear relationship between DAYS (the x variable) and CHRGS (the y variable), so a small p-value for the F test is evidence of a significant linear relationship between the two variables. With a single independent variable, the F test is equivalent to the t test on the slope (t = 22.651, with the same p-value).
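The slope, intercept, and R² described above come from ordinary least squares. The sketch below assumes a small hypothetical set of (DAYS, CHRGS) pairs rather than the case data and computes the least-squares intercept, slope, and R² from their textbook formulas; `predicted_charge` then applies the equation reported in Appendix 8. Both function names are illustrative, not part of MegaStat. The last line checks that, with one predictor, F equals the square of the slope's t statistic.

```python
def fit_simple_ols(xs, ys):
    """Least-squares intercept b0, slope b1, and R^2 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx                   # slope: expected change in y per unit of x
    b0 = y_bar - b1 * x_bar          # the fitted line passes through (x_bar, y_bar)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                      # total
    return b0, b1, 1 - sse / sst


def predicted_charge(days):
    """Point prediction of CHRGS from the equation reported in Appendix 8."""
    return 930.7042 + 884.2014 * days


# Hypothetical (DAYS, CHRGS) pairs; NOT the case data.
b0, b1, r2 = fit_simple_ols([1, 2, 2, 3], [1900, 2600, 2800, 3500])

# With a single predictor, F equals the square of the slope's t statistic.
f_from_t = 22.651 ** 2

print(f"b0={b0:.1f} b1={b1:.1f} R^2={r2:.4f}")
print(f"2-day stay: {predicted_charge(2):.2f}, F from t: {f_from_t:.2f}")
```

Here 22.651² ≈ 513.07 against the reported F = 513.08; the small gap comes from rounding t to three decimals. The difference `predicted_charge(3) - predicted_charge(2)` equals the slope, $884.20 per additional day.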
The correlation coefficient is r = 0.801, which indicates a strong positive correlation between DAYS and CHRGS. Because r is greater than 0, the correlation is positive: patients with longer stays tend to have higher total charges (CHRGS). The correlation is strong because r = 0.801 is fairly close to 1.
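The correlation coefficient is tied to the regression output: with one independent variable, r carries the sign of the slope and r² equals R². The sketch below assumes hypothetical data and uses `pearson_r`, an illustrative helper; squaring its result gives the same R² that least squares produces for those points, and the last line recovers r for the DAYS model from the rounded R² and positive slope in Appendix 8.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (DAYS, CHRGS) pairs; NOT the case data.
r = pearson_r([1, 2, 2, 3], [1900, 2600, 2800, 3500])

# With one predictor, r takes the sign of the slope and r**2 equals R^2,
# so r can be recovered from the reported R^2 = 0.641 and slope 884.2014.
r_days = math.copysign(math.sqrt(0.641), 884.2014)

print(f"r={r:.3f} r^2={r ** 2:.4f} r_days={r_days:.3f}")
# prints: r=0.992 r^2=0.9846 r_days=0.801
```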
CONCLUSION
Based on the hypothesis test comparing the two insurance types, patients with managed care insurance pay higher mean charges than patients with commercial insurance. A hypothesis test was also conducted to check whether charges differ among physicians; it showed that physician #2's mean charge is higher than that of the other physicians combined, and physician #2 has the highest average charge among the physicians compared. Linear regression analysis was conducted to see whether CHRGS is related to DAYS, PHYS, and/or PAYOR. After running a separate model for each independent variable, I conclude that there is a linear relationship between DAYS and CHRGS. Finally, the formal tests confirmed, to my satisfaction, the conclusions I had drawn from looking at the descriptive statistics.
Appendix 1
Data from Bryant/Smith Case 42: Hospital Charges.
Variable   Meaning
DAYS       The number of days the patient stays in the hospital
CHRGS      The total expense charged to the patient
PHYS       The code identifying the physician
PAYOR      Indicates the type of insurance the patient carried
Appendix 2
Descriptive Statistics for Days and Charges of the Patients Carrying Managed Care Insurance
Descriptive statistics
Statistic                         DAYS      CHRGS
Count                             193       193
Mean                              2.30      2,966.44
Sample variance                   1.18      1,429,843.96
Sample standard deviation         1.09      1,195.76
Minimum                           1         929
Maximum                           14        14,898
Range                             13        13,969
Population variance               1.17      1,422,435.44
Population standard deviation     1.08      1,192.66
Standard error of the mean        0.08      86.07
95% confidence interval, lower    2.15      2,796.67
95% confidence interval, upper    2.45      3,136.21
Half-width                        0.15      169.77
Empirical rule:
  Mean - 1s                       1.21      1,770.68
  Mean + 1s                       3.39      4,162.20
  Percent in interval (68.26%)    88.1%     90.2%
  Mean - 2s                       0.13      574.92
  Mean + 2s                       4.47      5,357.96
  Percent in interval (95.44%)    99.0%     97.4%
  Mean - 3s                       -0.96     -620.84
  Mean + 3s                       5.56      6,553.72
  Percent in interval (99.73%)    99.0%     99.0%
Skewness                          6.77      5.62
Kurtosis                          70.34     51.70
Coefficient of variation (CV)     47.22%    40.31%
1st quartile                      2.00      2,367.00
Median                            2.00      2,789.00
3rd quartile                      3.00      3,287.00
Interquartile range               1.00      920.00
Mode                              2.00      2,840.00
Descriptive statistics (patients carrying commercial insurance)
Statistic                         DAYS      CHRGS
Count                             96        96
Mean                              2.02      2,714.28
Sample variance                   0.34      449,350.50
Sample standard deviation         0.58      670.34
Minimum                           1         902
Maximum                           3         4,933
Range                             2         4,031
Population variance               0.33      444,669.76
Population standard deviation     0.58      666.84
Standard error of the mean        0.06      68.42
95% confidence interval, lower    1.90      2,578.46
95% confidence interval, upper    2.14      2,850.10
Half-width                        0.12      135.82
Empirical rule:
  Mean - 1s                       1.44      2,043.95
  Mean + 1s                       2.60      3,384.62
  Percent in interval (68.26%)    66.7%     79.2%
  Mean - 2s                       0.86      1,373.61
  Mean + 2s                       3.18      4,054.95
  Percent in interval (95.44%)    100.0%    93.8%
  Mean - 3s                       0.28      703.27
  Mean + 3s                       3.76      4,725.29
  Percent in interval (99.73%)    100.0%    99.0%
Skewness                          0.00      0.60
Kurtosis                          0.07      1.62
Coefficient of variation (CV)     28.70%    24.70%
1st quartile                      2.00      2,219.75
Median                            2.00      2,673.50
3rd quartile                      2.00      3,118.25
Interquartile range               0.00      898.50
Mode                              2.00      n/a
Appendix 4
MegaStat Output Comparing Managed Care Insurance and Commercial Insurance
Hypothesis Test: Independent Groups (t-test, unequal variance)
                     CHRGS-MC    CHRGS-CI
mean                 2966.44     2714.28
std. dev.            1192.66     666.84
n                    193         96

df                                          283
difference (CHRGS-MC - CHRGS-CI)            252.16000
standard error of difference                109.55447
hypothesized difference                     0
t                                           2.30
p-value (one-tailed, upper)                 .0110
confidence interval 95% lower               36.51497
confidence interval 95% upper               467.80503
margin of error                             215.64503

F-test for equality of variance
variance (CHRGS-MC)                         1422437.9
variance (CHRGS-CI)                         444675.59
F                                           3.20
p-value                                     1.97E-09
Descriptive statistics
Statistic                    #1
Count                        252
Mean                         2,793.58
Sample variance              595,734.48
Sample standard deviation    771.84
Minimum                      902
Maximum                      6,710
Range                        5,808
1st quartile                 2,304.00
Median                       2,675.50
3rd quartile                 3,115.75
Interquartile range          811.75
Mode                         2,741.00
Appendix 6
2-Sample t-Test with Unknown Variances for Comparing Physician #2 with All Other Physicians
Label               Mean       Sd        N
High CHRG PHYS 2    3489.49    2.70      37
All Other PHYS      2793.58    771.84    252
PHYS 6              2856.76    2.36      25
PHYS 7              2559.30    1.98      40
PHYS 10             2786.26    2.06      54
PHYS 13             3124.55    2.20      20
PHYS 14             2742.00    2.00      28
Appendix 7
2-Sample t-Test for Comparing the High Charge of Physician #2 with All Other Physicians
Hypothesis Test: Independent Groups (t-test, unequal variance)
                     High CHRG PHYS 2    All Other PHYS
mean                 3489.49             2793.58
std. dev.            2.7                 771.84
n                    37                  252

df                                                  251
difference (High CHRG PHYS 2 - All Other PHYS)      695.91000
standard error of difference                        48.62338
hypothesized difference                             0
t                                                   14.31
p-value (one-tailed, upper)                         1.12E-34
confidence interval 95% lower                       600.14820
confidence interval 95% upper                       791.67180
margin of error                                     95.76180

F-test for equality of variance
variance (All Other PHYS)                           595736.9856
variance (High CHRG PHYS 2)                         7.29
F                                                   81719.75
p-value                                             1.49E-81
Appendix 8
Regression Analysis with Independent Variable DAYS and Dependent Variable CHRGS
Regression Analysis
R²            0.641
r             0.801
Std. Error    633.703
n             289
k             1
Dep. Var.     CHRGS

ANOVA table
Source        df     MS                  F         p-value
Regression    1      206,041,299.1584    513.08    7.50E-66
Residual      287    401,580.0345
Total         288

Regression output
variables    coefficients    std. error    t (df=287)    p-value     95% lower    95% upper
Intercept    930.7042        93.8922       9.912         4.13E-20    745.8997     1,115.5088
DAYS         884.2014        39.0355       22.651        7.50E-66    807.3691     961.0336
Appendix 10
Regression Analysis for Independent Variable PHYS and Dependent Variable CHRGS
Regression Analysis
R²            0.009
r             -0.092
Std. Error    1053.532
n             289
k             1
Dep. Var.     CHRGS

ANOVA table
Source        df     MS                F        p-value
Regression    1      2,744,807.3552    2.47     .1169
Residual      287    1,109,930.1802
Total         288

Regression output
variables    coefficients    std. error    t (df=287)    p-value     95% lower     95% upper     std. coeff.
Intercept    3,105.4358      154.6158      20.085        1.21E-56    2,801.1111    3,409.7604    0.000
PHYS         -25.5667        16.2580       -1.573        .1169       -57.5667      6.4333        -0.092
Appendix 11
Regression Analysis for Independent Variable PAYOR and Dependent Variable CHRGS
Regression Analysis
R²            0.013
r             0.113
Std. Error    1051.328
n             289
k             1
Dep. Var.     CHRGS

ANOVA table
Source        df     MS                F        p-value
Regression    1      4,076,432.1016    3.69     .0558
Residual      287    1,105,290.3727
Total         288

Regression output
variables    coefficients    std. error    t (df=287)    p-value     95% lower     95% upper     std. coeff.
Intercept    2,714.2813      107.3007      25.296        4.90E-75    2,503.0851    2,925.4774    0.000
PAYOR        252.1592        131.3025      1.920         .0558       -6.2787       510.5971      0.113
References
Bowerman. (n.d.). Essentials of Business Statistics. McGraw-Hill Irwin.
Bryant-Smith. (2003). Bryant-Smith: Practical Data Analysis, Volume I. The McGraw-Hill Companies.
Farflex, Inc. (n.d.). Mean, median and mode. Retrieved February 18, 2010, from The Free Dictionary: http://encyclopedia2.thefreedictionary.com/mean,+median,+and+mode