# PROJECT – REGRESSION METHOD - GROUP 347

Remarks:

- Work in teams (2 persons/team). Each team should solve a different application (with a
different database). Print the text of the application (requirements and database).
- Copy the database into Excel.
- Process the data in Excel (using Data/Data Analysis/Regression) and get the Output
tables.
- Print the Output tables.
- Write down the explanations for each requirement of the application (handwritten on
paper or typed in a Word file).
- Deadline: last seminar in January.

TEAM 1

Aim of study: the relationship between the turnover value of supermarkets, number of families in
the neighborhood and commercial area of the supermarkets.
Data recorded for 30 supermarkets owned by a businessman:

Turnover value (million lei) Number of families Commercial area (hundred m2)
483 68 46
411 83 33
422 48 38
410 68 26
369 38 22
198 10 21
209 35 26
197 55 14
156 25 10
85 28 12
187 43 20
43 15 5
211 33 28
120 23 9
62 24 26
176 45 10
117 20 8
273 56 36
408 82 31
419 47 36
407 67 24
366 37 20
295 40 22
397 55 30
253 27 15
421 45 38
330 35 19
272 16 16
386 57 20
327 32 18

Process the data in Excel (Data/Data Analysis/Regression) and answer the following questions:

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.
b. Is there enough evidence to conclude that the regression model is valid, at 95%
confidence level? (critical value: 3,35).
c. Test the significance of the model parameters (critical value: 2,05).
d. Find and interpret the confidence intervals for the model parameters.
e. What percentage of the total variability of turnover value is explained by the regression
model?
f. Measure the strength of the relationship between the three variables, using an appropriate
statistical indicator. Test its significance.
g. Analyze the direction and the strength of the relationship between the Number of families
and Turnover value. Use the correlation coefficient (“CORREL” function in Excel).
h. Predict the turnover value for a supermarket with 33 hundred m2, placed in a
neighborhood where 60 resident families live.
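Teams that want to cross-check the Excel Regression output for questions a and h can reproduce the fit outside Excel. The following is a minimal sketch in Python, not part of the assignment: it assumes the model Turnover = b0 + b1·Families + b2·Area and solves the least-squares normal equations by Gaussian elimination. The function names (`solve`, `fit_two_predictors`, `predict`) are illustrative.

```python
# Team 1 data: (turnover in million lei, number of families, area in hundred m2)
ROWS = [
    (483, 68, 46), (411, 83, 33), (422, 48, 38), (410, 68, 26), (369, 38, 22),
    (198, 10, 21), (209, 35, 26), (197, 55, 14), (156, 25, 10), (85, 28, 12),
    (187, 43, 20), (43, 15, 5), (211, 33, 28), (120, 23, 9), (62, 24, 26),
    (176, 45, 10), (117, 20, 8), (273, 56, 36), (408, 82, 31), (419, 47, 36),
    (407, 67, 24), (366, 37, 20), (295, 40, 22), (397, 55, 30), (253, 27, 15),
    (421, 45, 38), (330, 35, 19), (272, 16, 16), (386, 57, 20), (327, 32, 18),
]


def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_two_predictors(rows):
    """Least-squares estimates [b0, b1, b2] for y = b0 + b1*x1 + b2*x2."""
    design = [(1.0, x1, x2) for _, x1, x2 in rows]
    y = [row[0] for row in rows]
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(3)] for i in range(3)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(3)]
    return solve(xtx, xty)


def predict(coeffs, x1, x2):
    """Point prediction from the fitted equation."""
    b0, b1, b2 = coeffs
    return b0 + b1 * x1 + b2 * x2


coeffs = fit_two_predictors(ROWS)
print("b0, b1, b2 =", [round(c, 4) for c in coeffs])
# Question h: 60 resident families, 33 hundred m2 of commercial area
print("predicted turnover (million lei):", round(predict(coeffs, 60, 33), 2))
```

The printed coefficients should agree with the "Coefficients" column of the Excel output up to rounding; a mismatch usually means the columns were selected in a different order.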

TEAM 2

Aim of study: the propensity of a mobile phone company’s clients to give up the company’s
services, depending on the average monthly bill value and the seniority in company
service.
Data recorded for 40 clients of the mobile phone company:

Propensity to give up the company services (points) The average monthly bill value (lei) Seniority in company service (years)
64,12 69,06 1,26
64,85 81,09 2,2
68,48 82,23 2,58
58,77 85,59 3,45
83,45 172,78 1,38
60,77 50,85 3,23
57,48 78,82 2,75
71,99 93 1,44
56,79 46,26 3,25
53,69 62,45 4,49
55,66 49,77 3,45
54,84 60,34 3,13
63,72 47,9 2,42
78,29 162,64 1,23
62,23 59,45 2,28
61,65 70,17 2,1
73,47 170,86 1,57
66,03 84,55 2,1
63,96 59,33 3,36
56,65 70,68 3,2
60,47 58,89 3,87
71,47 72,38 1,6
71,04 69,78 3,02
67,39 61,74 2,52
70,94 100,93 1,38
62,74 65,21 2,4
51,09 84,49 4,36
64,98 48,43 2,5
68,06 98,12 3,02
66,04 71,42 2,66
61,12 69,4 2,67
66,41 56,23 2,96
59,25 55,03 2,96
70,41 65,83 1,18
77,37 184,14 1,46
62,63 37 3,3
61,02 51,95 1,92
72,11 166,92 1,34
56,81 44,72 2,51
52,65 61,93 3,42

Process the data in Excel (Data/Data Analysis/Regression) and answer the following questions:

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.
b. Is there enough evidence to conclude that the regression model is valid, at 95%
confidence level? (critical value: 3,25).
c. Test the significance of the model parameters (critical value: 2,026).
d. Find and interpret the confidence intervals for the model parameters.
e. Compute and interpret the coefficient of determination.
f. Measure the strength of the relationship between the three variables, using an appropriate
statistical indicator. Test its significance.
g. Find the Correlation Matrix (use Data/Data Analysis/Correlation). Explain the main
diagonal values.
h. Predict the propensity to give up the company services for a client with 4 years of seniority
in the company service and a 65 lei monthly bill.
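The main-diagonal values asked about in question g come from correlating each variable with itself. A minimal sketch in Python of how a correlation matrix is built, using only the first eight rows of the Team 2 table (decimal commas rewritten as points) to keep the example short; the function names `pearson` and `correlation_matrix` are illustrative:

```python
import math

# First eight rows of the Team 2 table only:
# (propensity in points, monthly bill in lei, seniority in years)
ROWS = [
    (64.12, 69.06, 1.26),
    (64.85, 81.09, 2.20),
    (68.48, 82.23, 2.58),
    (58.77, 85.59, 3.45),
    (83.45, 172.78, 1.38),
    (60.77, 50.85, 3.23),
    (57.48, 78.82, 2.75),
    (71.99, 93.00, 1.44),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(rows):
    """Square matrix of pairwise Pearson correlations between the columns."""
    columns = list(zip(*rows))
    return [[pearson(a, b) for b in columns] for a in columns]


matrix = correlation_matrix(ROWS)
for line in matrix:
    print([round(value, 3) for value in line])
```

Because this uses a subset of the rows, the off-diagonal values will differ from the Excel Correlation output for the full table; only the structure (ones on the diagonal, symmetry) carries over.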

TEAM 3

Aim of study: the relationship between the number of TV sets owned by the clients of an
electrical household appliances store, household size and the average monthly income.
Data recorded for 25 clients:

Average monthly income (hundred lei/pers.) Household size (persons) Number of TV sets
107 3 3
109 1 3
112 1 2
118 1 3
137 2 3
19 1 1
19 2 1
19 8 1
20 2 1
22 2 1
24 1 1
30 3 1
31 2 1
37 7 1
40 1 1
42 1 1
43 2 1
46 2 1
49 3 1
51 4 1
53 6 3
57 1 1
67 1 1
83 1 2
83 3 3

Process the data in Excel (Data/Data Analysis/Regression) and answer the following questions:

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.
b. Test the validity of the regression model, at 5% significance level (critical value: 3,44).
c. Test the significance of the model parameters (critical value: 2,07).
d. Find and interpret the confidence intervals for the model parameters.
e. Measure the relative influence of the household size and average monthly income on the
variability of the number of TV sets, using the coefficient of determination. What percentage
of the total variability of the number of TV sets is not explained by the regression model?
f. Measure the strength of the relationship between the three variables, using an appropriate
statistical indicator. Test its significance.
g. Analyze the direction and the strength of the relationship between the Household size and
the Number of TV sets. Use the correlation coefficient (“CORREL” function in Excel).
h. Predict the number of TV sets owned by a client with an average monthly income of 75
hundred lei who comes from a 4-person household.
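For question b, the calculated F statistic is compared with the critical value quoted in the requirement. As a reminder, and assuming the usual multiple regression notation (SSR, SSE and R² are symbols introduced here, not names used in the handout), with n = 25 observations and k = 2 explanatory variables the test can be written as:

```latex
H_0:\ \beta_1 = \beta_2 = 0
\qquad
H_1:\ \text{at least one } \beta_j \neq 0

F_{calc} = \frac{SSR / k}{SSE / (n - k - 1)}
         = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)},
\qquad k = 2,\quad n - k - 1 = 22

\text{Reject } H_0 \text{ (the model is valid) if } F_{calc} > F_{0{,}05;\,2;\,22} = 3{,}44
```

In the Excel output, the calculated value should correspond to the F entry of the ANOVA table.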

TEAM 4

Aim of study: the relationship between the sales value in the previous year, advertising
expenditure and the number of competitors, for the supermarkets owned by a businessman.
Data recorded for 24 supermarkets owned by the businessman:

Number of competitors Advertising expenditure (1000s Eur) Sales value (10000s Eur)
6 2,29 8,71
1 4,9 12,07
1 5,75 12,74
5 3,61 9,82
2 4,62 11,51
2 4,69 12,23
4 6,41 11,84
1 6,47 12,25
3 3,43 11,1
5 8,39 10,97
6 2,15 8,75
6 1,54 7,75
5 2,67 10,5
5 1,24 6,71
7 1,77 7,6
3 4,46 12,46
6 1,83 8,47
2 5,15 12,27
1 7,25 12,57
6 1,72 8,87
4 3,04 11,15
3 4,92 11,86
4 4,85 11,07
5 3,13 10,38

Process the data in Excel (Data/Data Analysis/Regression) and answer the following questions:

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.
b. Is there enough evidence to conclude that the regression model is valid, at 99%
confidence level? (critical value: 5,78).
c. Test the significance of the model parameters (critical value: 2,83).
d. Find and interpret the confidence intervals for the model parameters.
e. What percentage of the total variability of sales value is not explained by the regression
model?
f. Measure the strength of the relationship between the three variables, using an appropriate
statistical indicator. Test its significance.
g. Find and display the Correlation Matrix (use Data/Data Analysis/Correlation). Explain
the values on the main diagonal.
h. Predict the sales value for a supermarket, if the advertising expenditures are 9,5 thousand
Eur and there are 5 other similar supermarkets in the neighborhood.
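Questions c and d use the same ingredients: each estimated coefficient and its standard error. A short reminder, assuming the standard t test with n − k − 1 = 24 − 2 − 1 = 21 degrees of freedom (the symbols b_j and s_{b_j} are notation introduced here), and assuming the intervals are built at the same 99% confidence level as the quoted critical value:

```latex
t_{calc} = \frac{b_j}{s_{b_j}}, \qquad j = 0, 1, 2

\text{Reject } H_0:\ \beta_j = 0 \text{ if } |t_{calc}| > t_{0{,}005;\,21} = 2{,}83

b_j - 2{,}83\, s_{b_j} \ \le\ \beta_j \ \le\ b_j + 2{,}83\, s_{b_j}
```

In the Excel output, b_j is read from the "Coefficients" column, s_{b_j} from "Standard Error" and the calculated value from "t Stat". Excel reports 95% limits by default, so a 99% interval has to be requested in the Regression dialog or computed by hand as above.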

TEAM 5

Aim of study: the behavior of the menswear sales value of a clothing factory, depending on
the number of catalogs mailed to the customers and on the amounts spent on advertising the
products manufactured.
Data recorded for 30 retail stores of the clothing factory:

Amounts spent on advertising Number of catalogs mailed Menswear sales value
2451,53 3216 10278,09
2859,83 4358 11857,59
2864,37 3835 9516,91
3439,25 3565 10074,24
2714,54 4589 19504,5
2229,45 2978 11357,92
2742,65 3290 10605,95
2797,87 3029 16998,57
2894,97 2752 6563,75
2264,23 3685 6607,69
2721,06 2847 9839
2663,30 2881 9398,32
3037,50 3121 10395,53
2679,48 2811 11663,13
3280,81 3706 12805,22
2858,97 3811 13636,25
3873,87 5309 22849,01
2272,37 3081 12325,8
2491,25 3378 8273,58
2991,75 3586 10061,19
2091,15 3438 11497,76
2690,29 3589 10363,16
2607,92 3565 10194,68
3595,73 3526 8401,24
2541,60 3978 13642,89
2677,76 3761 12772,63
3074,56 3512 14539,47
2592,98 3697 14927,35
2572,16 6120 19170,12
2649,64 4115 11771,4

Process the data in Excel (Data/Data Analysis/Regression) and answer the following questions:

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.
b. Is there enough evidence to conclude that the regression model is valid, at 5%
significance level? (critical value: 3,35).
c. Test the significance of the model parameters (critical value: 2,05).
d. Find and interpret the confidence intervals for the model parameters.
e. What percentage of the total variability of sales value is not explained by the regression
model?
f. Measure the strength of the relationship between the three variables, using an appropriate
statistical indicator. Test its significance.
g. Analyze the direction and the strength of the relationship between Number of catalogs
mailed and Menswear sales value. Use the correlation coefficient (“CORREL” function in
Excel).
h. Predict the sales value for a retail store, if the amount spent on advertising was 4000
thousand \$ and there were 5000 catalogs mailed to the customers.
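Questions e and f both rest on the sums of squares in the ANOVA table of the Excel output. A minimal sketch in Python, assuming the regression and residual sums of squares are typed in by hand from that table; the function name `summarize_fit` and the numbers in the example call are illustrative, not results for any team's data:

```python
import math


def summarize_fit(ss_regression, ss_residual):
    """Derive R squared, multiple R and the unexplained share from ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    r_squared = ss_regression / ss_total
    return {
        "r_squared": r_squared,              # share of variability explained by the model
        "multiple_r": math.sqrt(r_squared),  # strength of the joint relationship
        "unexplained_share": 1.0 - r_squared,
    }


# Illustrative sums of squares only (hypothetical values)
summary = summarize_fit(80.0, 20.0)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Multiplying `r_squared` or `unexplained_share` by 100 gives the percentage form that question e asks for.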

TEAM 6

Aim of study: the relationship between seniority at the current job, household income and
credit-card debt for a bank’s customers.
Data recorded for 30 customers of the bank, randomly drawn:

Household income (lei) Seniority at current job (years) Credit-card debt (lei)
12240 23 200,6
10370 6 95,2
4420 0 17
8840 22 195,5
7310 17 100,3
4420 3 73,1
4590 8 68
2720 1 40,8
5440 0 363,8
11730 9 120,7
10880 25 161,5
9860 12 523,6
6290 2 34
27720 16 1731,2
5170 11 231,2
9350 15 146,2
20300 15 452,2
4760 2 304,3
4250 5 66,3
11390 20 651,1
6460 12 22,1
3230 3 231,2
4250 0 472,6
2720 0 30,6
3910 4 42,5
10880 24 668,1
4930 6 292,4
17000 22 629
8330 9 139,4
6970 13 496,4

Process the data in Excel (Data/Data Analysis/Regression) and answer the following questions:

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.
b. Is there enough evidence to conclude that the regression model is valid, at 1%
significance level? (critical value: 5,48).
c. Test the significance of the model parameters (critical value: 2,77).
d. Find and interpret the confidence intervals for the model parameters.
e. What percentage of the total variability of credit-card debt is explained by the regression
model?
f. Measure the strength of the relationship between the three variables, using an appropriate
statistical indicator. Test its significance.
g. Find the Correlation Matrix (use Data/Data Analysis/Correlation). Explain the main
diagonal values.
h. Predict the credit-card debt for a customer with 15 years of seniority at the current job and a
household income of 7500 lei.

TEAM 7

Aim of study: the relationship between the sales value of mobile phones, the number of phone
lines opened for customers’ orders and the number of service units.
Data recorded for 30 mobile phone stores:

Number of phone lines opened for customers’ orders Sales value of mobile phones (thousand \$) Number of service units
26 20789,67 27
28 37427,02 31
21 37578,38 33
24 34424,09 23
24 47208,79 28
34 16578,93 20
29 18236,13 20
24 43393,55 26
20 30908,49 22
17 28701,58 21
30 29647,57 23
28 31141,51 22
27 31177,31 20
35 30672,37 15
25 37633,38 20
30 33890,92 16
45 51378 29
35 18103,06 22
20 20979,5 28
25 34503,12 28
35 26783,96 22
28 31790,15 24
25 32432,74 25
24 37180,05 28
32 29658,85 19
28 33238,46 18
33 35679,33 20
34 37238,87 26
29 46766,94 30
30 21752,78 27

Process the data in Excel (Data/Data Analysis/Regression) and answer the following questions:

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.
b. Is there enough evidence to conclude that the regression model is valid, at 10%
significance level? (critical value: 2,51).
c. Test the significance of the model parameters (critical value: 1,703).
d. Find and interpret the confidence intervals for the model parameters.
e. To what extent is the total variability of sales value explained by the regression model?
f. Measure the strength of the relationship between the three variables, using an appropriate
statistical indicator. Test its significance.
g. Analyze the direction and the strength of the relationship between Sales value and
Number of service units. Use the correlation coefficient (“CORREL” function in Excel).
h. Predict the sales value if there are 30 service units and 40 phone lines opened for
customers’ orders.

TEAM 8

Aim of study: the behavior of Number of visits to health care providers (rate per 10,000
inhabitants), depending on the Health care funding (amount per 100 inhabitants) and Reported
diseases (rate per 10,000 inhabitants).
Data recorded for 20 towns, in the previous year:

Visits to health care providers (rate per 10,000 inhabitants) Health care funding (amount per 100 inhabitants) Reported diseases (rate per 10,000 inhabitants)
181,62 182,9 212,44
205,51 197,09 198,9
167,09 163,86 174,25
192,71 192,3 212,07
165,02 172,03 149,45
152,13 155,33 158,34
167,84 177,34 157,23
162,21 165,09 162,92
146,69 154,28 130,59
186,93 185,56 202,81
188,16 186,96 221,43
195,39 198,63 189,22
175,5 172,14 166,42
197,04 198,25 203,07
190,16 193,9 198,57
156,47 157,03 161,79
159,21 158,07 168,82
185,65 182,09 180,41
176,5 173,2 178,52
156,97 151,84 157,77

Process the data in Excel (Data/Data Analysis/Regression) and answer the following questions:

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.
b. Test the significance (validity) of the regression model, at 5% significance level (critical
value: 3,59).
c. Test the significance of the model parameters (critical value: 2,11).
d. Find and interpret the confidence intervals for the model parameters.
e. To what extent is the total variability of Visits to health care providers explained by the
regression model?
f. Measure the strength of the relationship between the three variables, using an appropriate
statistical indicator. Test its significance.
g. Obtain the Correlation Matrix (use Data/Data Analysis/Correlation). Explain the values
on the main diagonal.
h. Predict the visits to health care providers if the health care funding amounted to 200 units
per 100 inhabitants and 210 disease cases per 10,000 inhabitants were reported.

TEAM 9

Aim of study: the behavior of Sale price of houses, depending on the Number of days between
announcing the house sale and closing the deal, on the one hand, and on Living area, on the other
hand.
Data recorded for 25 houses, sold by a real estate agency:

Sale price (thousand dollars) Number of days between announcing the house sale and closing the deal House living area (m2)
274 85 244
265 61 151
254 1 247
229 13 158
250 16 209
335 91 298
321 54 261
300 13 306
325 96 282
210 18 210
416 62 411
342 18 394
347 133 365
284 103 250
290 104 259
294 46 366
235 74 233
250 10 185
290 13 269
247 34 289
232 15 290
278 66 240
222 73 162
265 55 218
300 85 273

Process the data in Excel (Data/Data Analysis/Regression) and answer the following questions:

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.
b. Is there enough evidence to conclude that the regression model is valid, at 95%
confidence level? (critical value: 3,44).
c. Test the significance of the model parameters (critical value: 2,074).
d. Find and interpret the confidence intervals for the model parameters.
e. What percentage of the total variability of selling price is determined by the influence of the
two explanatory variables?
f. Measure the strength of the relationship between the three variables, using an appropriate
statistical indicator. Test its significance.
g. Analyze the direction and the strength of the relationship between Sale price and House
living area. Use the correlation coefficient (“CORREL” function in Excel).
h. Predict the selling price of a house with 300 m2 of living area, sold 50 days after the
sale announcement.

TEAM 10

Aim of study: the behavior of extra weight, depending on the number of cigarettes smoked per
day (over the past 30 days) and on the age when the person first smoked a cigarette.
Data recorded for 40 smokers randomly drawn:

Extra weight (kg) Age when first smoked a cigarette (years) Number of cigarettes smoked per day past 30 days
0,797 11 14
1,603 15 11
1,262 17 17
1,370 15 17
0,806 15 19
1,012 16 15
0,391 16 2
1,509 15 16
0,810 13 5
1,356 18 7
0,484 15 17
0,450 15 1
0,699 13 15
0,650 17 1
1,018 14 16
6,598 16 6
1,830 11 8
0,659 16 16
0,770 16 5
0,730 12 9
2,365 11 22
1,127 14 17
0,969 16 19
1,178 17 21
0,909 14 18
1,443 17 17
0,752 21 8
1,430 22 20
0,816 15 20
0,890 11 3
1,831 19 16
2,218 19 8
2,573 16 10
1,349 17 11
1,978 14 15
1,178 13 3
2,262 13 4
2,339 41 7
0,509 21 13
3,170 15 15

Process the data in Excel (Data/Data Analysis/Regression) and answer the following questions:

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.
b. Is there enough evidence to conclude that the regression model is valid, at 90%
confidence level? (critical value: 2,45).
c. Test the significance of the model parameters (critical value: 1,687).
d. Find and interpret the confidence intervals for the model parameters.
e. Compute and interpret the coefficient of determination.
f. Analyze the direction and the strength of the relationship between the three variables,
using an appropriate statistical indicator. Test its significance.
g. Get the Correlation Matrix (use Data/Data Analysis/Correlation). Explain the values on
the main diagonal.
h. Predict a person’s extra weight, if he started to smoke when he was 15 years old and used
to smoke 3 cigarettes per day in the last 30 days.

TEAM 11

Aim of study: the behavior of additional income earned by employees after completing a training
course, depending on their age and household size.
Data recorded for 35 employees randomly selected:

Additional income (units) Age (years) Household size (persons)
5,679 28 2
3,679 29 2
4,679 29 2
10,679 31 4
6,679 30 4
8,679 29 3
6,679 29 4
14,679 33 3
12,679 30 2
3,679 28 2
3,679 29 6
11,679 30 2
9,679 29 2
9,679 31 2
14,679 32 2
12,679 30 2
9,679 31 2
13,679 31 2
10,679 32 4
10,679 29 3
3,679 32 3
3,679 31 4
16,679 32 3
4,679 31 6
2,679 31 2
5,679 30 4
8,679 29 4
5,679 30 3
4,679 31 4
9,679 29 3
11,679 31 2
10,679 32 2
9,679 32 2
7,679 31 5
5,679 30 3

Process the data in Excel (Data/Data Analysis/Regression) and answer the following questions:

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.
b. Is there enough evidence to conclude that the regression model is valid, at 95%
confidence level? (critical value: 3,3).
c. Test the significance of the model parameters (critical value: 2,037).
d. Find and interpret the confidence intervals for the model parameters.
e. Compute and interpret the coefficient of determination.
f. Analyze the direction and the strength of the relationship between the three variables,
using an appropriate statistical indicator. Test its significance.
g. Analyze the direction and the strength of the relationship between Additional income and
Household size. Use the correlation coefficient (“CORREL” function in Excel).
h. Predict the additional income that might be earned by an employee aged 35, whose
household includes 4 persons.

TEAM 12

Aim of study: the behavior of monthly income depending on work experience and expertise level
(ranging from 1 to 40), for a company’s employees.
Data recorded for 40 employees:

Monthly income (lei) Work experience (years) Expertise level
2650 3 12
2280 1 8
3420 18 25
4180 15 32
2920 6 17
2500 2 15
4430 12 35
3220 12 20
5410 22 38
3240 7 22
2860 5 16
2940 4 19
2740 6 14
2370 4 10
3510 21 27
4270 18 34
3010 9 19
2590 5 17
4520 15 37
3310 15 22
5500 25 40
3330 10 24
2950 8 18
3030 7 21
2774 8 15
2404 6 11
3544 23 28
4304 20 35
3044 11 20
2624 7 18
4554 17 38
3344 17 23
5534 27 41
3364 12 25
2984 10 19
3064 9 22
4550 16 40
4557 16 35
3000 10 19
3100 12 23

Process the data in Excel (Data/Data Analysis/Regression) and answer the following questions:

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.
b. Is there enough evidence to conclude that the regression model is valid, at 95%
confidence level? (critical value: 3,25).
c. Test the significance of the model parameters (critical value: 2,026).
d. Find and interpret the confidence intervals for the model parameters.
e. What percentage of the total variability of monthly income is not explained by the regression
model?
f. Measure the strength of the relationship between the three variables, using an appropriate
statistical indicator. Test its significance.
g. Analyze the direction and the strength of the relationship between Monthly income and
Work experience. Use the correlation coefficient (“CORREL” function in Excel).
h. Predict the monthly income of an employee with 15 years of work experience and an
expertise level of 25.

TEAM 13

Aim of study: the relationship between the insured persons’ intelligence (measured by the score
obtained in an intelligence test), age (years old) and the number of major car crashes, over the
last 10 years.
Data recorded for 50 clients of an auto insurance company:

Intelligence test score (points) Age (years) Number of car accidents
77 23 2
88 35 1
89 26 1
87 25 0
66 28 2
50 31 1
89 23 1
65 31 3
55 31 5
78 27 1
66 26 3
68 33 3
69 27 3
45 30 5
66 34 0
78 29 2
56 24 1
70 26 2
62 25 3
60 27 3
80 27 1
60 22 4
65 29 3
87 24 0
82 25 1
66 33 3
40 40 0
56 37 6
78 31 4
90 32 3
74 31 2
87 38 1
68 34 1
80 31 0
66 28 3
80 31 1
89 30 1
88 34 1
82 34 1
86 32 0
70 35 2
81 32 1
91 35 0
67 33 2
68 39 2
81 34 1
45 30 6
79 31 1
66 33 2
60 28 3

Process the data in Excel (Data/Data Analysis/Regression) and answer the following questions:

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.
b. Test the validity of the regression model, at 5% significance level (critical value: 3,195).
c. Test the significance of the model parameters (critical value: 2,012).
d. Find and interpret the confidence intervals for the model parameters.
e. Measure the relative influence of the insured persons’ intelligence and age on the
variability of the number of car accidents, using the coefficient of determination. What
percentage of the total variability of the number of car accidents is not explained by the
regression model?
f. Measure the strength of the relationship between the three variables, using an appropriate
statistical indicator. Test its significance.
g. Get the Correlation Matrix (use Data/Data Analysis/Correlation). Explain the values on
the main diagonal.
h. Predict the number of car accidents for a client aged 25, who obtained 60 points on the
intelligence test.