Project Bollywood - Final Report

Cross Tabulation
As one of our objectives was to find out the preferences of the respondents, we
carried out cross-tabulation on the basis of the satisfaction level of the
respondents and how often they watch a movie in the theatre in a month. The
areas in which preferences were examined were:
1) Which kind of director's philosophy do they like in a movie?
2) What kind of actors do they prefer to watch for a particular script or director's
philosophy?
3) Which kind of music do they prefer in a movie?

From the cross-tabulation it is clear that respondents who are satisfied or neutral
about current Bollywood movies prefer to watch a movie which is close to reality.
They feel that actors who have established themselves in this field are fit to portray
the characters, and that romantic or soft numbers add value to such movies. One
interesting fact from this survey is that even those who are currently not satisfied,
and those who watch a movie more than once a month, share the same
philosophy.
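Preference of the respondents who are:

Type      Preference         Satisfied and   Dissatisfied and     Watch more than
                             Neutral         Very Dissatisfied    once a month
Actor     Established        59              15                   54
Music     Romantic           29              11                   30
Music     Soft               28 (single value; column not identifiable)
Director  Realistic          34              8                    29
Director  Sarcastic comedy   7 (single value; column not identifiable)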
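The cross-tabulation itself is simply a count of respondents for every pair of categories. A minimal sketch of that counting step is given below, using only the Python standard library. The field names and the four sample responses are hypothetical and are not the survey data.

```python
from collections import Counter


def cross_tabulate(records, row_key, col_key):
    """Count how many records fall into each (row, column) category pair."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    # Fill in zero for pairs that never occur so every row has every column.
    return {row: {col: counts.get((row, col), 0) for col in cols} for row in rows}


# Hypothetical responses, for illustration only (not the survey data).
responses = [
    {"satisfaction": "Satisfied/Neutral", "director": "Realistic"},
    {"satisfaction": "Satisfied/Neutral", "director": "Realistic"},
    {"satisfaction": "Dissatisfied", "director": "Realistic"},
    {"satisfaction": "Dissatisfied", "director": "Sarcastic comedy"},
]

table = cross_tabulate(responses, "director", "satisfaction")
for row, cells in table.items():
    print(row, cells)
```

Each printed row corresponds to one preference and each cell to one respondent group, which is the same layout as the table above.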
About 92% of the respondents watch movies for entertainment, and at the
same time they want the movie to be close to reality. This is also clear from
the type of script they prefer to watch. From the graph below, we can see that
the largest number of respondents want the script of the film to be close to
reality or to current issues prevalent in society.
It is not that other scripts or philosophies will not work. For example, Dhoom was a
trendy and stylish movie with fast music numbers and comparatively
new faces. However, movies like 3 Idiots and Taare Zameen Par were preferred more
and drew a repeat audience. The main USP of 3 Idiots was that it combined all the
factors preferred by the respondents.
The above survey does not indicate that our target group (TG) wants a real-life story
told in a serious manner. They want this philosophy to be conveyed in a light or
comic way. This is clear from the graph below. The largest number of
respondents preferred comedy movies over other genres like romance, action,
thriller etc.
On this front also, 3 Idiots scores well. The movie presented the problems of
graduating youth in a light, humorous way, which was well accepted by our TG.
From cross-tabulation, it is clear that most respondents preferred comedy.
However, many respondents liked more than one genre, such as action, comedy and
romance. Combining these genres, we found that among respondents with
multiple genre choices, the majority preferred either romantic comedy or
action comedy. A few of them preferred action romance comedy.
Multiple Regression to find relationship between Satisfaction level of the Respondent and
the main attributes of the film
Variables Entered/Removed(b)
Model   Variables Entered                         Variables Removed   Method
1       Location, Hero, Film Title, Genre,        .                   Enter
        Director, Heroine, Songs, Prod. House,
        Story(a)
a All requested variables entered.
b Dependent Variable: Satisfaction
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .340(a)   .116       .041                .794
Coefficients
                  Unstandardized Coefficients   Standardized Coefficients
Model             B        Std. Error           Beta                        t        Sig.
1  (Constant)     3.589    .433                                             8.296    .000
   Genre          -.056    .069                 -.084                       -.804    .423
   Film Title     -.074    .075                 -.104                       -.990    .324
   Story          .212     .084                 .346                        2.520    .013
   Hero           -.072    .084                 -.098                       -.854    .395
   Heroine        -.066    .075                 -.095                       -.886    .378
   Director       -.210    .086                 -.270                       -2.451   .016
   Prod. House    .138     .083                 .182                        1.667    .098
   Songs          -.068    .074                 -.102                       -.909    .365
   Location       .048     .070                 .070                        .681     .497
The main aim of carrying out multiple regression was to find the relationship between the
satisfaction level of the respondents and the various attributes of a movie such as genre, title,
actor, director etc.
As seen from the tables, after running multiple regression in SPSS, only 11.6% of the variation
in the satisfaction level is explained by the above-mentioned independent variables
(R Square = .116). Hence, there may be other variables affecting the satisfaction level of the
respondents. Among the attributes, only Story (Sig. = .013) and Director (Sig. = .016) are
significant at the 5% level.
On checking for multicollinearity, it was found that there was not much relation between the
independent variables, which indicates that the variables are largely independent of each
other.
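The figures in the regression tables can be sanity-checked by hand. The sketch below recomputes the t values as B divided by Std. Error, and the adjusted R Square from R Square. The function names are illustrative, and the sample size of 120 is an assumption inferred from the total degrees of freedom in the ANOVA tables of this report. Because the published coefficients are rounded to three decimals, and because the regression may have used fewer cases, small differences from the SPSS output are to be expected.

```python
def t_statistic(coefficient, std_error):
    """t value of a regression coefficient: estimate divided by its standard error."""
    return coefficient / std_error


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# (B, Std. Error) pairs copied from the Coefficients table.
coefficients = {
    "Story": (0.212, 0.084),
    "Director": (-0.210, 0.086),
    "Prod. House": (0.138, 0.083),
}

for name, (b, se) in coefficients.items():
    print(f"{name}: t = {t_statistic(b, se):.3f}")

# n_obs = 120 is an assumption (see the lead-in); 9 predictors were entered.
print(f"adjusted R^2 = {adjusted_r_squared(0.116, 120, 9):.3f}")
```

The penalty for using nine predictors on a small sample is what pulls the adjusted value well below the raw R Square.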
Analysis of Q – 10 - Promotional activities influencing the respondents to watch a movie
In order to find out whether variables such as TV promo, Music, PR activities, Critics'
Reviews, Word of Mouth etc. have an equal impact in influencing the respondents to
watch a particular movie, we first applied a one-way ANOVA test.
One-way ANOVA Test

Null Hypothesis :- All the variables mentioned above have an equal impact in influencing
people to watch a movie.
Alternative Hypothesis :- The promotional efforts do not all have an equal impact in influencing
people to watch a movie.
ANOVA
Source of Variation   SS       df    MS          F           P-value   F crit
Between Groups        99.25    5     19.85       24.406578   1E-22     2.226649
Within Groups         580.7    714   0.8133053
Total                 679.95   719
As F(observed) > F critical, our null hypothesis is rejected, and it is clear that the promotional
efforts do not all have the same impact in influencing people to watch a movie.
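The F value in the ANOVA table follows directly from the sums of squares and degrees of freedom. A minimal sketch of that arithmetic is given below; the function name is illustrative, and the critical value is copied from the table rather than computed.

```python
def one_way_anova_f(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) from sums of squares and degrees of freedom."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# SS and df values taken from the ANOVA table for the promotional variables.
ms_b, ms_w, f_observed = one_way_anova_f(99.25, 5, 580.7, 714)

f_critical = 2.226649  # F crit as reported in the table (alpha = 0.05)
reject_null = f_observed > f_critical

print(f"MS between = {ms_b:.4f}, MS within = {ms_w:.7f}")
print(f"F = {f_observed:.6f}, reject null: {reject_null}")
```

The null hypothesis is rejected whenever the observed F exceeds the critical F, which is the decision rule applied in the text.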
To find out which of these promotional variables should be focused on most because they
influence people strongly, we carried out Factor Analysis.
Factor Analysis
Methodology
1. Different variables such as TV promo, Music, PR activities, Controversies, Word
of mouth etc. are used as the attributes on the basis of which factoring is done.
2. The Principal Component method is used for extraction, and the factor scores are saved
as variables to form the basis for cluster analysis.
The Output and Analysis
Total Variance Explained
            Initial Eigenvalues                   Extraction Sums of Squared Loadings   Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %  Total   % of Variance   Cumulative %  Total   % of Variance   Cumulative %
1           1.766   29.440          29.440        1.766   29.440          29.440        1.611   26.857          26.857
2           1.411   23.511          52.951        1.411   23.511          52.951        1.566   26.094          52.951
3           .917    15.291          68.242
4           .811    13.519          81.760
5           .566    9.442           91.202
6           .528    8.798           100.000
Extraction Method: Principal Component Analysis.
Rotated Component Matrix(a)
                 Component
                 1       2
TV_Promo         .756    -.013
Music            .825    -.070
PR_Activities    .587    .303
Critic_reviews   -.034   .752
wordofmouth      .034    .726
controversies    .110    .613
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
From the table of Total Variance Explained, we found only two factors, which together explain
about 53% of the cumulative variance of the given variables. This is mainly because the
variables do not differ much from one another. Hence we can say that all the promotional
variables can be grouped into two factors, namely Direct Influence and Indirect Influence.
Direct Influence :- As seen from the Rotated Component Matrix, TV Promo, Music and PR
Activities are the variables which load on this factor.
Indirect Influence :- As seen from the Rotated Component Matrix, Critic reviews, Word of
mouth and Controversies are the variables which load on this factor.
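The reading of the two tables can be reproduced with a few lines of arithmetic. The sketch below computes the percentage of variance from the eigenvalues, counts the components with an eigenvalue above 1, and assigns each variable to the component on which it loads most strongly. The eigenvalue-above-1 rule is an assumption here: it is a common default for retaining components, and the report does not state which rule was used. The function names are illustrative.

```python
# Initial eigenvalues from the Total Variance Explained table.
eigenvalues = [1.766, 1.411, 0.917, 0.811, 0.566, 0.528]

# Rotated loadings (component 1, component 2) from the Rotated Component Matrix.
loadings = {
    "TV_Promo": (0.756, -0.013),
    "Music": (0.825, -0.070),
    "PR_Activities": (0.587, 0.303),
    "Critic_reviews": (-0.034, 0.752),
    "wordofmouth": (0.034, 0.726),
    "controversies": (0.110, 0.613),
}


def variance_explained(values):
    """Percent and cumulative percent of variance per component.

    With standardized variables each variable contributes a variance of 1,
    so the total variance equals the number of variables.
    """
    total = len(values)
    percents = [100 * v / total for v in values]
    cumulative, running = [], 0.0
    for p in percents:
        running += p
        cumulative.append(running)
    return percents, cumulative


def assign_to_component(matrix):
    """Assign each variable to the component with the largest absolute loading."""
    return {
        var: max(range(len(row)), key=lambda i: abs(row[i])) + 1
        for var, row in matrix.items()
    }


percents, cumulative = variance_explained(eigenvalues)
retained = sum(1 for v in eigenvalues if v > 1)  # assumed retention rule
groups = assign_to_component(loadings)

print(f"components retained: {retained}")
print(f"cumulative variance of retained components: {cumulative[retained - 1]:.2f}%")
for var, comp in groups.items():
    print(var, "-> component", comp)
```

Component 1 then collects TV promo, Music and PR activities, and component 2 collects Critic reviews, Word of mouth and Controversies, matching the grouping described above.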
Cluster Analysis
We carried out hierarchical clustering and examined the agglomeration schedule. We used the
dendrogram plot to find out the clusters. Ward's method of linkage was used, with squared
Euclidean distance as the basis for finding the clusters.
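To make the linkage rule concrete, the sketch below implements Ward's criterion in plain Python: at every step it merges the two clusters whose union causes the smallest increase in the within-cluster sum of squared Euclidean distances. This is an illustrative re-implementation, not the SPSS procedure itself, and the four sample points are hypothetical stand-ins for the saved factor scores.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def ward_merge_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the two clusters are merged."""
    n_a, n_b = len(cluster_a), len(cluster_b)
    gap = squared_euclidean(centroid(cluster_a), centroid(cluster_b))
    return n_a * n_b / (n_a + n_b) * gap


def ward_clustering(points, n_clusters):
    """Agglomerate singleton clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        # Pick the pair whose merge is cheapest under Ward's criterion.
        i, j = min(pairs, key=lambda ij: ward_merge_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical two-dimensional factor scores, for illustration only.
scores = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
result = ward_clustering(scores, 2)
print(result)
```

Merges with a cost close to zero join nearly identical respondents, which is why the first stages of an agglomeration schedule typically show coefficients of .000.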
Agglomeration Schedule
Cluster Combined Stage Cluster First Appears
Stage Cluster 1 Cluster 2 Coefficients Cluster 1 Cluster 2 Next Stage
1 55 113 .000 0 0 67
2 77 110 .000 0 0 7
3 97 108 .000 0 0 111
4 100 101 .000 0 0 29
5 93 94 .000 0 0 75
6 20 92 .000 0 0 15
7 18 77 .000 0 2 32
8 34 68 .000 0 0 79
9 57 66 .000 0 0 19
10 44 56 .000 0 0 46
11 3 47 .000 0 0 78
12 31 33 .000 0 0 40
13 28 29 .000 0 0 90
14 12 27 .000 0 0 45
15 19 20 .000 0 6 43
16 1 4 .000 0 0 65
17 52 114 .000 0 0 39
18 39 116 .001 0 0 50
19 57 62 .002 9 0 86
20 6 8 .004 0 0 56
21 43 72 .007 0 0 73
22 79 87 .010 0 0 44
23 40 83 .013 0 0 27
24 46 112 .015 0 0 76
25 64 89 .018 0 0 58
26 9 80 .021 0 0 48
27 7 40 .025 0 23 52
28 96 120 .029 0 0 41
29 23 100 .032 0 4 47
30 98 103 .036 0 0 68
31 10 76 .040 0 0 39
Dendrogram using Ward Method
[Dendrogram plot: cases (Label, Num) on the vertical axis; rescaled distance at which clusters combine, 0 to 25, on the horizontal axis.]
Based on the coefficients and on the dendrogram, we found that there are two clusters
with different characteristics.
Cluster   TV Promo   Music   PR Activities   Critic reviews   Word of mouth   Controversies   No. of respondents
1         4.39       4.47    3.92            4.13             4.68            4.11            38
2         4.30       4.66    3.66            3.13             3.71            3.11            56
The above table shows the mean value of each variable for each cluster. It is seen
from the table that in cluster 2, which has the most respondents, Music has a higher mean
value than in cluster 1. There is also little difference between the two clusters in the mean
values for TV promo and PR activities.
From the cluster analysis, we can interpret that the Direct Influence variables, namely TV
Promo, Music and PR Activities, should be given more importance while designing a promotion
campaign in order to influence people to watch movies.
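One way to read the cluster table is to look, for each variable, at the gap between the two cluster means and at the overall mean weighted by cluster size. The sketch below does this with the figures from the table; the function names are illustrative and the weighting by number of respondents is a choice made for this example.

```python
cluster_sizes = {1: 38, 2: 56}

# Mean scores per cluster, copied from the cluster table.
cluster_means = {
    1: {"TV Promo": 4.39, "Music": 4.47, "PR Activities": 3.92,
        "Critic reviews": 4.13, "Word of mouth": 4.68, "Controversies": 4.11},
    2: {"TV Promo": 4.30, "Music": 4.66, "PR Activities": 3.66,
        "Critic reviews": 3.13, "Word of mouth": 3.71, "Controversies": 3.11},
}


def weighted_overall_means(means, sizes):
    """Overall mean per variable, weighting each cluster by its respondent count."""
    total = sum(sizes.values())
    variables = next(iter(means.values())).keys()
    return {
        var: sum(means[c][var] * sizes[c] for c in means) / total
        for var in variables
    }


def cluster_gaps(means, first, second):
    """Difference (first minus second) between two clusters for every variable."""
    return {var: means[first][var] - means[second][var] for var in means[first]}


overall = weighted_overall_means(cluster_means, cluster_sizes)
gaps = cluster_gaps(cluster_means, 1, 2)

for var in overall:
    print(f"{var}: overall {overall[var]:.2f}, cluster 1 minus cluster 2 {gaps[var]:+.2f}")
```

Variables with a small gap are rated similarly by both clusters, while a large gap marks a variable that separates the clusters.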
Analysis of Q – 5 - Factors influencing the respondents to watch a movie
In order to find out whether the following variables:
Genre
Film Title
Story
Lead Hero
Lead Heroine
Director
Production House
Songs
Sets/Location
have an equal impact in influencing the respondents to watch a particular movie, we
first applied a one-way ANOVA test.
One-way ANOVA Test
Null Hypothesis :- All the variables mentioned above have an equal impact in influencing
people to watch a movie.
Alternative Hypothesis :- The factors do not all have an equal impact in influencing
people to watch a movie.
ANOVA
Source of Variation   SS            df     MS           F          P-value       F crit
Between Groups        55.56296296   8      6.94537037   5.065022   3.36792E-06   1.947033
Within Groups         1468.6        1071   1.37124183
Total                 1524.162963   1079
As F(observed) > F critical, our null hypothesis is rejected, and it is clear that the factors
do not all have the same impact in influencing people to watch a movie.
To find out which of these variables should be focused on most because they influence
people strongly, we carried out Factor Analysis.
Factor Analysis
Methodology
1. Different variables such as Genre, Story, Lead Hero etc. are used as the attributes on the
basis of which factoring is done.
2. The Principal Component method is used for extraction, and the factor scores are saved
as variables to form the basis for cluster analysis.
The Output and Analysis

From the table of Total Variance Explained, we found only three factors, which together explain
about 60% of the cumulative variance of the given variables. This is mainly because the
variables do not differ much from one another. Hence we can say that the variables influencing
people to watch a movie can be grouped into three factors, namely Cast, Crew & Content, Film
Identity and X-factor.
Cast, Crew & Content :- As seen from the Rotated Component Matrix, Story, Lead Hero,
Lead Heroine and Director are the variables which load on this factor.
Film Identity :- As seen from the Rotated Component Matrix, Film Title and Production
House are the variables which load on this factor.
X-factor :- As seen from the Rotated Component Matrix, Songs and Sets/Location are the
variables which load on this factor.
Cluster Analysis
We carried out hierarchical clustering and examined the agglomeration schedule. We used the
dendrogram plot to find out the clusters. Ward's method of linkage was used, with squared
Euclidean distance as the basis for finding the clusters.
Agglomeration Schedule
Cluster Combined Stage Cluster First Appears
Stage Cluster 1 Cluster 2 Coefficients Cluster 1 Cluster 2 Next Stage
1 100 101 .000 0 0 38
2 48 49 .000 0 0 57
3 43 46 .000 0 0 62
4 28 29 .000 0 0 107
5 67 84 .004 0 0 72
6 15 117 .016 0 0 21
7 61 64 .029 0 0 102
8 80 94 .046 0 0 88
9 74 114 .066 0 0 32
10 12 16 .090 0 0 82
11 7 23 .116 0 0 45
12 13 95 .145 0 0 16
13 2 31 .175 0 0 22
14 17 33 .212 0 0 56
15 86 112 .255 0 0 32
16 4 13 .303 0 12 43
17 5 11 .350 0 0 69
18 14 90 .399 0 0 67
19 51 91 .449 0 0 27
20 27 89 .501 0 0 60
21 3 15 .554 0 6 30
22 2 75 .608 13 0 56
23 9 120 .662 0 0 42
24 98 105 .722 0 0 89
25 38 42 .783 0 0 70
26 93 99 .845 0 0 65
27 19 51 .921 0 19 60
28 81 115 .996 0 0 64
29 10 56 1.072 0 0 46
30 3 72 1.156 21 0 85
31 34 96 1.241 0 0 39
Dendrogram using Ward Method
[Dendrogram plot: cases (Label, Num) on the vertical axis; Rescaled Distance Cluster Combine, 0 to 25, on the horizontal axis.]
Based on the coefficients and on the dendrogram, we found that there are three clusters
with different characteristics.
Cluster   Genre   Film Title   Story   Lead Hero   Lead Heroine   Director   Production House   Songs   Sets/Location   No. of respondents
1         3.25    2.86         4.20    3.63        3.58           3.82       3.14               3.20    2.41            71
2         3.00    2.95         3.41    3.64        3.36           2.91       3.09               4.32    4.18            22
3         2.10    3.55         1.45    2.30        2.30           2.45       2.95               1.95    2.65            20
The above table shows the mean value of each variable for each cluster. It is seen
from the table that in cluster 1, which has the most respondents, Genre, Story, Lead
Heroine, Director and Production House have the highest mean values of the three clusters,
while Lead Hero is practically level with cluster 2 (3.63 against 3.64).
From the cluster analysis, we can interpret that the Cast, Crew & Content factor should be
given more importance while making a movie in order to influence people to watch it.
Que-11: Satisfaction Level with the Movie
Here we used a one-sample, one-tailed t-test to find out about the satisfaction level of the
respondents with the movies released in the recent past.
So we construct a null and an alternative hypothesis.
Null Hypothesis :- mean is greater than or equal to 3
Alternative Hypothesis :- mean is less than 3
The critical value for alpha = 0.05 (one-tailed t-test) is 1.6577.
The calculated value is t = -1.806.
So here |t(cal)| > t(tab), that is, t(cal) lies below -1.6577.
So we reject the Null Hypothesis and accept the Alternative Hypothesis.
It means that the mean satisfaction score in the sample is significantly below 3, i.e. on
average the respondents are not satisfied with the kind of movies released in the recent past.
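The test statistic and the decision rule can be sketched as follows. The function names and the eight sample ratings are hypothetical; only the reported t value (-1.806) and the critical value (1.6577) come from the analysis above.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for testing the sample mean against the hypothesized mean mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return (mean - mu0) / (sd / math.sqrt(n))


def reject_left_tailed(t_value, t_critical):
    """Left-tailed test: reject H0 (mean >= mu0) when t falls below -t_critical."""
    return t_value < -t_critical


# Hypothetical satisfaction ratings, for illustration only.
ratings = [2, 3, 2, 4, 2, 3, 1, 3]
t_demo = one_sample_t(ratings, 3)
print(f"t for the hypothetical ratings: {t_demo:.4f}")

# Decision using the values reported in the text.
decision = reject_left_tailed(-1.806, 1.6577)
print("reject null hypothesis:", decision)
```

Because the alternative hypothesis is "mean is less than 3", only a sufficiently negative t leads to rejection; a positive t of the same size would not.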
