
CHAPTER-II

DATA COLLECTION AND PRESENTATION

Definition of Data
Data are considered as the raw materials of
statistics.
Data are numerical measurements that are
collected in a scientific, systematic way and are
related to predetermined objectives.
Statistical observations are called data.

Source of Data

Data can be obtained from three important sources:

DATA
  Primary Data
    Questioning
      Personal Interview
      Mail
      Telephone
    Observations
    Focused Group Discussion (FGD)
  Internal Data
  Secondary Data

Primary Data:
Primary data are measurements observed
& recorded as part of an original study.
When the data required for a particular
study are not already available, it may
become necessary to collect original data,
that is, to conduct a first-hand investigation.

Internal Data
Internal data refer to the measurements
that are the by-product of routine
business record keeping, such as accounting,
finance, production, personnel, quality
control, sales, research and development, etc.

Secondary Data
When an investigator uses data
that have already been collected by
others, such data are called secondary
data.
Secondary data can be obtained from
journals, reports, government publications,
etc.

METHOD OF OBTAINING PRIMARY DATA:

There are three basic methods of collecting primary data:
Questioning.
Observations.
Focused Group Discussion (FGD).

Questioning:
Questioning, as the name suggests, is distinguished by
the fact that data are collected by asking questions of
people who are thought to have the desired information.
Questions may be asked in person, or in writing. A
formal list of such questions is called a questionnaire.
A distinction is often made between a questionnaire & a
schedule. Questionnaire refers to a device for securing
answers to questions by using a form which the
respondent fills in himself.
Schedule is the name usually applied to a set of
questions which are asked & filled in a face-to-face
situation with another person.

Observations:
When the data are collected by observation, the
investigator asks no questions. Instead he observes the
object or actions in which he is interested. Sometimes
individuals make the observations, on other occasions,
mechanical devices observe & record the desired
information.
The observation method does not automatically produce
accurate data. Physical difficulties in the observation
situation, or on the part of the observer, may result in errors.
Even more important, however, is the influence on
observations of the observer's training, philosophy,
opinions & expectations.
Examples include such projects as the reading of X-ray films
and ECGs, and assessing the state of repair of roads.

FOCUSED GROUP DISCUSSION

When using focused group discussions as a research
technique, the researcher is no longer the center of
activity, but rather lets the informants discuss with each
other, providing guidance.
Tools: discussion guide, tape recorder.

QUESTIONNAIRE METHODS
Of the three methods named above, the
questionnaire method is the most widely used for
collecting business data. When the questionnaire
method is used, three different techniques of
communication are available:
Personal interview.
Mail.
Telephone.

1. Personal Interview:
Personal interviews are those in which an
interviewer obtains information from respondents
in face-to-face meetings.
The information obtained by this method is likely
to be more accurate because the interviewer can clear
up doubts, can cross-examine the informants &
thereby obtain correct information.

2. Mail:
In most mail surveys, questionnaires are mailed to the
respondents, who fill them in & return them by mail. Sometimes
mail questionnaires are placed in respondents' hands by
other means, such as attaching them to consumer
products, putting them in magazines or newspapers, or
having field workers leave them with respondents. In
each case respondents complete the questionnaires
themselves & send back the completed forms by mail.
This method has a special advantage in surveys where
the field of investigation is very vast & the informants are
spread over a wide geographical area.

3. Telephone Interviews:
Telephone interviews are similar to personal
interviews except that communication between
interviewer & respondent is by telephone instead
of direct personal contact.
However, this method has several limitations:
it cannot be used to interview people who do not
have a telephone, telephone conversations cannot
be very long, & replies on the phone can be
erratic & unreliable.

DATA COLLECTION TECHNIQUES AND TOOLS

Data Collection Techniques              Data Collection Tools
Using available information             Checklist, data compilation form
Observing                               Eyes and ears, pen and paper, watch, tape or video recorder, etc.
Interviewing                            Interview schedule, checklist, questionnaire, tape recorder
Administering written questionnaires    Questionnaire
Organizing Focus Group Discussions      Discussion guide, tape recorder

Data:
Statistical observations are called data.
Attribute:
A phenomenon that is expressed in
some qualitative form is termed an
attribute. Examples: occupation, religion,
education, etc.

Variable:
A variable is a quantitative characteristic of a
person, object or phenomenon that can take
more than one value. Examples: income, price,
height, weight, family size. There are mainly
four types of variables:

Discrete Variable.
Continuous Variable.
Dependent Variable.
Independent Variable.

1. Discrete Variable:
Discrete variables are those which can
vary only by finite jumps & cannot
manifest every conceivable fractional
value. They are obtained by counting, for
instance the number of persons in a
family, the number of rooms in a house,
or the number of employees.

2. Continuous Variable:
A continuous variable is capable of
manifesting every conceivable
fractional value within the range of
possibilities. It is obtained by
measurement. Examples: height, weight
of a product, age.

3. Dependent Variable:
The variable that is used to describe or measure
the problem under study is called the dependent
variable.
4. Independent Variable:
The variables that are used to describe or
measure the factors that are assumed to cause, or
at least influence, the problem are called
independent variables.

Presentation of Data:
Presentation can take place mainly in two forms:
Statistical Table.
Statistical Chart.
A statistical table is the presentation of numbers
in a logical arrangement, with some brief
explanation to show what they are.
A statistical chart is a pictorial device for
presenting data.

Classification of Data

Classification is a grouping of related facts into
different classes.
Facts in one class differ from those of another
class with respect to some characteristic, called a
basis of classification.
Sorting facts on one basis of classification & then on
another basis is called cross-classification.
Classification of data is a function very similar to
that of sorting letters in a post office (the common
characteristic being the destination).
This process helps in gathering important information
while dropping unnecessary details, enabling
statistical treatment.
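
The difference between classifying on one basis and cross-classifying on two can be sketched in a few lines of Python. This is only an illustration: the records, district names and literacy labels below are made up and do not come from the text.

```python
from collections import Counter

# Hypothetical records: (district, literacy) pairs invented for illustration.
records = [
    ("Dhaka", "literate"),
    ("Dhaka", "illiterate"),
    ("Khulna", "literate"),
    ("Dhaka", "literate"),
    ("Khulna", "illiterate"),
    ("Khulna", "literate"),
]

# Classification on one basis: count the records in each district.
by_district = Counter(district for district, _ in records)

# Cross-classification: count the records for each (district, literacy) pair.
cross = Counter(records)

for (district, literacy), count in sorted(cross.items()):
    print(f"{district:<8} {literacy:<11} {count}")
```

Each key of `cross` is one cell of the cross-classified table, so the same counts could be laid out with districts as stubs and literacy as captions.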

Types of classification, according to the basis used:

Geographical: area-wise, e.g. cities, districts, etc.
Chronological: on the basis of time.
Qualitative: according to some attributes.
Quantitative: in terms of some measurable quantity (magnitude).

1. Geographical Classification:

The basis is geographical or locational differences,
e.g. production of wheat or rice in different states.
Areas are usually listed in alphabetical order for easy
reference.
They may also be listed by size to emphasize the
important areas, as when ranking the states by
population.

2. Chronological Classification:
The basis is the period of time.
Time series are usually listed in
chronological order, normally starting
with the earliest period.
When the major emphasis falls on the
most recent events, a reverse time
order may be used.

3. Qualitative Classification:

The basis is some attribute or quality, such as color of hair,
literacy, religion, sex, etc.
The point to note in this type of classification is that the
attribute under study cannot be measured: one can only find out
whether it is present or absent in the units of the population
under study.
Example: blindness. We may find out how many persons are
blind in a given population, but it is not possible to measure
the degree of blindness in each case. Thus, when only one attribute
is studied, two classes are formed, one possessing the attribute &
the other not possessing it. This classification is
known as simple classification.

Population
  Blind
  Non-blind

This type of classification, where only two classes
are formed, is also called twofold or dichotomous
classification.

If, instead of forming only two classes, we further
divide the data on the basis of some attributes so as
to form several classes, the classification is known
as manifold classification.

[Diagram: manifold classification, in which the population is divided into classes that are each further sub-divided.]

4. Quantitative Classification:
The basis is some measurable characteristic such as height,
weight, income or sales.
Workers of a factory may be classified according to wages.
In this type of classification there are two elements:
o The variable: wages in this example.
o The frequency: the number of workers in each class.

CLASSIFICATION ACCORDING TO CLASS INTERVALS:

1. Class Limit:
The class limits are the lowest & the highest
values that can be included in the class.
For example, in the class 20-30, 20 is the lower
limit & 30 is the upper limit.
The lower limit of a class is the value below
which there can be no value in that class, & the
upper limit of a class is the value above which
no value can belong to that class.

2. Class Interval:
The span of the class, that is, the difference
between the upper limit & the lower limit, is
known as the class interval.
For the class 20-30, the class interval is 30 - 20 = 10.
The size of the class interval is determined by
the number of classes & the total range of the data.
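
As a rough sketch, both statements can be written as small Python functions. The function names are hypothetical, and rounding the suggested size up to a whole number is an assumption made here for convenience, not a rule stated in the text.

```python
import math


def class_interval(lower_limit, upper_limit):
    """Span of a class: upper limit minus lower limit."""
    return upper_limit - lower_limit


def suggested_interval(lowest, highest, number_of_classes):
    """Total range divided by the number of classes, rounded up (assumed rule)."""
    return math.ceil((highest - lowest) / number_of_classes)


# The class 20-30 spans 30 - 20 units.
print(class_interval(20, 30))

# Incomes running from 800 to 1400 split into six classes.
print(suggested_interval(800, 1400, 6))
```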

3. Class Frequency:
The number of observations
corresponding to a particular class is
known as the frequency of that class, or
class frequency.

Income          No. of Emp.
800 - 900            50
900 - 1000          100
1000 - 1100         200
1100 - 1200         150
1200 - 1300          40
1300 - 1400          10
Total               550

There are 50 employees having income
between 800 & 900.

If we add together the frequencies of all
the individual classes, we obtain the total
frequency. The total frequency of the six classes
is 550, which means that in all there are
550 employees whose income has been
studied.

4. Class Midpoint:
Midpoint of a class = (upper limit of the class + lower limit of the class) / 2
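
A minimal sketch of the midpoint calculation, applied to the six income classes of the table above (the variable names are illustrative):

```python
# (lower limit, upper limit) of each income class from the table above.
classes = [
    (800, 900),
    (900, 1000),
    (1000, 1100),
    (1100, 1200),
    (1200, 1300),
    (1300, 1400),
]

# Midpoint = (upper limit + lower limit) / 2 for every class.
midpoints = [(upper + lower) / 2 for lower, upper in classes]

for (lower, upper), mid in zip(classes, midpoints):
    print(f"{lower}-{upper}: midpoint {mid}")
```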
There are two methods of classifying data
according to class intervals:
Exclusive method.
Inclusive method.

EXCLUSIVE METHOD:
When the class intervals are so fixed that the upper limit of one
class is the lower limit of the next class, the classification is
known as the exclusive method.

Income          No. of Emp.
800 - 900            50
900 - 1000          100
1000 - 1100         200
1100 - 1200         150
1200 - 1300          40
1300 - 1400          10
Total               550

The exclusive method ensures continuity of
data.
In the above example there are 50 persons whose income
is between Tk. 800 & Tk. 899.99. A person whose income is
exactly Tk. 900 would be included in the 900 - 1000 class.
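
The exclusive rule (lower limit included, upper limit excluded) can be sketched as follows. The function name is hypothetical, and the class limits are those of the income table above.

```python
from bisect import bisect_right

# Shared class limits: each value is the upper limit of one class
# and the lower limit of the next.
limits = [800, 900, 1000, 1100, 1200, 1300, 1400]


def exclusive_class(income):
    """Return the (lower, upper) class with lower <= income < upper."""
    if not limits[0] <= income < limits[-1]:
        raise ValueError(f"{income} lies outside the classified range")
    index = bisect_right(limits, income) - 1
    return limits[index], limits[index + 1]


print(exclusive_class(899.99))  # falls in the first class
print(exclusive_class(900))     # falls in the next class
```

Using `bisect_right` means a value equal to a shared limit is pushed into the higher class, which is exactly the exclusive convention.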

INCLUSIVE METHOD:
Under this method of classification, the upper
limit of one class is included in that class itself.

Income          No. of Emp.
800 - 899            50
900 - 999           100
1000 - 1099         200
1100 - 1199         150
1200 - 1299          40
1300 - 1399          10
Total               550

In the class 800 - 899 we include persons
whose income is between 800 & 899. If the
income of a person is exactly 900, he is
included in the next class.
It should be noted that both the inclusive &
exclusive methods give us the same class
intervals. For the inclusive method the class
interval is obtained by taking the difference
between the upper limits of two successive classes.
Class Interval = 999 - 899 = 100
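
A short sketch of the inclusive rule, assuming whole-number incomes and classes written as 800-899, 900-999 and so on. The names are illustrative.

```python
# Inclusive classes: both limits belong to the class (whole-number incomes assumed).
inclusive_classes = [
    (800, 899),
    (900, 999),
    (1000, 1099),
    (1100, 1199),
    (1200, 1299),
    (1300, 1399),
]


def inclusive_class(income):
    """Return the (lower, upper) class with lower <= income <= upper."""
    for lower, upper in inclusive_classes:
        if lower <= income <= upper:
            return lower, upper
    raise ValueError(f"{income} lies outside the classified range")


# Class interval taken as the difference between successive upper limits.
intervals = [
    later[1] - earlier[1]
    for earlier, later in zip(inclusive_classes, inclusive_classes[1:])
]

print(inclusive_class(899))
print(inclusive_class(900))
print(intervals)
```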

STATISTICAL TABLE

One of the simplest & most revealing devices for
summarizing data in a meaningful fashion is the
statistical table. The purpose of the table is to simplify
the presentation & to facilitate comparison.

Parts of a Table:
1. Table Number:

The number may be given either at the center of the top,
above the title, or on the left-hand side (L.H.S.) of the
table, at the top or at the bottom.

2. Title of the Table:

The title is the description of the contents of the
table.
A complete title has to answer the questions:
o What precisely are the data in the table?
o Where did the data occur (the precise geographical,
political or physical area covered)?
o When did the data occur?
The title should be clear, brief & self-explanatory.

3. Caption:
Caption refers to the column heading. It explains
what the column represents. It may consist of
one or more column headings.
Under a column heading there may be subheads.
The caption should be clearly defined & placed
at the middle of the column.
If the different columns are expressed in
different units, the units should be
specified along with the captions.

4. Stub:
Stubs are the designations of the rows, or row headings.
They are placed at the extreme left.
5. Body:
The body of the table contains the numerical
information.
This is the most vital part of the table.
Data presented in the body are arranged according to the
descriptions & classifications of the captions & stubs.

6. Head note:
It is a brief explanatory statement applying to all or a
major part of the material in the table, & is placed below the
title, centered & enclosed in brackets.
It is used to explain certain points relating to the
whole table that have not been included in the title
or in the captions or stubs.
For example, the unit of measurement is frequently
written as the head note, such as "in thousands",
"in million tonnes" or "in crores".

7. Foot note:
Anything in a table which the reader may find difficult
to understand from the title, stubs or captions should
be explained in footnotes.
If footnotes are needed, they are placed directly below
the body of the table.
Footnotes are used for four main purposes:
o To point out exceptions, so as to avoid any conclusion
based on those exceptions.
o To note any special circumstances affecting the data, such as a strike.
o To clarify anything in the table.
o To give the source in the case of secondary data.

Graphs:
Broadly the various graphs can be divided under
the following two heads
Graphs of Time series or line graphs.
Graphs of Frequency distributions.

Graphs of Frequency Distributions:

Histograms or column diagrams.
Frequency polygon.
Smoothed frequency curve.
Cumulative frequency curve.

FREQUENCY DISTRIBUTION:
Frequency distribution is a statistical table
which shows the set of all distinct values of
the variables arranged in order of magnitude
either individually or in groups with their
corresponding frequencies side by side. A
frequency distribution table is given below:

FREQUENCY DISTRIBUTION TABLE

Class Interval    Tally       Frequency
20-30             III
30-40             IIII
40-50             IIII I      6
50-60             IIII III
60-70             II
70-80

PROBLEM
The profits (in taka) of 30 companies for the year 2010-2011 are
given below:
20, 65, 49, 29, 22, 35, 42, 37, 42, 48,
53, 49, 39, 48, 67, 18, 16, 23, 37, 35,
63, 65, 55, 45, 58, 57, 69, 25, 58, 65.
1. Classify the above data taking a suitable class interval.
2. Represent the data in a frequency distribution table.

Highest value = 69
Lowest value = 16
Range = 69 - 16 = 53

53 ÷ 5 ≈ 11
53 ÷ 25 ≈ 2
Profits (in taka)    Tally       Frequency
15 - 25              IIII        5
25 - 35              II          2
35 - 45              IIII II     7
45 - 55              IIII I      6
55 - 65              IIII        5
65 - 75              IIII        5
                                 N = 30
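
The classification of the 30 profit figures can be checked with a short Python sketch. It assumes the exclusive method with classes of width 10 starting at 15, as in the table; the variable names are illustrative.

```python
# Profits (in taka) of the 30 companies, as listed in the problem.
profits = [
    20, 65, 49, 29, 22, 35, 42, 37, 42, 48,
    53, 49, 39, 48, 67, 18, 16, 23, 37, 35,
    63, 65, 55, 45, 58, 57, 69, 25, 58, 65,
]

lowest, highest = min(profits), max(profits)
value_range = highest - lowest

# Exclusive classes 15-25, 25-35, ..., 65-75 (lower limit included, upper excluded).
width = 10
classes = [(lower, lower + width) for lower in range(15, 75, width)]

# Count how many profits fall in each class.
frequencies = [
    sum(lower <= profit < upper for profit in profits)
    for lower, upper in classes
]

print("Range =", value_range)
for (lower, upper), frequency in zip(classes, frequencies):
    print(f"{lower} - {upper}: {frequency}")
print("N =", sum(frequencies))
```

Because the upper limit is excluded, a profit of exactly 35 is counted in the 35 - 45 class and a profit of 65 in the 65 - 75 class.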

Name of graph        X-axis               Y-axis
Histogram            Class lower limit    Frequency
Frequency polygon    Class mid values     Frequency


Frequency Histogram:
A histogram is a graphical method of presenting data, where the observations are
located on the horizontal axis (usually grouped into intervals) and the frequency
of those observations is depicted along the vertical axis.

FREQUENCY DISTRIBUTION TABLE

Profits (in taka)    Tally       Frequency
20 - 30              III
30 - 40              IIII
40 - 50              IIII I
50 - 60              IIII III
60 - 70              II
70 - 80                          1
                                 N = 30

Histogram
[Figure: histogram of the frequency distribution. X-axis: Class Interval (20-30 to 70-80); Y-axis: Frequency.]

Frequency Polygon:
A frequency polygon is a graphical display of a frequency table.
The intervals are shown on the X-axis and the number of scores
in each interval is represented by the height of a point located
above the middle of the interval. The points are connected so
that together with the X-axis they form a polygon.
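
The points of a frequency polygon can be sketched in Python, here using the class frequencies 5, 2, 7, 6, 5, 5 of the 30-company profit problem. Closing the polygon by adding an empty class at each end is a common convention assumed here; the text only says that the points and the X-axis together form a polygon.

```python
# Classes and frequencies from the 30-company profit problem.
classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
frequencies = [5, 2, 7, 6, 5, 5]
width = 10

# One point per class: (class midpoint, frequency).
points = [
    ((lower + upper) / 2, frequency)
    for (lower, upper), frequency in zip(classes, frequencies)
]

# Assumed convention: drop to the X-axis at the midpoints of an empty
# class before the first class and after the last class.
closed_polygon = (
    [(points[0][0] - width, 0)]
    + points
    + [(points[-1][0] + width, 0)]
)

for midpoint, frequency in closed_polygon:
    print(f"midpoint {midpoint:>5}: frequency {frequency}")
```

Joining the points of `closed_polygon` in order with straight lines gives the polygon.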


Frequency Polygon
[Figure: frequency polygon. X-axis: Class Mid-point (classes 20-30 to 70-80); Y-axis: Frequency.]

Bar Diagram:
A bar chart or bar graph is a chart with rectangular bars
whose lengths are proportional to the values that they represent.
The bars can be plotted vertically or horizontally. Bar
charts are used for plotting discrete data, that is, data that
take discrete values and are not continuous.

Year    Production (Ton)
1995    50
1996    60
1997    71
1998    72
1999    74
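
A rough text-only sketch of a horizontal bar diagram for the production figures. The scale of one mark per 2 tons (with any odd ton dropped) is an assumption chosen only to keep the bars short.

```python
# Production (tons) by year, from the table above.
production = {1995: 50, 1996: 60, 1997: 71, 1998: 72, 1999: 74}

# Assumed scale: one '#' mark for every 2 full tons.
TONS_PER_MARK = 2


def bar(value, tons_per_mark=TONS_PER_MARK):
    """Return a bar whose length is proportional to the value."""
    return "#" * (value // tons_per_mark)


lines = [f"{year} | {bar(tons)} {tons}" for year, tons in production.items()]
print("\n".join(lines))
```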

Bar Diagram
[Figure: bar diagram of Production (Ton) by Year, 1995 to 1999, with bars labelled 50, 60, 71, 72 and 74.]

Pie Chart:
A pie chart is a circular chart divided into sectors. In a pie
chart, the arc length of each sector (and consequently its
central angle and area) is proportional to the quantity it
represents. When angles are measured with 1 turn as the unit,
a number of percent is identified with the same number of
centiturns. Together, the sectors create a full disk. It is
named for its resemblance to a pie that has been sliced. The
pie chart is the most ubiquitous statistical chart in the
business world and the mass media.
To present data on personnel by job status in a pie chart, we
have to calculate the percentage (%) of the personnel in each
job status and the degree of angle occupied by each sector.
The calculation is as follows:

Sl. No.   Personnel by job status   No. of personnel   (%) of personnel   Degree of angle
1         Workers                   436                53                 192.7
2         Computer operator         218                27                 96.3
3         Officers                  109                13                 48
4         Others                    54                 7                  23

Degree of angle = (f / N) x 360
f = frequency (number of personnel in the category)
N = total frequency
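
The percentage and angle calculation can be sketched in Python using the four personnel counts. The names are illustrative, and angles computed from the unrounded shares may differ slightly from the printed figures because of rounding.

```python
# Number of personnel by job status (f for each category).
personnel = {
    "Workers": 436,
    "Computer operator": 218,
    "Officers": 109,
    "Others": 54,
}

# N = total frequency.
total = sum(personnel.values())

# Percentage share and sector angle: (f / N) x 100 and (f / N) x 360.
shares = {status: f / total * 100 for status, f in personnel.items()}
angles = {status: f / total * 360 for status, f in personnel.items()}

for status in personnel:
    print(f"{status:<18} {shares[status]:5.1f}%  {angles[status]:6.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic.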

Pie Chart
[Figure: pie chart of personnel by job status. Sectors: 1 = 53%, 2 = 27%, 3 = 13%, 4 = 7%.]

THANK YOU
