Certificate Course
Batch-10
Introduction to Data
Organized by
Department of Statistics
University of Rajshahi
Email: mrezakarim@yahoo.com
July 05, 2014
Outline
1. Data, Information and Knowledge
4. Data Classification
5. Data Processing
6. Graphical Representation of Data
1. Data: symbols
Example: A book on Mt. Everest geological characteristics may
be considered as "information".
[Figure: the research process as a cycle: Discovery and Definition, Research Design, Sampling, Data Gathering, Data Processing and Analysis, Conclusions and Report, and so on]
3. Sampling Technique
Sampling
A process of selecting units from a population.
A process of selecting a sample to determine
certain characteristics of a population.
[Figure: a sample drawn from a population]
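The definition above can be illustrated with a minimal sketch. It assumes simple random sampling without replacement, which the slides do not specify; the population of unit IDs and the name `draw_sample` are hypothetical.

```python
import random

def draw_sample(population, n, seed=None):
    """Select n distinct units, each unit being equally likely to be chosen."""
    rng = random.Random(seed)          # seeded generator for reproducibility
    return rng.sample(population, n)   # sampling without replacement

population = list(range(1, 101))       # hypothetical unit IDs 1..100
sample = draw_sample(population, 10, seed=42)

# A sample characteristic (here the mean) estimates the population one.
sample_mean = sum(sample) / len(sample)
population_mean = sum(population) / len(population)
```

The sample mean will usually differ somewhat from the population mean; that gap is the sampling error a study design tries to keep small.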
Prof. Dr. M. Rezaul Karim, Statistics, RU
Systematic Sampling:
A sample drawn from a list using a random start
followed by a fixed sampling interval.
Often used in industry, where an item is selected
for testing from a production line (say, every
fifteen minutes) to ensure that machines and
equipment are working to specification.
Alternatively, the manufacturer might decide to
select every 20th item on a production line to test
for defects and quality.
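A random start followed by a fixed interval might be sketched as follows. The item list and the function name `systematic_sample` are hypothetical; the interval of 20 mirrors the "every 20th item" example.

```python
import random

def systematic_sample(units, interval, seed=None):
    """Pick a random start within the first interval, then every interval-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(interval)    # random start index in [0, interval)
    return units[start::interval]      # fixed sampling interval

items = list(range(1, 201))            # hypothetical items 1..200 on a production line
tested = systematic_sample(items, 20, seed=1)
```

With 200 items and an interval of 20, ten items are selected whatever the random start turns out to be.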
Quota sampling
A sample in which a specific number of different
types of units are selected. For example, we may
want to interview 10 teachers and decide that five
will be men and five will be women.
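The teacher example could be sketched as below. The teacher list, the group labels and the helper `quota_sample` are hypothetical, and units are simply taken in the order they are encountered, which is one common way of filling quotas and is assumed here.

```python
def quota_sample(units, group_of, quotas):
    """Take units in encounter order until every group's quota is filled."""
    remaining = dict(quotas)           # how many units each group still needs
    chosen = []
    for unit in units:
        group = group_of(unit)
        if remaining.get(group, 0) > 0:
            chosen.append(unit)
            remaining[group] -= 1
        if not any(remaining.values()):
            break                      # all quotas filled
    return chosen

# Hypothetical list of 30 teachers; every third one is a woman.
teachers = [("T%02d" % i, "woman" if i % 3 == 0 else "man") for i in range(1, 31)]
chosen = quota_sample(teachers, lambda t: t[1], {"man": 5, "woman": 5})
```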
Judgmental sampling
In this kind of sample, selections are made based on
pre-determined criteria that, in your judgment, will
provide the data you need. For example, you may
want to interview primary school principals and
decide to interview some from rural areas as well as
some from urban areas (but no quota is established).
Snowball sampling
This type of sampling is used when we do not know
who or what should be included. It is typically used in
interviews: we ask the interviewees who else
we should talk to, and we continue until no new
suggestions are obtained.
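The stopping rule above (continue until no new suggestions are obtained) can be sketched as a simple queue of referrals. The names and the `referrals` mapping are hypothetical; in practice the suggestions come from the interviews themselves.

```python
from collections import deque

def snowball_sample(seeds, referrals):
    """Interview people in referral order until no new names are suggested."""
    interviewed = []
    seen = set(seeds)                  # everyone already interviewed or queued
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        interviewed.append(person)
        for name in referrals.get(person, []):
            if name not in seen:       # only new suggestions extend the sample
                seen.add(name)
                queue.append(name)
    return interviewed

# Hypothetical answers to "who else should we talk to?"
referrals = {
    "Asha": ["Babul", "Chitra"],
    "Babul": ["Chitra", "Dipu"],
    "Chitra": [],
    "Dipu": ["Asha"],
}
order = snowball_sample(["Asha"], referrals)
```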
Convenience sampling
In this type, selections are made based on the
convenience to the evaluator. Principals from local
schools may be selected because they are near where
the evaluators are located.
4. Data Classification
Qualitative Data
o Example: smells, tastes, etc.
Quantitative Data
2. According to Measurement
Discrete data
Continuous data
o Measurable observations
o Decimals or fractions
3. According to Source
Primary data
o First-hand information
o Example: An autobiography, a financial statement
recorded for the first time, etc.
Secondary data
o Second-hand information
o Example: Weather forecasts from newspapers,
data taken from published journals, books,
web pages, etc.
4. According to Arrangement
Ungrouped data
o Raw data
o No specific arrangement
Grouped data
o Organized set of data
o At least 2 groups
o Arranged in any order
5. According to dependency on time
Time series data
Nominal scale
Nominal scale is simply a system of assigning
number symbols to events in order to label them.
Arithmetic on nominal codes carries no meaning: statements such as
3 - 1 = 4 - 2, 1 + 3 = 4 or 4/2 = 2 cannot be made.
Ordinal scale
Interval scale
Ratio scale
Scale of measurement
Note:
It is essential to understand the above differences
in the nature of data and to choose appropriate
methods to store and analyze them.
Many software tools (e.g., MS Excel and R) do not
automatically understand the nature of the data, so
we need to define the data explicitly for those
tools.
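One way to define the nature of the data explicitly is to declare each variable's scale and let the scale decide which summary is computed. The sketch below is an assumption for illustration: the mapping `CENTRE_BY_SCALE`, the variable names and the values are hypothetical, and the choice of mode, median and mean follows the usual convention for the four scales.

```python
import statistics

# Assumed mapping from scale of measurement to a suitable measure of centre.
CENTRE_BY_SCALE = {
    "nominal": statistics.mode,         # labels: only counting makes sense
    "ordinal": statistics.median_low,   # ordered: a middle value makes sense
    "interval": statistics.mean,
    "ratio": statistics.mean,
}

def summarize(values, scale):
    """Summarize values with the measure that suits the declared scale."""
    return CENTRE_BY_SCALE[scale](values)

blood_group_codes = [1, 2, 2, 3, 2, 4]  # hypothetical nominal codes
ratings = [1, 2, 2, 3, 5]               # hypothetical ordinal ratings
weights_kg = [52.0, 61.0, 58.0]         # hypothetical ratio-scale data

typical_group = summarize(blood_group_codes, "nominal")
typical_rating = summarize(ratings, "ordinal")
mean_weight = summarize(weights_kg, "ratio")
```

Without the declared scale, a tool would happily average the blood group codes even though the result has no meaning.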
Scale of measurement
Least precise to most precise: Nominal, Ordinal, Interval, Ratio
7. According to Failure/Survival characteristic
5. Data Processing
[Figure: Data -> Process -> Information]
Step One:
Validation: Confirming that the interviews/surveys occurred.
Editing: The procedure that improves the quality of the data
for coding, that is, the process of checking and adjusting the
data for
o Consistency
o Completeness
o Questions answered out of order
Step Two:
Coding: Grouping and assigning numeric codes to the
question responses. (Codes may also be other character
symbols.)
Rules for coding:
o Categories should be exhaustive
o Categories should be mutually exclusive and independent
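The two coding rules can be checked mechanically. In this minimal sketch the category labels and the helper names `build_codebook` and `code_responses` are hypothetical: duplicate categories are treated as a violation of mutual exclusivity, and an answer without a code as a sign that the categories are not exhaustive.

```python
def build_codebook(categories):
    """Assign numeric codes 1..k to distinct category labels."""
    if len(set(categories)) != len(categories):
        raise ValueError("categories must be mutually exclusive (no duplicates)")
    return {label: code for code, label in enumerate(categories, start=1)}

def code_responses(responses, codebook):
    """Replace each response with its numeric code."""
    uncoded = sorted(set(responses) - set(codebook))
    if uncoded:
        raise ValueError(f"categories are not exhaustive, uncoded answers: {uncoded}")
    return [codebook[r] for r in responses]

codebook = build_codebook(["Yes", "No", "Don't know"])
coded = code_responses(["Yes", "No", "No", "Don't know"], codebook)
```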
Step Three:
Classification: Large volumes of raw data are reduced into
homogeneous groups (if we are to get meaningful
relationships). Classification can be (i) according to
attributes or (ii) according to class-intervals.
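Classification according to class-intervals might look like the sketch below. The marks, the class width of 10 and the function name `classify_by_interval` are hypothetical; each class is assumed to include its lower limit and exclude its upper limit.

```python
def classify_by_interval(values, lower, width, n_classes):
    """Count values in classes [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        i = int((v - lower) // width)  # index of the class containing v
        if not 0 <= i < n_classes:
            raise ValueError(f"{v} falls outside the class intervals")
        counts[i] += 1
    return counts

marks = [35, 42, 47, 51, 58, 59, 63, 66, 71, 78]  # hypothetical raw data
counts = classify_by_interval(marks, lower=30, width=10, n_classes=5)
# Classes: 30-40, 40-50, 50-60, 60-70, 70-80
```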
Step Four:
Tabulation: Tabulation is the process of summarizing raw
data and displaying the same in compact form (i.e., in the
form of statistical tables) for further analysis.
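A frequency table is perhaps the simplest statistical table. The sketch below is illustrative only; the grades and the helper names `frequency_table` and `format_table` are hypothetical.

```python
from collections import Counter

def frequency_table(observations):
    """Summarize raw observations as sorted (category, frequency) rows."""
    return sorted(Counter(observations).items())

def format_table(rows, header=("Category", "Frequency")):
    """Lay the rows out as a compact two-column text table."""
    width = max([len(header[0])] + [len(str(cat)) for cat, _ in rows])
    lines = [f"{header[0]:<{width}}  {header[1]}"]
    lines += [f"{str(cat):<{width}}  {freq}" for cat, freq in rows]
    return "\n".join(lines)

grades = ["B", "A", "C", "B", "B", "A"]  # hypothetical raw data
rows = frequency_table(grades)
table = format_table(rows)
```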
Step Five:
Percentages: Percentages are often used in data
presentation as they simplify numbers, reducing all of
them to a 0 to 100 range.
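Converting frequencies to percentages is a one-line calculation: each frequency is divided by the total and multiplied by 100. The frequencies and the function name `to_percentages` below are hypothetical.

```python
def to_percentages(frequencies):
    """Express each frequency as a share of the total on a 0 to 100 scale."""
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("cannot compute percentages of an empty table")
    return {k: 100 * v / total for k, v in frequencies.items()}

freq = {"A": 2, "B": 3, "C": 5}        # hypothetical frequency table
pct = to_percentages(freq)
```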
Graphs
3D Scatter Plot
3D Surface Plot
Area Graph
Bar Chart
Box Plot
Contour Plot
Dot Plot
Empirical CDF
Histogram
Interval Plot
Matrix Plot
Pie Chart
Probability Plot
Scatter Plot
Stem-and-Leaf
Objective of analysis
Compare summaries or individual values of a variable
Assess distributions
Assess relationships between pairs of variables
Thank you
Any Questions?