Type of Data
Quantitative Data: Data for which arithmetic operations make sense. Ex: Age, Salary, Length.
Categorical Data: Data obtained by putting individuals in different categories. Ex: Gender, States of a country.
Visualization
Interpreting a Histogram
Shape: symmetric, skewed, unimodal, bimodal
Center: mean, median
Spread: range, standard deviation, interquartile range
Mean: If we have a data set $x_1, \ldots, x_n$, then the mean of the data set is $\frac{x_1 + \cdots + x_n}{n}$.
Notation: $\bar{x}$
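As a minimal sketch of the mean formula, the snippet below adds up the observations and divides by the sample size. The data set `ages` is made up for illustration and does not come from the slides.

```python
from statistics import fmean

# Hypothetical data set (ages in years), made up for illustration.
ages = [21, 25, 30, 34, 40]

# Mean = sum of the observations divided by the number of observations.
manual_mean = sum(ages) / len(ages)

print(manual_mean)   # 30.0
print(fmean(ages))   # 30.0 (standard-library equivalent)
```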
Mean: Example
Median: The middle number in a sorted data set. When the number of observations (sample size) is even, there are two middle numbers. In that case, we take the average of the two middle numbers to obtain the median.
Notation: $\tilde{x}$
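The rule for odd and even sample sizes can be sketched as follows. The helper name `median_manual` and both samples are hypothetical, chosen only to show the two cases.

```python
from statistics import median

def median_manual(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [7, 3, 9, 1, 5]         # sorted: 1 3 5 7 9
even_sample = [7, 3, 9, 1, 5, 11]    # sorted: 1 3 5 7 9 11

print(median_manual(odd_sample))     # 5
print(median_manual(even_sample))    # 6.0
```

The standard-library function `statistics.median` follows the same rule and can be used instead of the hand-written helper.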
Median: Example 1
Median: Example 2
Mode: Observation in the data set with the largest frequency. Note that we can have more than one mode for a data set.
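A small sketch of finding the mode by counting frequencies, including the case of more than one mode. The helper name `modes` and the data are assumptions made for illustration.

```python
from collections import Counter

def modes(values):
    """Return every observation that attains the largest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 3, 3, 5, 7]))       # [3]
print(modes([1, 1, 2, 4, 4, 6]))    # [1, 4]  (two modes)
```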
Mode: Example
Effect of an Outlier
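One way to see the effect of an outlier is to compare the mean and the median before and after adding a single extreme value. The salaries below are hypothetical (in thousands) and are not the slide's own example.

```python
from statistics import fmean, median

# Hypothetical salaries in thousands, made up for illustration.
salaries = [40, 42, 45, 47, 51]
with_outlier = salaries + [500]      # add one extreme observation

mean_before, median_before = fmean(salaries), median(salaries)
mean_after, median_after = fmean(with_outlier), median(with_outlier)

print(mean_before, median_before)            # 45.0 45
print(round(mean_after, 2), median_after)    # 120.83 46.0
```

In this sketch the mean moves from 45 to about 121, while the median moves only from 45 to 46.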
The mean, median, and mode have the same unit as the data.
Symmetric:
Left skewed:
Right skewed:
Range: max-min
Variance: $\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Standard deviation: $\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
Standard deviation has the same unit as the data, but the unit of variance is the square of the unit of the data.
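The measures of spread can be sketched in a few lines. This example assumes the sample version of the variance (divisor n - 1); if the divisor n is intended instead, replace `len(lengths) - 1` with `len(lengths)`. The data set `lengths` is hypothetical.

```python
import math
from statistics import fmean, stdev, variance

# Hypothetical lengths in cm, made up for illustration.
lengths = [2, 4, 4, 6, 9]

data_range = max(lengths) - min(lengths)     # Range: max - min
x_bar = fmean(lengths)                       # mean = 5.0

# Sample variance (assumed divisor n - 1): unit is cm squared.
sample_var = sum((x - x_bar) ** 2 for x in lengths) / (len(lengths) - 1)

# Standard deviation: square root of the variance, unit is cm again.
sample_sd = math.sqrt(sample_var)

print(data_range)             # 7
print(sample_var)             # 7.0
print(round(sample_sd, 3))    # 2.646
```

The standard-library functions `statistics.variance` and `statistics.stdev` use the same n - 1 divisor.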
Standard Deviation
Quartiles
Notation: Q1
Quartiles
Notation: Q3
Exercise
Quartiles
*IQR ($Q_3 - Q_1$) is a robust measure of spread. It is not affected much by skewness or outliers.
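A minimal sketch of computing the quartiles and the IQR. It assumes one common textbook convention: Q1 is the median of the lower half of the sorted data and Q3 is the median of the upper half, with the overall median left out when the sample size is odd. Other conventions (for example those in `statistics.quantiles`) can give slightly different values. The helper name `quartiles` and the data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves convention (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle observation when n is odd.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(upper)

data = [1, 3, 5, 7, 9, 11, 13, 15]   # hypothetical data
q1, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q3, iqr)   # 4.0 12.0 8.0
```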
Exercise
Boxplot
*We will create a box plot for the Rubik's Cube data set.
Shape:
Outliers: Any observation not in the range $[Q_1 - 1.5 \times \mathrm{IQR},\ Q_3 + 1.5 \times \mathrm{IQR}]$ is considered an outlier (Informal Rule).
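The informal 1.5 IQR rule can be sketched as below. The solve times are hypothetical and are not the Rubik's Cube data set used in the slides. The quartiles are computed with the median-of-halves convention, which is an assumption; other quartile conventions may shift the fences slightly.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves convention (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(upper)

def find_outliers(values):
    """Return the fences and the observations outside [Q1 - 1.5 IQR, Q3 + 1.5 IQR]."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [x for x in values if x < low or x > high]

# Hypothetical solve times in seconds, made up for illustration.
times = [10, 12, 13, 15, 16, 18, 19, 60]
low, high, outliers = find_outliers(times)

print(low, high)    # 3.5 27.5
print(outliers)     # [60]
```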
Example
*Bar Chart
*Pie Chart