
Today's class will introduce basic definitions and terminology and present widely used graphical and numeric descriptors of data.

The Population is everything you wish to study. A Variable is used to represent a characteristic of each member of the population. A Census is a study of the entire population.

A Simple Random Sample is a sample that has been selected in such a way that all members of the population have an equal chance of being included in the sample.

Generate random samples using the software:
Data from problem 1-23, arranged in a single column: 100 observations of warp breakage data for yarn specimens.
Excel: Data Analysis-Sampling
Minitab: Calc-Random Data-Sample from Columns
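As a rough sketch of the same idea outside Excel or Minitab, the following Python snippet draws a simple random sample from a single column of values. The `population` list is a hypothetical stand-in, not the actual warp breakage data, and sampling without replacement is an assumption here; the menu tools named above may behave differently.

```python
import random

# Hypothetical stand-in for a single column of 100 observations.
# These are NOT the actual values from problem 1-23.
population = list(range(1, 101))

# A fixed seed makes the draw repeatable for demonstration purposes.
rng = random.Random(42)

# Draw n = 10 members without replacement; every member of the
# population has the same chance of being selected.
sample = rng.sample(population, k=10)

print(sample)
```

Changing or removing the seed gives a different sample each time, which is what you would expect from repeated random sampling.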

Sampling Error is the difference between the behavior of the entire population and a sample of that population. The amount of Variation refers to how different the members of the population are from each other with regard to the variable being studied.

A Parameter is a number which describes a characteristic of the population. A Statistic is a number that describes a characteristic of a sample.
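To make the distinction concrete, here is a small Python sketch that computes a parameter (the population mean), a statistic (the sample mean), and the difference between them as one way to quantify sampling error. The population values are made up for illustration, and the names `mu`, `xbar`, and `sampling_error` are chosen here, not taken from the slides.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of N = 1000 values (made up for illustration).
population = [rng.uniform(0, 100) for _ in range(1000)]

# Simple random sample of n = 25 members, drawn without replacement.
sample = rng.sample(population, k=25)

mu = statistics.mean(population)   # parameter: describes the population
xbar = statistics.mean(sample)     # statistic: describes the sample

# One simple measure of sampling error for the mean.
sampling_error = xbar - mu

print(f"mu = {mu:.2f}, xbar = {xbar:.2f}, error = {sampling_error:.2f}")
```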

The Size of the Population is the number of members of the population. It is referred to as N. The Size of the Sample is referred to as n. A Biased Sample is a sample which does not represent the population.

Qualitative Data describe a particular characteristic of a sample item. They are often non-numerical in nature, e.g., gender, occupation, etc. Data that are created by assigning numerical classifications that have label meaning only are called Nominal Data, e.g., part numbers.

Discrete Data are data that can take on only certain values. These values are often integers or whole numbers. Continuous Data are data that can take on any one of an infinite number of possible values over an interval on the number line.

Tools of Descriptive Statistics allow you to summarize data. An Inference is a deduction of a generalization from sample data. The Techniques of Inferential Statistics allow us to draw inferences or conclusions about the population from a sample.

We will use Probability theory to calculate the likelihood of observing or selecting a particular sample from a population. Probability is often useful in predicting the behavior of a random variable.

A Frequency Table or Frequency Distribution is a table containing each category, value, or class of values that a variable might have and the number of times that each one occurs in the data.

The Relative Frequency of a classification is the number of times an observation falls into that classification represented as a proportion of the total number of observations. It can be expressed as a fraction, decimal, or percentage.
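A minimal Python sketch of a frequency table with relative frequencies follows. The category values in `ranks` are hypothetical and serve only to illustrate the counting.

```python
from collections import Counter

# Hypothetical qualitative data (made up for illustration).
ranks = ["Assistant", "Full", "Associate", "Assistant", "Full",
         "Assistant", "Associate", "Assistant"]

# Frequency: number of times each category occurs.
freq = Counter(ranks)

# Relative frequency: each count as a proportion of all observations.
total = sum(freq.values())
rel_freq = {category: count / total for category, count in freq.items()}

# Print the table, most frequent category first.
for category, count in freq.most_common():
    print(f"{category:<10} {count:>3} {rel_freq[category]:.3f}")
```

The relative frequencies always sum to 1 (or 100% when expressed as percentages).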

A histogram is a picture of a frequency distribution in which the y-axis represents the frequencies and the x-axis represents ranges of the observed measurements. The x-axis is divided into sections consistent with the class or bin width, and each section forms the base of a rectangle.

The height of each rectangle along the y-axis is consistent with the frequency of the observations in the class or bin. Each bin can be labeled with its center point or its bounds.

Rules of thumb for histogram construction

Number of Classes (Bins) and Class Interval (Bin Width) for Continuous Data:
#classes = √n
Manual bin calculation based on problem objectives.
Excel/Minitab will apply default bin ranges unless you specify them.

Illustration, problem 1-23 (n = 100 observations): √100 = 10 classes, (829 - 15)/10 = 81.4 cell width (see default Excel histogram).

Could use histograms to get a frequency table and then generate a bar chart.
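The bin arithmetic can be sketched in Python, reading the rule of thumb as #classes = √n, which is consistent with 100 observations giving 10 classes. The function names `sqrt_rule_bins` and `bin_counts` and the five data values are hypothetical; only n = 100, the minimum 15, and the maximum 829 come from the illustration.

```python
import math

def sqrt_rule_bins(n, minimum, maximum):
    """Return (number of classes, class width) using the square-root rule."""
    k = round(math.sqrt(n))
    width = (maximum - minimum) / k
    return k, width

def bin_counts(data, minimum, width, k):
    """Count how many observations fall into each of k equal-width bins."""
    counts = [0] * k
    for x in data:
        # The maximum value would land just past the last bin, so clamp it.
        index = min(int((x - minimum) / width), k - 1)
        counts[index] += 1
    return counts

# Values from the illustration: n = 100, minimum 15, maximum 829.
k, width = sqrt_rule_bins(100, 15, 829)
print(k, width)

# Hypothetical observations (made up) to show how bar heights are obtained.
counts = bin_counts([15, 50, 120, 400, 829], 15, width, k)
print(counts)
```

Each entry of `counts` is the height of one histogram rectangle.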

Difference between a histogram and a bar chart. Using a Pivot Table with qualitative data to generate a frequency table: Faculty.xls file demonstration. Sketch the desired format and then drop categories into the pivot table.

In a Dot Plot, each observation is plotted as a point on a single, horizontal axis. The axis is scaled so that each of the data points can be located uniquely on the axis. When there is more than one observation with the same value the points are stacked on top of each other.
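The stacking behavior can be illustrated with a small text-based sketch in Python. The `dot_plot` helper is hypothetical, assumes small non-negative integer data, and uses made-up values; it is not a substitute for the Minitab plot.

```python
from collections import Counter

def dot_plot(values):
    """Return a text dot plot: one column per integer value, dots stacked."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    height = max(counts.values())
    rows = []
    # Build rows from the top of the tallest stack down to the axis.
    for level in range(height, 0, -1):
        row = "".join(" . " if counts[v] >= level else "   " for v in axis)
        rows.append(row.rstrip())
    # Final row is the horizontal axis with one label per value.
    rows.append("".join(f"{v:^3}" for v in axis).rstrip())
    return "\n".join(rows)

# Hypothetical data (made up for illustration).
years = [1, 3, 3, 4, 4, 4, 6]
plot = dot_plot(years)
print(plot)
```

Repeated values, such as the three 4s here, appear as a taller stack of dots above the same axis position.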

Generate a dot plot of years of service using the Faculty.xls file with Minitab

The Variability of a set of data describes how the data are spread out around the center with respect to the smoothness and magnitude of the variation. When data are not evenly spread out on either side of the center then we refer to the distribution as being skewed.

A Statistic is a numerical descriptor that is calculated from sample data and is used to describe the sample. Statistics are usually represented by Roman letters (e.g., Xbar and s). A Parameter is a numerical descriptor that is used to describe a population. Parameters are usually represented by Greek letters (e.g., μ and σ).

Numerical Descriptors of a Data Set:
The Mean is the center of gravity of a set of data, and is found by adding up all of the data values and dividing by the number of observations.
The Population Mean is represented by the Greek letter μ. The Sample Average is usually denoted as Xbar.
The Median is the value of the middle observation in an ordered set of data.
Illustration of the Mean and Median for Symmetric vs. Skewed Distributions:
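A small numeric sketch in Python shows how the mean and median behave for symmetric versus skewed data. Both data sets are made up for illustration.

```python
import statistics

# Hypothetical data sets (made up for illustration).
symmetric = [2, 4, 6, 8, 10]
skewed = [2, 4, 6, 8, 100]   # same values, but one extreme observation

results = {}
for name, data in [("symmetric", symmetric), ("right-skewed", skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    results[name] = (mean, median)
    print(f"{name:<13} mean = {mean}, median = {median}")
```

In the symmetric set the mean and median coincide; in the skewed set the single large value pulls the mean toward the tail while the median stays put.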

The Sample Mode is the data value that has the highest frequency of occurrence in the sample.
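For illustration, Python's standard `statistics` module can find the mode directly. The data values below are made up; `multimode` (available in Python 3.8 and later) is shown because a sample can have more than one value tied for the highest frequency.

```python
import statistics

# Hypothetical sample (made up for illustration).
data = [3, 5, 5, 7, 5, 9, 7]

# The single most frequent value.
mode_value = statistics.mode(data)
print(mode_value)

# When several values tie for the highest frequency, list all of them.
all_modes = statistics.multimode([1, 1, 2, 2, 3])
print(all_modes)
```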

A Sample Range, R, is the difference between the maximum and minimum observations in the sample. The Sample Variance, s², is essentially the average of the squared deviations of the data values from the sample mean (the sum of the squared deviations divided by n - 1). The Sample Standard Deviation, s, is the positive square root of the sample variance.
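The three measures can be computed step by step in Python. This sketch uses made-up data and the usual n - 1 divisor for the sample variance; the variable names are chosen here for illustration.

```python
import math
import statistics

# Hypothetical sample (made up for illustration).
data = [4, 8, 6, 2, 10]

n = len(data)
xbar = sum(data) / n                      # sample mean

# Range: maximum minus minimum.
sample_range = max(data) - min(data)

# Variance: sum of squared deviations from the mean, divided by n - 1.
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)

# Standard deviation: positive square root of the variance.
s = math.sqrt(s2)

print(sample_range, s2, round(s, 4))

# Cross-check against the standard library's sample variance.
print(statistics.variance(data))
```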

The Modal Class is the class interval in a frequency distribution or histogram that has the highest frequency.

With a population of size N, the population mean and variance are computed using slightly different formulas.
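The formulas themselves are not reproduced here, so the following are the standard textbook definitions, given as an assumption about what is intended. The population versions divide by N and use the population mean μ; the sample versions divide by n and n - 1 and use the sample mean.

```latex
\[
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i ,
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\]
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
\]
```

The standard deviations σ and s are the positive square roots of σ² and s², respectively.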

The Empirical Rule says that for a bell-shaped, symmetric distribution:

- about 68% of all data values are within one standard deviation of the mean
- about 95% of all observations are within two standard deviations of the mean
- almost all (more than 99%) of the observations are within three standard deviations of the mean.

[Figure: bell-shaped curve with the 68%, 95%, and >99% regions marked.]
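The Empirical Rule can be checked informally with simulated data. This Python sketch generates made-up, roughly bell-shaped values with `random.gauss`; the mean of 50, standard deviation of 10, and sample size of 10,000 are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(1)

# Simulated bell-shaped data (made up for illustration).
data = [rng.gauss(50, 10) for _ in range(10_000)]

mean = statistics.mean(data)
sd = statistics.stdev(data)

def share_within(k):
    """Proportion of observations within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.3f}")
```

The printed proportions should land near 0.68, 0.95, and above 0.99, with small differences due to random variation.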
