
Spelman College

Predictive Analytical Software (PASW) Users Manual

Sociology 334 Statistics

Monday, Wednesday, Friday 11:00 am

Dr. Bruce H. Wade

April 26, 2011

Christian J. Watkins

Table of Contents

Chapter 1: INTRODUCTION

Chapter 2: FIRST STEPS

Chapter 3: UNIVARIATE ANALYSIS & FREQUENCY DISTRIBUTIONS

Chapter 4: GRAPHING AND CHARTING DATA

Chapter 5: BIVARIATE ANALYSIS OF DATA

Chapter 6: MULTIVARIATE ANALYSIS

Chapter 7: CONCLUSION

References Page

Chapter 1 Introduction:
PASW stands for Predictive Analytics Software. It is a comprehensive program used to analyze data. The program is also known as SPSS (Statistical Package for the Social Sciences) and is used to compute statistical analyses. PASW can receive data from just about any type of file and use them to produce charts, plots of trends and distributions, descriptive statistics, complex statistical analyses, and tabulated reports. PASW makes statistical analysis more convenient and accessible for experienced users and beginners alike. The straightforward menus and dialog box selections make it possible to perform intricate analyses without typing in any command syntax. This software is well suited to behavioral research because it allows the researcher to quickly and efficiently interpret social data, such as observations, surveys, and interview question responses. This program enables a greater emphasis on conceptual understanding and interpretation. It also allows the user to study statistics in a way that reflects statistical practice. For my Senior thesis, on fashion and its influence in the Spelman community, PASW will help me to identify whether incoming freshmen and transfer students are likely to change their personal appearance to fit in with the Spelman look, and the statistical significance that certain events may have on this conformity. I will use PASW to analyze the results of a student survey that I will give to randomly selected incoming students. I will create a codebook for my student questionnaire and create a dataset in PASW to test my results.

Chapter 2 The First Steps:


How to start a quantitative research project:

In a quantitative research project, the goal should be to determine the relationship between an independent variable and a dependent variable in a population. Quantitative research is about quantifying relationships between variables. These relationships are expressed through statistics like differences between means, correlations, and relative frequencies.

The first step in conducting a quantitative research project is to decide what type of research you want to conduct. You must then turn your idea into a quantifiable research question. You must state the problem your research addresses, or the reason why you are conducting this research project. After doing so, you state your hypothesis, an educated guess as to what the outcome of the study will be. For example, I am doing a research project on the relationship between driving drunk and regretful behavior. The problem is that a person is killed every 30 minutes by a drunk driver. My hypothesis is that there is a positive relationship between drinking and driving and regretful behavior.

In step two, you must conduct a literature review on your research topic. The goal of the literature review is to find similar research that has already been conducted on your topic and that will help shape your research report. It is best to have a number of different sources such as journal articles, research papers, books, websites, and systematic reviews. When writing your literature review, remember to put the various sources in conversation with one another rather than writing a paragraph describing each. Be sure to note the key methods and findings involved.

The third step of conducting a research project is to design your study and develop your methodology. In this step, you decide what type of study you want to conduct. Will there be participant involvement? Is this a secondary analysis study? You must decide if you want to use surveys, questionnaires, sampling methods, etc. Once you have decided exactly how you want to test and obtain data, you create the design. For example, I am trying to see whether natural hair or permed hair is most accepted or liked by Spelman students. I then create a questionnaire using the beauty, self-esteem, and body image scale developed by John West. Once you have created a questionnaire, you then want to create a codebook to evaluate the data that you receive from the questionnaire.

Finally, in step four, you can actually begin to conduct field research. Following my previous example, I would randomly select a sample of the Spelman student body. I am going to give out 65 questionnaires, with a goal of getting at least 50 of them back. Once I have all of the questionnaires completed and returned, I can begin the fun part.

Some possible issues to watch for when collecting data are the availability and reliability of research participants; recall bias, which is the participants' inability to remember information accurately; and ethical issues. Some issues that may arise in data entry are missing data, leading questions, and converting data from an Excel spreadsheet to PASW format. When importing data from a spreadsheet into PASW, there is no one foolproof method, but the following steps are effective:
1. Close the Excel file.
2. Arrange the data in a rectangular grid.
3. Don't mix strings and numbers.
4. Put descriptive names in your first row.

Excel should be closed before you try to import into SPSS. (Figure 1-1)

Open SPSS and select FILE | OPEN from the menu. (Figure 1-2)

(Figure 1-3)

Make sure that you have the correct number of variables (columns) and cases (rows). (Figure 1-4)

This is what the SPSS data window should look like. (Figure 1-5)
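The spreadsheet checks listed above (a rectangular grid, no mixing of strings and numbers, names in the first row) can also be tested before you open the file in PASW. The following is a minimal Python sketch and is not part of PASW. It assumes the spreadsheet has been saved as CSV text; the sample data and the function names `is_number` and `check_grid` are hypothetical and used only for illustration.

```python
import csv
import io

# Hypothetical survey export; the first row holds descriptive variable names.
SAMPLE = """id,age,hair_style
1,19,natural
2,20,permed
3,abc,natural
4,21
"""

def is_number(text):
    """Return True if the cell can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def check_grid(csv_text):
    """Report rows of the wrong length and columns that mix strings and numbers."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    problems = []
    # Every data row should have exactly as many cells as the header row.
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no}: expected {len(header)} cells, found {len(row)}"
            )
    # A column should be all numbers or all strings, never a mixture.
    for col, name in enumerate(header):
        cells = [row[col] for row in data if col < len(row) and row[col] != ""]
        numeric = [is_number(c) for c in cells]
        if any(numeric) and not all(numeric):
            problems.append(f"column '{name}': mixes strings and numbers")
    return problems

problems = check_grid(SAMPLE)
for p in problems:
    print(p)
```

With the sample above, the sketch flags the short last row and the `age` column, which contains the text `abc` among numbers. An empty list means the grid passed both checks.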

Data cleaning is used to ensure that a set of data is accurate and precise. During data cleaning, records are verified for correctness and reliability. They are then corrected or, if necessary, deleted. This method can be used on a single data set or across multiple data sets. Data cleaning involves a person (or persons) or a computer working through a set of records and verifying their accuracy. This is where mislabeled data are properly labeled and filed, typos and misspellings are corrected, and missing data are completed. Data cleaning is also used to get rid of out-of-date information or unrecoverable records and to free up space. Data cleaning is important to the efficiency of data-dependent programs. Its goal is to minimize errors and make the data as useful and meaningful as possible.

How to check for errors:
Frequency (Analyze | Descriptive Statistics | Frequencies): The simplest way to scan your data for unexpected values is to check the frequencies for each variable. This lets you quickly see if anything unexpected entered the data set.

Data entry into SPSS begins by defining the names and properties of the variables. Then enter the specific values of each variable for each independent source of data. There is one row for each independent data source and one column for each variable or characteristic that has been measured.

Entering the Data: The first step for entering the actual data is to click on the Data View tab. To enter new data, click in an empty cell in the first empty row. The "Tab" key will enter the value and jump to the next cell to the right. You may also use the Up, Down, Left, and Right arrow keys to enter values and move to another cell for data input. To edit existing data points (i.e., to change a specific data value), click in the cell, type in the new value, and press the Tab, Enter, Up, Down, Right, or Left arrow key. Here is an illustrated view of what it should look like:

(Figure 1-6)

Columns to know:
Name: the name of each PASW variable in a file; each must be unique. Ex: Sum, Dog_Rat
Type: the two main types of variables used are numerical and string. Numerical variables may only have numbers assigned, but string variables may contain letters or numbers.
Width: the number of characters that PASW will allow to be input for the variable.
Decimals: the number of decimal places PASW will display.
Label: text used to describe in better detail what a variable represents.
Values: can be used for categorical data to record which numbers represent which categories. Clicking here opens up the Value Labels dialogue box.

(Figure 1-7)

1. Click in the Value field to type a specific numeric value.
2. Click in the Label field to type the corresponding label.
3. Click on the Add button to add this pair of value and label to the list.
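The idea behind value labels and the frequency check for errors can be sketched together outside of PASW. The short Python example below is only an illustration under stated assumptions: the variable `class_year`, its codebook labels, and the function name `frequency_check` are hypothetical, and the data are made up. It counts how often each value occurs and flags any value that has no label in the codebook.

```python
from collections import Counter

# Hypothetical codebook entry: value -> label, like pairs typed into Value Labels.
CLASS_YEAR_LABELS = {1: "Freshman", 2: "Sophomore", 3: "Junior", 4: "Senior"}

# Hypothetical entered data; 44 is a typing error and 9 is not in the codebook.
class_year = [1, 2, 2, 3, 4, 44, 1, 1, 9, 3]

def frequency_check(values, labels):
    """Count each value and mark any value that has no label in the codebook."""
    counts = Counter(values)
    report = []
    for value in sorted(counts):
        label = labels.get(value, "** NOT IN CODEBOOK **")
        report.append((value, label, counts[value]))
    return report

report = frequency_check(class_year, CLASS_YEAR_LABELS)
for value, label, count in report:
    print(f"{value:>3}  {label:<22} {count}")

# Values that need to be checked against the original questionnaires.
suspect = [value for value, _, _ in report if value not in CLASS_YEAR_LABELS]
print("Check these values:", suspect)
```

Any value that appears in the frequency listing without a label is a candidate for correction during data cleaning.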

Chapter 3 Uni-variate Analysis & Frequency Distribution


Descriptive statistics describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. In descriptive statistics, you are describing what the data show. These statistics are used to present quantitative descriptions in a manageable format and to help simplify huge quantities of data. For example, grade point average (GPA) describes the overall academic performance of a student even though he or she has taken a wide variety of courses. Descriptive statistics do have their limitations, but they provide a powerful summary that makes it possible to compare across people and other units of analysis.

A Frequency Distribution is an organized tabulation of the number of individuals located in each category on the scale of measurement (Gravetter, 2008). It presents a picture of how the individual scores are distributed on the measurement scale. A frequency distribution takes a disorganized set of scores and places them in order from highest to lowest, grouping together all individuals who have the same score. For example, if the highest score is X=8, the frequency distribution groups together all the 8s, then all the 7s, then all the 6s, and so on. A frequency distribution therefore gives the researcher an at-a-glance view of the entire set of scores. It shows whether the scores are low or high and whether they are concentrated in one particular area or spread out across the scale, and it usually offers an organized illustration of the data. It also allows you to see the location of an individual score relative to all other scores in the group. A frequency distribution can be illustrated in a number of ways, but it must contain two things:
1. A set of groups that frame the original measurement range.
2. A record of the number of individuals in each group.

Examples: (Figure 1-8) (Figure 1-9)

The images depicted above are two examples of frequency distributions. Figure 1-8 is a frequency distribution table depicting the age of individuals and the percentage of people in each category. Figure 1-9 is a frequency distribution bar chart; it shows the same information as Figure 1-8, just illustrated differently.
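To see how a frequency distribution table is built, here is a minimal Python sketch that is separate from PASW. It uses the same 18 scores that appear in the worked examples of this chapter, orders them from highest to lowest, and reports the frequency and percentage for each score. The function name `frequency_table` is an assumption made for this illustration.

```python
from collections import Counter

# The 18 scores used in the worked examples of this chapter.
scores = [2, 2, 3, 4, 8, 8, 8, 8, 10, 13, 14, 14, 14, 18, 19, 20, 21, 21]

def frequency_table(values):
    """Return (score, frequency, percent) rows ordered from highest to lowest."""
    counts = Counter(values)
    n = len(values)
    return [(x, counts[x], 100 * counts[x] / n) for x in sorted(counts, reverse=True)]

table = frequency_table(scores)
print(f"{'X':>3} {'f':>3} {'%':>6}")
for x, f, pct in table:
    print(f"{x:>3} {f:>3} {pct:>6.1f}")
```

The two requirements are both visible in the output: the X column frames the measurement range, and the f column records how many individuals fall at each score.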

Measures of Central Tendency

The central tendency of a distribution is an estimate of the middle of a distribution of numbers. The three major types of central tendency are:

1. Mode: the most frequently occurring number(s) in a set of values.
   a. Order the numbers in a set of values.
      Ex: 2, 2, 3, 4, 8, 8, 8, 8, 10, 13, 14, 14, 14, 18, 19, 20, 21, 21
   b. Count each value; the most frequently occurring number is the mode.
      Ex: 8 is the mode
2. Median: the score found exactly in the middle of the set of values.
   a. Order the numbers in a set of values.
      Ex: 2, 2, 3, 4, 8, 8, 8, 8, 10, 13, 14, 14, 14, 18, 19, 20, 21, 21
   b. The number that is directly in the middle is the median. When the set has an even number of values, there are two middle numbers.
      Ex: 10 and 13 are both middle numbers, so the average of these two numbers equals the median.
      i. 10 + 13 = 23
      ii. 23/2 = 11.5
      iii. 11.5 is the median
3. Mean: the average of all numbers in a set of values. It is the most commonly used measure of central tendency.
   a. Add up all of the values.
      Ex: 2 + 2 + 3 + 4 + 8 + 8 + 8 + 8 + 10 + 13 + 14 + 14 + 14 + 18 + 19 + 20 + 21 + 21 = 207
   b. Divide the sum of the values by the number of values in the set of data.
      Ex: 207/18 = 11.5 is the mean
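The three hand calculations above can be checked with a few lines of Python. This is only a sketch for verifying the arithmetic outside of PASW; it relies on Python's standard `statistics` module and the same 18 values.

```python
import statistics

# The same 18 values used in the worked examples above.
scores = [2, 2, 3, 4, 8, 8, 8, 8, 10, 13, 14, 14, 14, 18, 19, 20, 21, 21]

mode = statistics.mode(scores)      # most frequently occurring value
median = statistics.median(scores)  # average of the two middle values (10 and 13)
mean = statistics.mean(scores)      # sum of the values divided by how many there are

print("Sum:", sum(scores), "N:", len(scores))
print("Mode:", mode)
print("Median:", median)
print("Mean:", mean)
```

The results agree with the worked examples: the mode is 8, and the median and the mean are both 11.5 for this data set.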

Dispersion is the spread of data around the central tendency. Also referred to as statistical variability, a measure of dispersion is a real number that is zero if all the data are identical and increases as the data become more varied. It cannot be less than zero. The most common measures of dispersion are standard deviation, range, and variance.

1. Range: the length of the smallest interval that contains all the data. It is calculated by subtracting the smallest value in a dataset from the largest value. We will use the same numbers from the previous examples.
   a. Ex: 21 - 2 = 19 is the range
2. Standard Deviation: shows how much the numbers in a set of data vary from the mean (average). Standard deviation can be calculated as follows:
   a. Subtract the mean from each number in the data set.
      Ex:
      2 - 11.5 = -9.5
      2 - 11.5 = -9.5
      3 - 11.5 = -8.5
      4 - 11.5 = -7.5
      8 - 11.5 = -3.5
      8 - 11.5 = -3.5
      8 - 11.5 = -3.5
      8 - 11.5 = -3.5
      10 - 11.5 = -1.5
      13 - 11.5 = 1.5
      14 - 11.5 = 2.5
      14 - 11.5 = 2.5
      14 - 11.5 = 2.5
      18 - 11.5 = 6.5
      19 - 11.5 = 7.5
      20 - 11.5 = 8.5
      21 - 11.5 = 9.5
      21 - 11.5 = 9.5
   b. The resulting numbers (positive or negative) are the deviations from the mean. The next step is to square each deviation and add the squares together. The total is called the sum of squares.
      Ex:
      (-9.5)^2 = 90.25
      (-9.5)^2 = 90.25
      (-8.5)^2 = 72.25
      (-7.5)^2 = 56.25
      (-3.5)^2 = 12.25
      (-3.5)^2 = 12.25
      (-3.5)^2 = 12.25
      (-3.5)^2 = 12.25
      (-1.5)^2 = 2.25
      (1.5)^2 = 2.25
      (2.5)^2 = 6.25
      (2.5)^2 = 6.25
      (2.5)^2 = 6.25
      (6.5)^2 = 42.25
      (7.5)^2 = 56.25
      (8.5)^2 = 72.25
      (9.5)^2 = 90.25
      (9.5)^2 = 90.25
      Sum of squares = 732.50
   c. Now that we have found the sum of squares to be 732.50, we divide it by the number of values in the dataset minus 1, which is 18 - 1 = 17. The result is the variance.
      Ex: 732.50/17 = 43.0882 (variance)
   d. To find the standard deviation, take the square root of the variance, 43.0882.
      Ex: SQRT(43.0882) = 6.56 is the standard deviation.
3. Variance: the average of the squared differences from the mean. For a sample, the sum of squares is divided by the number of values minus 1, as illustrated in the examples above.
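The dispersion calculations above can be reproduced with a short Python sketch, again only as a check on the arithmetic and not as part of PASW. It uses the standard `statistics` module, whose `variance` and `stdev` functions divide by the number of values minus 1, which matches the worked example.

```python
import statistics

# The same 18 values used in the worked examples above.
scores = [2, 2, 3, 4, 8, 8, 8, 8, 10, 13, 14, 14, 14, 18, 19, 20, 21, 21]

# Range: largest value minus smallest value.
data_range = max(scores) - min(scores)

# Sum of squares: squared deviations from the mean, added together.
mean = statistics.mean(scores)
sum_of_squares = sum((x - mean) ** 2 for x in scores)

# Sample variance and standard deviation (divide by n - 1).
variance = statistics.variance(scores)
std_dev = statistics.stdev(scores)

print("Range:", data_range)
print("Sum of squares:", sum_of_squares)
print("Variance:", round(variance, 4))
print("Standard deviation:", round(std_dev, 2))
```

The printed values match the hand calculation: a range of 19, a sum of squares of 732.5, a variance of 43.0882, and a standard deviation of 6.56.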


Chapter 4 Graphing and Charting Data:


Graphs, charts, and tables are often used in statistics to convey data visually. These illustrations are typically the first step in evaluating raw data to find outlying values, trends, and data entry errors. An appropriate graph or chart can answer many questions in a small amount of space and suggest further considerations. The purpose of creating charts, tables, and graphs is to present data accurately and clearly so that they are easy to understand. These illustrations are useful for highlighting the shape of a distribution and showing where cases cluster in a range of values. The large number of computer graphics programs available has made creating these illustrations much easier. The programs most commonly used to depict raw data are Microsoft Excel and PASW.

Types of Charts and Graphs:

1. Bar Graphs use rectangular bars whose lengths are proportional to the values they represent. The bars can be plotted vertically or horizontally. A bar graph is useful for displaying information whether or not the data are continuous. Bar graphs are effective ways to display the relative frequencies for two or more categories of a variable when you want to emphasize comparisons. Bar charts are often confused with histograms, which we will discuss later in this chapter. The main difference between the two is that, in bar charts, each column represents a group defined by a categorical variable, while in histograms, each column represents a group defined by a quantitative variable. In bar graphs, the x-axis does not have a low end or a high end, because the labels on the x-axis are categorical, not quantitative. Ex: (Figure 1-10)

2. Pie Charts are used to present the proportion of cases in the categories of a variable. Each slice of the pie represents the proportion of cases in that category: the bigger the slice, the larger the relative size of that category. A pie chart can be used to examine variables at any level of measurement. Ex: (Figure 1-11)


3. Line Charts use a dot to represent the midpoint of each interval. Lines are then used to connect the dots. Line charts are a great way of illustrating trends across time. Line graphs can use multiple lines and axes to convey a large amount of information in a compact space. Ex: (Figure 1-12)

4. Histograms show the distribution of data. A histogram is an estimate of the probability distribution of a continuous variable. It consists of tabular frequencies, shown as adjacent rectangles drawn over discrete intervals, with the area of each rectangle proportional to the frequency of the observations in that interval. When the frequencies are expressed as proportions, the histogram shows the proportion of cases that fall into each interval, with the total equaling one. Histograms are used to plot the density of data. Ex: (Figure 1-12)


5. Scatter Plots are diagrams used to display values for two variables in a set of data. The data are displayed as a collection of points, with the value of one variable determining each point's position on the horizontal axis and the value of the other variable determining its position on the vertical axis. The independent variable is plotted on the x-axis and the dependent variable is plotted on the y-axis. A scatter plot can be used to judge the degree of correlation between two variables. It can suggest correlations between variables with a certain confidence interval. The relationship can be positive, negative, or absent (uncorrelated). A positive correlation slopes from lower left to upper right. It is a negative correlation if it slopes from upper left to lower right. Ex: (Figure 1-13)
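The difference between a bar chart and a histogram described in this chapter can be illustrated with a small text-only Python sketch. It is not how PASW draws charts; it only shows how the columns are defined. The hair-style responses are hypothetical, the scores are the 18 values from Chapter 3, and the function names `bar_chart` and `histogram` are assumptions made for this example.

```python
from collections import Counter

def bar_chart(categories):
    """Text bar chart: one bar per category of a categorical variable."""
    counts = Counter(categories)
    return [f"{name:<8} {'#' * count}" for name, count in sorted(counts.items())]

def histogram(values, low, width, bins):
    """Text histogram: one bar per equal-width interval of a quantitative variable."""
    counts = [0] * bins
    for v in values:
        index = int((v - low) // width)
        if 0 <= index < bins:
            counts[index] += 1
    lines = []
    for i, count in enumerate(counts):
        start = low + i * width
        # Interval labels assume whole-number scores, e.g. 0-4, 5-9.
        lines.append(f"{start:>2}-{start + width - 1:<2} {'#' * count}")
    return counts, lines

# Hypothetical categorical responses: the x-axis has no low or high end.
hair = ["natural", "permed", "natural", "natural", "permed"]
print("\n".join(bar_chart(hair)))

# Quantitative scores from Chapter 3: the x-axis runs from low to high.
scores = [2, 2, 3, 4, 8, 8, 8, 8, 10, 13, 14, 14, 14, 18, 19, 20, 21, 21]
counts, lines = histogram(scores, low=0, width=5, bins=5)
print("\n".join(lines))
```

In the bar chart, the order of the categories is arbitrary. In the histogram, the intervals must stay in numeric order because they are defined by a quantitative variable.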


Chapter 5: Bivariate Analysis of Data


The relationship between two variables is known as a bivariate relationship. Regression and correlation analysis are used to determine both the nature and the strength of a relationship between two variables. Correlation tells us how well the estimating equation actually describes the relationship. It tells us the direction and strength of the relationship between two variables. The symbol for the sample correlation coefficient is r, and the symbol for the population correlation coefficient is ρ (rho). It is very rare for r to be close to (+, -) 1.0. If r is approximately (+, -) .30, then it is considered to be important.

-1.0: Perfect Negative Relationship
0: No Linear Relationship
+1.0: Perfect Positive Relationship
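To make the meaning of r more concrete, the following Python sketch computes a sample correlation coefficient by hand. It is an illustration only and does not reproduce PASW output. The function name `pearson_r` and the small data set (hours studied and quiz scores) are hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r for two lists of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and quiz score for five students.
hours = [1, 2, 3, 4, 5]
quiz = [2, 4, 5, 4, 5]

r = pearson_r(hours, quiz)
print("r =", round(r, 2))

# The two ends of the scale shown above.
print(round(pearson_r([1, 2, 3], [2, 4, 6]), 2))  # perfect positive relationship
print(round(pearson_r([1, 2, 3], [6, 4, 2]), 2))  # perfect negative relationship
```

For the hypothetical data, r is about 0.77, a fairly strong positive relationship. The last two lines show the two extremes of the scale.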

An ordinal scale places the values of a variable in order but does not convey the relative size or degree of difference between the items. It can be summarized with median and percentile statistics. In this scale, the numbers assigned to objects or events represent rank order. This scale can also use names to represent the order. Intelligence quotient is also considered ordinal.
Ex 1: 1st, 2nd, 3rd, 4th, etc.
Ex 2: Bad, Medium, Good; Unsatisfied, Neutral, Very Satisfied
Ex 3: IQ 60, IQ 70, IQ 160
Ex 4: Restaurant ratings, Likert scales


An interval scale classifies variables into categories where one unit on the scale represents the same magnitude of the characteristic being measured across the entire range of the scale. For example, if we measure anxiety on an interval scale, the difference between 10 and 11 is the same as the difference between 40 and 41. Interval scales do not have a true zero point. Therefore it is not accurate to draw conclusions about how many times higher one score is than another. Ex: Fahrenheit scale of temperature

To compute correlations in PASW:
1. Click Analyze | Correlate | Bivariate.
2. A dialogue box should appear. From the list, pick the two variables that you want to analyze. Be sure to know the level of measurement of your variables so that you know which correlation coefficient to use. If your hypothesis predicts the direction of the relationship (greater than or less than), be sure to check the one-tailed box. For all others, check the two-tailed box under Test of Significance.
3. This is how your correlation should look once it is analyzed. (Figure 1-14)
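The choice of coefficient depends on the level of measurement. For ordinal variables, one commonly used coefficient is Spearman's rho, which applies the correlation formula to the ranks of the scores instead of the scores themselves. The Python sketch below is a rough illustration of that idea, not a reproduction of the PASW procedure. The function names and the two sets of hypothetical ratings are assumptions made for this example, and tied scores share the average of their ranks.

```python
import math

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, counted from 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Sample correlation coefficient r for two lists of equal length."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: the correlation between the ranks of x and the ranks of y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical ordinal ratings from five respondents (1 = lowest, 5 = highest).
satisfaction = [1, 2, 2, 4, 5]
restaurant_rating = [2, 3, 3, 4, 5]

print(average_ranks(satisfaction))
print("rho =", round(spearman_rho(satisfaction, restaurant_rating), 2))
```

Because the two hypothetical ratings put the respondents in exactly the same order, rho comes out as a perfect positive relationship even though the raw numbers differ.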


Chapter 6 Multivariate Analysis

Multivariate analysis involves the analysis of multiple variables at a time. This technique is used to perform studies across multiple dimensions while taking into account all variables being analyzed. It is concerned with data sets in which more than two variables are tested at one time. Multivariate analysis uses multiple regression to analyze multiple variables. There is only one dependent variable, but there can be multiple independent variables. Regression analysis helps you to understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the others are held fixed. It estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are held fixed. In all circumstances, the estimation target is a function of the independent variables. Regression is widely used for forecasting and prediction. It is also used to determine which independent variables are related to the dependent variable and to explore the forms of these relationships. In some circumstances, regression analysis can be used to infer causal relationships between independent and dependent variables. Two types of regression are linear regression and logistic regression.

In linear regression, data are modeled using linear functions, and the unknown parameters of the model are estimated from the data. Most commonly, linear regression refers to a model in which the mean of y given X is an affine function of X. Like all forms of regression, it focuses on the conditional probability distribution of y given X. It can be used for forecasting and prediction. It can also be used to determine which independent variables have the strongest relationship to the dependent variable and which do not.


Ex: (Figure 1-15)
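To show what "estimating the unknown parameters from the data" means, here is a minimal Python sketch of linear regression with a single independent variable, fitted by least squares. It is deliberately simpler than the multiple regression PASW can run, and it is an illustration only. The function name `fit_line` and the data (hours studied and quiz scores) are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied (independent) and quiz score (dependent).
hours = [1, 2, 3, 4, 5]
quiz = [2, 4, 5, 4, 5]

slope, intercept = fit_line(hours, quiz)
print("slope =", round(slope, 2), "intercept =", round(intercept, 2))

# Forecast the typical quiz score for a student who studies 6 hours.
predicted = intercept + slope * 6
print("predicted score at 6 hours =", round(predicted, 2))
```

The slope is the estimated change in the typical value of the dependent variable for each one-unit increase in the independent variable, which is the interpretation described above.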

Logistic regression is used to predict the probability of occurrence of an event by fitting data to a logistic curve. It is a generalized linear model used for binomial regression. For example, a person's age, sex, and BMI can be used to predict their likelihood of having a heart attack within a specified time period. It is widely used in the medical field, as well as in marketing to determine buying propensity.

This is the logistic function:

f(z) = 1 / (1 + e^(-z))


This is an example of a logistic regression graph:

(Figure 1-16)

In this graph, the variable z represents the exposure to some set of independent variables, while f(z) represents the probability of a particular outcome, given that set of explanatory variables.
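The relationship between z and f(z) can be tried out with a short Python sketch. It uses the standard form of the logistic function, f(z) = 1 / (1 + e^(-z)). The intercept and coefficients below are made-up numbers chosen only to show the mechanics; they are not estimates from any real study, and the names `logistic` and `predicted_probability` are assumptions made for this example.

```python
import math

def logistic(z):
    """Standard logistic function: turns any moderate z into a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical (made-up) model: z is a weighted sum of the independent variables.
INTERCEPT = -5.0
COEFFICIENTS = {"age": 0.05, "bmi": 0.08}

def predicted_probability(person):
    """Probability of the outcome for one person under the made-up model."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * person[name] for name in COEFFICIENTS)
    return logistic(z)

print(logistic(0))  # z = 0 gives a probability of exactly 0.5

younger = {"age": 30, "bmi": 22}
older = {"age": 65, "bmi": 31}
print(round(predicted_probability(younger), 3))
print(round(predicted_probability(older), 3))
```

Larger values of z always give larger probabilities, and the result never leaves the range from 0 to 1, which is why the curve has its S shape.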


Chapter 7 Conclusion

Social statistics are important to understanding human behavior in our social environment. As shown throughout this manual, there are many ways of testing and understanding the relationships between different factors associated with human behavior. Social scientists use social statistics for things like evaluating the quality of services available to certain groups of people or organizations. They are used to help analyze different behaviors of groups of people in their natural environment as well as in different environments or special circumstances. Social statistics are also helpful in establishing the wants and needs of people through statistical sampling. Predictive Analytics Software is just one of the programs, along with Microsoft Excel and SAS, that have helped to make researchers' lives so much easier through statistical analysis. I hope that this manual has been informative, and that it helps you to better understand some of the concepts involved in quantitative analysis in the social sciences, as well as how to work some of the programs that SPSS provides.


References
Healey, Joseph F. 2005. Statistics: A Tool for Social Research. Belmont, CA: Thomson Wadsworth.

Weinstein, Jay Alan. 2010. Applying Social Statistics: An Introduction to Quantitative Reasoning. Lanham, MD: Rowman & Littlefield.

IBM SPSS Statistics Student Version 18.0 for Windows and Mac OS X. 2011. London: Pearson. Computer software.

SPSS Statistics 17.0 Brief Guide. 2008. Chicago, IL. (http://www.spss.com).

Simon, Steve. 2008. Importing Spreadsheet Data into SPSS. Children's Mercy Hospital and Clinics. Retrieved April 28, 2011. (http://www.childrensmercy.org/stats/data/excel.asp).
o Figure 1-1 thru 1-6: http://www.childrensmercy.org/stats/data/excel.asp

SPSS Basic Skills Tutorial: Data Entry. SPSS Basic Skills Tutorials. Retrieved April 29, 2011. (http://my.ilstu.edu/~mshesso/SPSS/data_entry.html).
o Figure 1-7: http://my.ilstu.edu/~mshesso/SPSS/data_entry.html

Stockburger, David W. Frequency Distribution. Introductory Statistics: Concepts, Models, and Applications. Retrieved April 25, 2011. (http://www.psychstat.missouristate.edu/introbook/sbk07.htm).
o Figure 1-8, 1-9, and 1-13: http://www.psychstat.missouristate.edu/introbook/sbk07.htm

Create a Graph. Kids Zone. Retrieved April 29, 2011. (http://nces.ed.gov/nceskids/createagraph/default.aspx?ID=6cddd6c73b9140138df31e2fbe1eb65c).
o Figure 1-10: http://nces.ed.gov/nceskids/createagraph/index.asp?ID=9664f172a1db4726aa74c9899e5fb106
o Figure 1-12: http://nces.ed.gov/nceskids/createagraph/index.asp?ID=9bc5334c02d342568aa6a97d2c330a11
