Introduction To SPSS - Research Methods and Statistics Handbook

RESEARCH METHODS & STATISTICS HANDBOOK
First Term Dr. Alison, Mr. Brent Snook
Table of Contents
Section I: Introduction
Section II: Practicals
Section III: Extra Material
Appendix: Basic Statistics
Timetable
SECTION I INTRODUCTION
Course Instructors
The instructors for this year will be Brent Snook (Room 1.79), X, Y & Z. Our offices are on the second floor of the Eleanor Rathbone Building.
Computing Systems
The University Computing Services Help Desk (Brownlow Hill phone extension 44567) has a full advice and backup service should you need any information and help.
Computing Environments
Communication between computers and ourselves is mediated by operating systems that allow us to access the various programmes and packages in the University. The most usual environment, as the systems are known, is Windows. This is controlled mainly through pointing and clicking the mouse at various icons on the screen. Another environment is UNIX, which is similar to MS DOS in that the commands are typed rather than selected with the mouse. The reason behind discussing these different environments is simply that the various packages we will be using are stored in these environments.
Computers and Networks

Most computers act both as stand-alone machines, capable of independent use, and as networked machines, which rely on a central server. Generally speaking, we in Psychology use networked machines for several reasons. Three main networks are used to access the packages on the different environments: the PC Managed Network Service, the NT Managed Network and the UNIX System.
The Three Networks

Access to the networks is gained by logging on with your user name and password. You then have access to your own personal disk space (M: drive) at a central location that only you can read. You have separate disk space for both Windows 2000 and UNIX, so you can have two separate passwords to increase security, though your user name remains the same. At the end of a session, you must always log off. Usually, when a computer is booted, you have the option to go on Windows 2000. Once on the Network, you are in the MS DOS environment and can then use Windows or UNIX.
Computer Terminals
Virtually all computers upstairs in Psychology and in the Eleanor Rathbone Teaching Centre (ERTC) are networked to Windows 2000. Also, on the first floor in Psychology is another suite of computers in the Eleanor Rathbone Data Centre (ERDC). There is a printer in there which you should use in preference to the ones in the department, though Computing Services doesn't look kindly on people sending huge printouts to their printers.
INSTALLING APPLICATIONS
There are a number of applications you need to install onto your account. On your screen you should have an icon labelled MNTS Applications. Double-click on this icon. Now double-click on All and you should get a screen full of icons. These are all of the possible applications that you can install onto your account. Each application is installed by simply double-clicking on the application icon. Install the following applications:
1. Mulberry (e-mail)
2. SPSS (version 10)
3. Stanford Graphics (on L:INVPSY)
4. Microsoft Office (Word, Excel)
5. WS_FTP
6. Netterm
7. BR Journey Planner
8. The various MDS packages (LIFA2000, UNIX SSA, MSA, POSA)
9. Geographic packages (Dragnet)
Within the limited timeframe, the purpose is to familiarise you with the software that is available and to encourage you to start using it.
Registering on Windows 2000

Computing Services have all their documentation accessible through the World Wide Web. You can print off any of the documents once you set up the appropriate printer. The Computing Services handout will take you through all the basics of Windows 2000 including registering and changing your password. To register on Windows 2000, you can go to any computer terminal. There should be a Windows 2000 login screen. Type the word register in the username box and follow the instructions.
Setting up the ERDC Printer

You can print on a local printer (a printer that is actually attached to your machine) or a network printer (a printer which is attached to the network). Since we don't have enough printers for everyone, it will be necessary to attach to a network printer. The most convenient printer is probably the one in the Eleanor Rathbone Data Centre. The network printer queue for this printer is erdc-Queue.
You are not restricted to just this printer, but it is the closest and it is reserved for postgraduate students studying in the Eleanor Rathbone Building. To connect to one of the University's networked printers you need to do the following:
1. From the Start menu choose Settings and Printers.
2. Double-click on Add Printer.
3. Highlight the option Network printer server and click on Next.
4. Double-click on Netware Network.
5. Double-click on Novell Directory Services.
6. Double-click on Liv.
7. Double-click on O=liv.
8. Scroll down the list of options until you see OU=PRINT-QUEUES. Double-click on this option.
9. From the list detailed in Figure 1, select the required printer queue and double-click on it (in this case, erdc-Queue).
10. Choose OK to install the required printer driver on your local machine.
11. From the list illustrated in Figure 1, select the required printer manufacturer and then select the printer from the list available (it should be an HP LaserJet 4Si/4Si MX PS). Click on OK.
12. If this is the first time you have installed this particular type of printer, you will be asked for the location of the files to install. Replace the line D:\i386 (this might say A:\i386) in the box Copy files from with the path line V:\NT40\i386 and click on OK.
13. After a few seconds you will be asked if you wish to make this your default printer. Click on Yes.
14. Click on Next. You will then receive a message that your printer has been successfully installed.
15. Click on Finish. The printer driver will now be installed and connected to the specified printer queue.
Remember, once you have connected and installed a network printer driver, you may need to check the printer settings to ensure that settings such as paper size and duplex printing are correct. For further information, please see Configuring the Printer Settings (on the Computing Services web page).
Using UNIX
A lot of your time this year may be spent in UNIX. We will go into more detail about this system in the section on UNIX. Three versions of the MDS procedures (SSA, MSA & POSA) are on UNIX. Double-click the Netterm icon. Log in with the same username as the PCMNS. Your password is listed on the form Computing Services sent you. You can change the password by typing passwd.
The versions of the MDS packages on the mainframe have a number of advantages over the two non-Windows PC versions. They are more powerful and therefore more effective. A second feature is flexibility. In comparison to the mainframe SSA, ShyeSSA (PC package) has only two choices of measures of association - Pearson's and Guttman's Mu. While PAP offers the widest variety of measures, the copy we have also tends to be the one that doesn't work. We'll go into the PC packages in more detail at a later date.
Running the mainframe packages is fairly straightforward, but there are several parts to the whole process: preparing your data, uploading/downloading, using the UNIX system itself, using the ned editor and running (in this case) the SSA.
The SSA package has an option for reading data as freefields (any space between numbers indicates separate variables), so for UNIX SSA we can leave the data file as is. For MSA and POSA, the fields must be fixed, so:
a) you don't want any spaces in your rows
b) you need each score for each variable to start in the same column; otherwise, when the MDS programs read your data file, they won't be reading the variables properly - you tell the computer where each case's score for each variable is located by indicating the columns that variable occupies.
Both of these will become apparent when we get to running the SSA. A simple example:
12 42 131213
13122 71111
In this case, the second variable actually takes up three columns, but, obviously, the computer does not stick a 0 in front of a score like 42 - that's your job. The same holds true for the third variable, which requires two columns as some scores are over 9. The correct version of the two lines of data, with 0s added and spaces removed, should be:
12042131213
13122071111
where columns 1-2 are variable 1, columns 3-5 are variable 2 and so on. After correcting the data, save the file again, making sure that it is still being saved in the generic ASCII format.
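The zero-padding described above can also be done programmatically before the file is uploaded. The following is a minimal Python sketch, not part of the MDS packages: the function name to_fixed_width is hypothetical, and the column widths assume that the last four digits of each example line are four single-column variables, which the example does not state explicitly.

```python
def to_fixed_width(scores, widths):
    """Join one case's scores into a fixed-column line, padding with leading 0s."""
    if len(scores) != len(widths):
        raise ValueError("need exactly one width per score")
    parts = []
    for score, width in zip(scores, widths):
        text = str(score).zfill(width)  # 42 with width 3 becomes "042"
        if len(text) > width:
            raise ValueError(f"score {score} does not fit in {width} columns")
        parts.append(text)
    return "".join(parts)  # no spaces between fields


# Assumed layout: variable 1 = 2 columns, variable 2 = 3 columns,
# variable 3 = 2 columns, then four single-column variables.
WIDTHS = [2, 3, 2, 1, 1, 1, 1]

cases = [
    [12, 42, 13, 1, 2, 1, 3],
    [13, 122, 7, 1, 1, 1, 1],
]

lines = [to_fixed_width(case, WIDTHS) for case in cases]
for line in lines:
    print(line)
```

Each printed line has the same length, so every variable starts in the same column for every case, which is what MSA and POSA require.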
Uploading/downloading files
The best way to upload (transfer files from the PC/NT Managed Network to the mainframe) or download (vice versa) is to use the WS-FTP application (in the comms window). This is a simple package to use.
1) When you first start it up, a window comes up asking for information about a host - this is the location you'll be accessing outside of the PC to transfer information. Under host name type UNIX. Under host type, select the option UNIX (standard). Under UserID put your user name. Under password enter your UNIX password.
2) The format is simple. On the left-hand side is the local host, through which you can change between directories and drives. The right-hand side is your remote host (UNIX account), which also has directories you can move through. At the very top of each, your current drives/directories are listed. To transfer a file up, you select it and hit the right arrow. To transfer down, you hit the left arrow. The only trick is to make sure you have it set up for the right receiving directory, e.g. selecting the winword directory on your M: drive to receive the results file from an MSA that you've run. So, locate the coding.dat data file and move it from the M: drive to your UNIX account.
3) When you've finished, hit the exit button.
The UNIX Operating System

Imagine that UNIX is set up like the file manager in Windows, but that you have to type in commands rather than click the mouse to move up and down directories, copy files, delete files and so on. When you first log in, the info on the left of the dollar sign prompt indicates, from left to right, the user, the particular machine you're on (in brackets) and what directory you are in. Remember that UNIX is case sensitive: an F is not the same thing as an f. I find it very useful to start all directories with a capital letter and files with a lower case one to separate them.
The first thing to know is how to find help. This is done through the man command. If you know the particular command you want help for, just type man {command}. If you have an idea as to what type of action you want the computer to carry out, but don't know the specific command, type man -k {keyword} where the keyword is something related to the command, e.g. password, to get a list of commands with something to do with passwords.
Here is a short list of UNIX commands:
cp {directory/filename} {newfilename} - copies a file
rm {filename} - deletes a file
mv {filename} {directory} - moves a file/renames a file
ls - lists the contents of the current directory
cd {directory} - change directory (note that cd .. moves you up one directory)
mkdir {directoryname} - creates a directory
rmdir {directoryname} - removes a directory
ned {filename} - activates the ned editor
pine - e-mail editor
gopher - info source
tin - newsgroups reader
Note that * is a wildcard character, just like in Windows, for selecting multiple files for commands. All of this information is available in more detail on the WWW.
The ned editor

I mentioned this before (briefly), but I thought I'd go into this in more detail, as it's a useful tool for editing files when you are on UNIX. All of these details are in the document on the ned editor in the Computing Services on-line documentation, BTW. Essentially, it's a crude word processor, where all the function keys have various...functions...as do shift-function keys. None of this silly bolding or italics, no sir. You can type, you can move your cursor around and you can find and replace. Word processing for real men. Anyway, on to the lesson.
To start up the ned editor, you have to edit a file on the UNIX account. You do this by typing ned {filename}, so choose any filename and open the ned editor. A screen will come up with a brownish banner along the bottom listing the various function key options. All the basic keys operate like in Word: arrow keys move the cursor around, the home key moves to the start of the line, and end to the end. Page up/down are also the same, as are insert/typeover, delete, backspace and so on.
Right, now type out everything from I mentioned this... on to right here. Hit the F1 key for help. To get info on one of the displayed topics, move the cursor to it and hit enter. For a function key, hit that key. Ctrl-G exits help.
Right, position your cursor at the start of the third line from the bottom, then hit the F2 key - a new line. Now hit F9, which will delete the line you just created. Now hit Shift-F9, and the line comes back.
Right, now hit the F4 key to mark the start location for cutting/copying and pasting. Move the cursor somewhere on the next couple of lines then hit F6. Move to the end of the document and hit enter a couple of times, then hit F5. All the text between the marking point and the cursor is copied to the new cursor location. The same process is carried out for cutting text, but you hit Shift-F6 instead of F6.
To insert text from another file, hit Shift-F7. Ned will ask for the filename, the text of which will be inserted wherever you've put the cursor. Shift-F4 saves the file, while Shift-F3 saves the file and exits. F3 quits without saving changes.
However, the most important feature of ned for data files that you've uploaded is the replace feature (F8). With this, you can change 1s to 0s and so on. So, let's change the letter e to i. Move to the top left of the document. Hit F8, then type the letter e [DON'T hit enter now]. Hit F8 again, and type i. Hit F8 one more time. You'll be prompted to make a choice about the first e. If you hit the Y key, it will change it; N will make the computer jump to the next occurrence. A ! will cause ned to make all possible changes. You'll find this handy for changing numbers prior to doing analyses, for example changing 0s to 1s and 1s to 2s.
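One thing to watch with a find-and-replace recode such as 0s to 1s and 1s to 2s is the order of the replacements: if the 0s are changed to 1s first, the second pass turns them into 2s as well. The Python sketch below is only an illustration of that pitfall (ned itself is not involved); the function name recode_digits and the sample rows are made up.

```python
def recode_digits(line, mapping):
    """Recode every character in one pass, so replacements cannot chain."""
    table = str.maketrans(mapping)
    return line.translate(table)


rows = ["0110", "1001"]  # hypothetical data rows coded 0/1

# One-pass recode: 0 -> 1 and 1 -> 2 are applied simultaneously.
recoded = [recode_digits(row, {"0": "1", "1": "2"}) for row in rows]

# Two sequential replaces in the wrong order collapse everything to 2.
wrong_order = [row.replace("0", "1").replace("1", "2") for row in rows]

# Two sequential replaces in a safe order: move the 1s out of the way first.
safe_order = [row.replace("1", "2").replace("0", "1") for row in rows]

print(recoded)
print(wrong_order)
print(safe_order)
```

When doing the same job by hand with F8, the safe order is therefore to change the 1s to 2s first and the 0s to 1s afterwards.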
SECTION II PRACTICALS
WEEK 1: Thursday October 3rd Introduction to SPSS

SPSS is the primary package for running any statistical procedures outside of the MDS packages. In addition to providing outputs for various analyses, SPSS allows the user to manipulate the data in a variety of ways and to produce various graphs and figures that can be added into documents. In this practical, you will be asked to open and search through a data matrix, and enter and code data. The procedure for the exercises in this practical involves going through the steps for each analysis using the data file family.sav.
Where is family.sav?
The first thing you must do is copy family.sav from the N: drive on your computer to the M: drive (which is your own personal account). To do this you must create a folder on your M: drive into which the family.sav file will go. You should be looking at a screen with a number of icons on it. In the top left-hand corner is an icon called My Computer. Double-click on this icon. Find the M: drive and double-click on it. You should now see a window containing a number of folders. Go to FILE, then NEW and choose FOLDER. A new folder labelled New Folder should appear at the bottom of the window. Call your new folder Survey and press ENTER. After you have done this, go to FILE and then CLOSE.
Now, within the same window, double-click on your N: drive. Within that drive you will see a folder with the title SPSSEGS (standing for SPSS example files). Double-click on this folder. Within this folder there is a file labelled family.sav. This is the file you want to copy into your Survey folder on your M: drive. So, single-click on family.sav and go to EDIT and then COPY. Go back to your M: drive by closing the N: drive window (click on the X in the right-hand corner of your N: drive window). Double-click on your M: drive and double-click on the folder Survey. Survey should be empty. Go to EDIT and then PASTE. Now you should see the file family.sav.
Exploring the Data Editor Window

Start SPSS for Windows by double-clicking on the SPSS icon. Once the program has opened, a window will appear in the middle of the screen with a number of options to choose from. Select OPEN AN EXISTING DATA SOURCE. Go to the directory Survey in your M: drive. Find the file family.sav and double-click on it. The values from the family.sav file should now appear in the Data Editor window. Click on the middle button in the top right-hand corner of the window to maximise it. Once the file is open you will see two sheets at the bottom of the window. One is labelled DATA VIEW and the other is labelled VARIABLE VIEW. You want to stay on the data view sheet. Click on the VALUE LABELS button (in bold rectangle below) on your tool bar (it is 2nd from the right). This toggles the display between the coded values and their value labels (words). Scroll through the data to answer the following questions:
1. What is the name of the last variable in the data matrix?
2. What is the case number of the last case?
3. What is the value of IDNUM for the last case?
4. What is Robert's date of birth?
5. What is Jack's marital status?
If you click on a cell when value labels are displayed in the DATA VIEW WINDOW, a scroll bar will appear to provide an indication of the options (value labels) used in the coding framework. Using this feature, please answer the following questions:
What are the labels for CAR?
What are they for MORTGATE?
What are they for NAME?
Is there a problem with NAME? What is it?
The variable view sheet

In order to view how a variable has been defined in terms of its name, variable label, value labels and user-missing values you have to click on the sheet VARIABLE VIEW.
Please answer the following questions. Do not forget to use the scroll bars on the bottom and on the right side of the variable view window to find your answers.
What is the variable label for DATEBLT?
What are the values and value labels for MARSTAT? (hint: click on the grey box)
What is the user-missing value for NCARS?
Coding and Entering Data

Open up a new Data Editor window by going to FILE, then NEW and save DATA to M: drive. Below is a questionnaire regarding leisure activity and a coding scheme. Your task is to set up the Data Editor Window and then enter the data below.
Leisure Activity Questionnaire
1. What is your first name?
2. What is your sex? M = male, F = female
3. What is your marital status? 1 = married, 2 = cohabiting, 3 = single, 4 = widowed, 5 = divorced, 6 = separated
4. Do you watch sports? 1 = yes, 2 = no, 3 = do not know
5. Do you play sports? 1 = yes, 2 = no, 3 = do not know
6. Do you visit the seaside? 1 = yes, 2 = no, 3 = do not know
7. Do you go to films? 1 = yes, 2 = no, 3 = do not know
8. Do you go to pop concerts? 1 = yes, 2 = no, 3 = do not know
Coding Framework
Variable Name  Format   Variable Label        Coding Details/Labels
IDNUM          NUMERIC  IDENTITY NUMBER       Unique Number for Each Person
NAME           STRING   FIRST NAME            Enter First Characters of Name
SEX            STRING   SEX                   M = male, F = female
AGE            NUMERIC  AGE IN YEARS          Enter age in years (-9 = Missing)
MARSTAT        NUMERIC  MARITAL STATUS        1 = married, 2 = cohabiting, 3 = single, 4 = widowed, 5 = divorced, 6 = separated
WATCHSP        NUMERIC  WATCHES SPORTS        1 = yes, 2 = no, 3 = do not know
PLAYSP         NUMERIC  PLAYS SPORTS          1 = yes, 2 = no, 3 = do not know
VISITSEA       NUMERIC  VISITS SEASIDE        1 = yes, 2 = no, 3 = do not know
GOTOFILM       NUMERIC  GOES TO FILMS         1 = yes, 2 = no, 3 = do not know
GOTOPOP        NUMERIC  GOES TO POP CONCERTS  1 = yes, 2 = no, 3 = do not know
Data
IDNUM     101 201 202 301 503 1002
NAME      MARGARET JACK JOSIE NANCY VICTORIA JOHN
SEX       F M F F F M
AGE       87 62 60 11 31
MARSTAT   4 1 1 5 -9 2
WATCHSP   1 2 1 2 1
PLAYSP    2 2 2 1 3
VISITSEA  1 1 1 1 1
GOTOFILM  2 2 2 1 1
GOTOPOP   2 2 2 3 1
You should have a clean window in front of you (i.e., there should not be any data in the spreadsheet). You now have to set up each column of your data matrix so that you can eventually enter in your data. The first column will hold IDNUM. To enter IDNUM into the data view sheet you need to go to the VARIABLE VIEW window.
In fact, defining and labelling all of your variables must be done in your variable view sheet. In the first row (horizontal) you can label and define your first variable IDNUM. Using the coding framework above, enter the appropriate information. Type in the variable IDNUM under NAME. The TYPE of variable is NUMERIC (you are entering a number) and under DECIMALS, using the scroll bar, choose 0 decimal places. Under the heading LABELS you want to type in the definition of the variable. Make sure this definition clearly defines the variable to avoid confusion.
Depending upon the type of data (i.e., nominal, ordinal, ratio, or interval) you are measuring, you may have to add VALUES. In the case of IDNUM (identity number) each case has its own unique number, so there are no value labels to define. Under VALUES, therefore, you should have chosen none. However, in defining nominal data such as SEX (your third variable to enter) you would have to define male as M and female as F. For IDNUM there are no missing values, therefore you choose none. The heading COLUMNS will give you the opportunity to define the width of your column. Choose a width of 6. The ALIGN value allows you to determine the positioning of your data in the cell. It may be right, left or centred. The last column heading is MEASURE. This column allows you to define the type of data you are working with. With IDNUM you are working with scale data.
When you define variables such as NAME (i.e., the name of the subject), you want the TYPE of variable to be STRING, and the WIDTH should be 10 (this refers to the number of characters to appear in the name). Using the coding framework above, define the variable NAME. When you define variables such as SEX (nominal data) you want to add value labels in the column called VALUES. If you click on the cell, a value labels window will appear. Across from value type your value M, across from the value label type male, and then click on add. Then enter F in the value box and female in the value label box. Once you have made these changes you can move back to the DATA VIEW window and view the changes.
Return to the VARIABLE VIEW window and define the numeric variable AGE in the next row. It has no decimal places, and it requires a missing value of -9 to identify cases where a response is not given. To assign a user-missing value of -9, click on the MISSING column. A missing values window will appear. Click on Discrete missing values and enter -9 in the first box. Set up a variable label and a value label for -9 as shown in the coding scheme for your questionnaire. Now, do the same for the numeric variable MARSTAT in the next row. This too is numeric with no decimal places, has a user-missing value of -9 and requires a variable label and several value labels as shown in the coding scheme.
The remaining 5 variables also need to be defined. To avoid defining each variable separately, you should define the first variable WATCHSP and then copy the cells to the remaining four below. To do this, go to the cell you want to repeat (i.e., the value labels) and click on EDIT, COPY, then move to the cell where you want the same definition and go to EDIT and PASTE. When you have finished entering all of the data, save it into an SPSS file by selecting FILE, SAVE and clicking on the folder Survey in your M: drive. Save the file under any name you want (e.g., Person.sav). Exit from SPSS and log off.
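The idea behind value labels and user-missing values in this practical can be sketched outside SPSS. The short Python example below is an illustration only, not how SPSS stores its dictionary: the names CODEBOOK and label_value are hypothetical, only three of the variables are included, and -9 is taken as the missing code from the coding framework.

```python
MISSING = -9  # user-missing code from the coding framework

# Value labels for a few of the questionnaire variables.
CODEBOOK = {
    "SEX": {"M": "male", "F": "female"},
    "MARSTAT": {
        1: "married", 2: "cohabiting", 3: "single",
        4: "widowed", 5: "divorced", 6: "separated",
    },
    "WATCHSP": {1: "yes", 2: "no", 3: "do not know"},
}


def label_value(variable, value):
    """Return the value label, None for user-missing, or raise for a bad code."""
    if value == MISSING:
        return None  # excluded from analyses rather than treated as a category
    labels = CODEBOOK[variable]
    if value not in labels:
        raise ValueError(f"{value!r} is not a valid code for {variable}")
    return labels[value]


print(label_value("MARSTAT", 4))
print(label_value("MARSTAT", -9))
print(label_value("SEX", "F"))
```

A check of this kind catches data-entry slips (for example a 7 typed for MARSTAT) before any analysis is run.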
WEEK 2: October 10th Descriptive Statistics, Charts & Manipulating Data in the Matrix
This practical is divided into two sections. The first section is intended to familiarise you with running commands to calculate descriptive statistics and to graph your data. The second section aims to show you how to compute, re-code, filter and delete your data.
Section I: Descriptive Statistics & Charts

We shall estimate descriptive statistics for the three variables: TYPACCM, DATEBLT, & NADULTS. Question: Are these variables nominal (non-ordered categories), ordinal (with ordered categories) or metrical (measured on a scale with well-defined differences between values)? Hint: The second variable is not so obvious.
To run the descriptive statistics click on ANALYZE, DESCRIPTIVE STATISTICS and then FREQUENCIES. In the left box there should be a list of all the variables that are present in the spreadsheet. Highlight TYPACCM and click the arrow between the boxes to move it into the box labelled variables. Continue this for the other two variables. A shorter route is to double-click on the variables when they are in the left box - removing the variables may be accomplished in the same manner. After the three variables are in the variables box, click on STATISTICS at the bottom of the box. Within the Frequencies: Statistics box there are several options. Tick the boxes for MEAN, MEDIAN & MODE on the right-hand side. In addition, tick the boxes for STANDARD DEVIATION (Std. Deviation) & RANGE. Then click on the continue button and wait for the data to process and for the output window to appear.
Answer the following questions:
What is the most useful measure of central tendency for each of the three variables? What are the sample values?
What is the maximum value for NADULTS? Does this appear to be correct?
Now, try re-estimating the descriptive statistics for NADULTS, only this time without the case with the unusual value. Select DATA and then SELECT CASES. Within the Select Cases box make sure, under Unselected Cases, that the Filtered box is ticked. Then select the IF CONDITION IS SATISFIED option and click on the IF button. Move the variable NADULTS to the adjacent box by either double-clicking on it or by clicking on the variable and moving it across using the arrow. After the variable name, use the calculator pad provided to enter less than (<) followed by the unusual value. After this hit continue and then OK to return to the spreadsheet.
Answer the following questions:
Has the case with the unusual value been barred off? Which case is it?
Now, re-run the Frequencies command for NADULTS only and record the mean, median & mode with and without the case included. Which descriptive statistic is most affected by the unusual value?
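The with/without comparison can be tried on a small made-up example before looking at the SPSS output. The sketch below uses Python's standard statistics module; the list of household sizes and the cut-off of 22 are hypothetical values chosen for illustration and are not taken from family.sav.

```python
import statistics

# Hypothetical numbers of adults per household; 22 is a deliberate data-entry slip.
nadults = [1, 2, 2, 2, 3, 2, 1, 22]

# Same idea as the Select Cases condition NADULTS < (unusual value).
filtered = [n for n in nadults if n < 22]


def summarise(values):
    """Return the three measures of central tendency used in the practical."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


with_case = summarise(nadults)
without_case = summarise(filtered)

print("with unusual case:   ", with_case)
print("without unusual case:", without_case)
```

Comparing the two printed summaries shows which of the three statistics moves when a single extreme case is filtered out.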
Graphing your Results

Histograms
Histograms are statistical diagrams that show the distribution of variables. In a histogram, values are grouped together in intervals and a bar is drawn for each interval whose area is proportional to the number of cases in the interval. To generate a histogram select GRAPHS and HISTOGRAM. Then move the variable HEIGHT into the variable box. In the same box, click the display normal curve box and then hit OK. Upon examining the output window that contains the graph, answer the following question: Do you think HEIGHT has a normal distribution, or would you run other tests?
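To make the grouping into intervals concrete, here is a minimal Python sketch that counts cases per interval and prints a rough text histogram. The heights, the starting value of 60 and the interval width of 5 are made-up assumptions for illustration; SPSS chooses its own intervals.

```python
from collections import Counter


def bin_counts(values, start, width):
    """Count how many values fall in each interval [lower, lower + width)."""
    counts = Counter()
    for value in values:
        index = int((value - start) // width)  # which interval the value falls in
        lower = start + index * width
        counts[lower] += 1
    return dict(sorted(counts.items()))


heights = [61, 63, 64, 66, 67, 68, 68, 70, 71, 74]  # hypothetical heights
counts = bin_counts(heights, start=60, width=5)

for lower, n in counts.items():
    # With equal-width intervals, bar length is proportional to the count.
    print(f"{lower}-{lower + 4}: {'#' * n}")
```

Because all the intervals here have the same width, the height of each bar and its area carry the same information.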
Go back to the data editor window, select GRAPHS and HISTOGRAM and run the same command as for the HEIGHT variable, but with WEIGHT. From the histogram, would you say that the variable WEIGHT has a normal distribution or would you try other tests? Are there any differences between the two histograms?
Scatter plots
Scatter plots show the joint behaviour of two (or more) variables in a diagram. Values of one of the variables are plotted against values of another, the two variables usually being metrical. A scatter plot usually shows much more about the behaviour of the variables than descriptive statistics like correlation. Scatter plots are also drawn using the GRAPHS command. Click on GRAPHS then SCATTERPLOT, then on the SIMPLE option, and then click on the DEFINE button. Select WEIGHT for the Y-axis and HEIGHT for the X-axis. In a scatter plot, if one of the variables is thought to depend on the other, it is plotted on the vertical Y-axis. Here, we think that weight depends on height, therefore weight is plotted on the Y-axis. In addition, select SEX for set markers by. This will allow you to identify points on the scatter plot by sex, as males and females tend to have different heights and weights. Run the command and look at the scatter plot in the chart carousel window. Can you see any difference between the males and the females in terms of heights and weights? To edit the chart simply double-click on it.
Now we shall try fitting simple linear regression lines to the data. Select CHART then OPTIONS and FIT LINES (Select Subgroups) and FIT OPTIONS. Make sure linear regression has been highlighted and then click on continue. There should be two different lines for males and females. What can you say about the slopes of the two regression lines? Can you see any difference now between the males and the females in terms of heights and weights?
The markers used to distinguish males and females are drawn in different colours, but the difference is not very clear. It will become less clear if you print out the scatter plot on a monochrome printer! Click on any marker in the plot: all markers of that sex become highlighted in black squares. Then click on the icon depicting a crayon/pencil to change the colour of the marker/symbol. To change the symbol simply click on FORMAT and then MARKER. There you should have several options for changing the type and size of the symbol. After making the chosen changes hit Apply and Close.
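For readers who want to see what fitting a separate line to each subgroup involves, the following Python sketch fits a least-squares line of weight on height for each sex. It assumes ordinary least squares, uses a hypothetical helper called fit_line, and runs on made-up heights and weights rather than the values in family.sav.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical cases: (sex, height in inches, weight in pounds).
cases = [
    ("M", 68, 150), ("M", 70, 160), ("M", 72, 170),
    ("F", 62, 115), ("F", 64, 121), ("F", 66, 127),
]

fits = {}
for sex in ("M", "F"):
    heights = [h for s, h, w in cases if s == sex]
    weights = [w for s, h, w in cases if s == sex]
    fits[sex] = fit_line(heights, weights)  # weight on Y, height on X

for sex, (slope, intercept) in fits.items():
    print(f"{sex}: weight = {intercept:.1f} + {slope:.1f} * height")
```

The slope for each group is the estimated change in weight per extra inch of height, which is the quantity to compare when looking at the two lines on the chart.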
Editing a High Resolution Chart

Generate a high-resolution chart, a histogram, to try out some of the editing features. Histograms are used for metric or quantitative variables, like AGE, which takes on values along a scale. There are generally too many distinct values to make it worth drawing a bar chart. Instead, the values are grouped into intervals or bands and a bar is drawn for each interval. The area of each bar is proportional to the number of cases with values in the interval. Still using family.sav select GRAPHS and then HISTOGRAM. Select HWRATIO for the variable box and click OK. A histogram for HWRATIO is added to the Chart Carousel Window. The histogram shows some descriptive statistics for the variable too. What are the sample mean and standard deviation for HWRATIO?
Double-click on the chart to move the histogram from the Chart Carousel Window to a Chart Window. The menu bar and tool bar change to show editing facilities.
First, click on CHART then OPTIONS and NORMAL CURVE - then hit OK. The normal curve superimposed over the histogram is the one for the above mean and standard deviation. Admittedly, it's difficult to make a decision with such a small sample, but does the curve appear to be a good fit to the histogram? Now, click on the icon swap axes. Does the histogram look better with vertical bars or horizontal bars?
Now try some of the other icons and tools to change the chart. These changes require the appropriate part of the chart to have been selected. Click on any bar. The bars will become highlighted with small black squares at their corners. Then click on the Fill Pattern tool button (the rectangle with diagonal shading). To apply a pattern, click on it and then click on apply. Once you have finished with the patterns, click on close. Also, try the Colour Palette tool button (the one with the pen) and the Bar Labels tool button (the one with the fingernails). You can also change the style of the line showing the Normal curve, and the fill pattern and colour of the background of the histogram.
Once you have finished with your work, select FILE and then SAVE CHART. Save your histogram as artwork.chz. To copy or move a chart into Word, click on EDIT and then select COPY CHART. To move to Word, minimise SPSS and open Word. If Word is already open, press ALT & TAB to move between programs. Once in Word, go to EDIT, then PASTE. Finally, exit from SPSS for Windows by selecting FILE, then EXIT.
Section II: Manipulating the Data in the Matrix (Computing, Recoding, Filtering and Deleting Data)
Computing Values
Start SPSS and open the file family.sav (you should find this file on your M: drive in the folder that you named survey). We shall use the COMPUTE command to build up a new variable that will be labelled BMI, which stands for body mass index. This is calculated as: Body mass index = weight (kg) / height (m)^2. The conversion factors in the expression below (0.4536 kg per pound and 0.0254 m per inch) turn a weight in pounds and a height in inches into these units. Select TRANSFORM and then COMPUTE and set the Target Variable to bmi. Click on Type & Label and enter the label body mass index in the label box. Click Continue to return to the Compute Variable dialog box. Using the source list on the left and the calculator pad in the centre, build up
weight * 0.4536 / (height * 0.0254) ** 2 in the Numeric Expression box. Run the completed command. The new variable is added to the end of the data. We shall check the new variable by estimating a few descriptive statistics using FREQUENCIES (Analyze, Descriptive Statistics, Explore would be a better command, but Frequencies will do here). Select ANALYZE, DESCRIPTIVE STATISTICS and then FREQUENCIES. Move body mass index (bmi) to the Variable(s) box. Since bmi is a metric variable with a potentially different value for every case in the data, suppress frequency tables by clearing the DISPLAY FREQUENCY TABLES check box. You will now get a message saying: You have turned off all output. Unless you request Display Frequency Tables, Statistics or Charts, Frequencies will generate no output. No worries, we will request descriptive statistics by clicking on STATISTICS and ticking the check boxes for the following: MEAN, MEDIAN, MINIMUM and MAXIMUM. Run the command and look at the output. What are the sample values of the mean, median, minimum and maximum?
(The mean should be around 25.0. Any values outside the range 15.0 to 35.0 should be queried.) Do the sample statistics satisfy these rough checks? If not, something is wrong!
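The computation and the rough range check can also be sketched in Python. This is a minimal illustration, not part of the SPSS procedure: the function names and the example case (150 lb, 66 in) are hypothetical, while the conversion factors and the 15.0 to 35.0 range come from the text above.

```python
LB_TO_KG = 0.4536   # kilograms per pound (factor used in the numeric expression)
IN_TO_M = 0.0254    # metres per inch (factor used in the numeric expression)

def bmi(weight_lb, height_in):
    """Body mass index in kg/m^2 from a weight in pounds and a height in inches."""
    return (weight_lb * LB_TO_KG) / (height_in * IN_TO_M) ** 2

def plausible(value, low=15.0, high=35.0):
    """Rough check from the text: values outside 15.0 to 35.0 should be queried."""
    return low <= value <= high

# Hypothetical case, not taken from family.sav.
example = bmi(150, 66)
print(round(example, 1), plausible(example))
```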
Conditionally Computing Values

Now we shall use the IF sub-command (via Transform, Compute) to set up a new variable. The sub-command allows you to set up a new variable under the condition that the original variable, which it is based on, fulfils certain criteria. We want to set up a new variable AGEHOH for the age of the head of the household. In other words, if a person in the sample is head of the household, AGEHOH shall indicate that person's age. Select TRANSFORM and then COMPUTE and clear the previous settings by clicking on RESET. Set the Target Variable to AGEHOH and click on TYPE & LABEL to assign the label age head of household. Click on Continue, and then set the Numeric Expression to AGE. We want this (i.e., the current age in years) to be applied when the case is head of household, which occurs when RELTOHOH is zero. (For the variable RELTOHOH, relationship to head of household, the value 0 denotes that a person is head of household.) Select IF and INCLUDE IF CASE SATISFIES CONDITION. Set up the condition RELTOHOH = 0 in the large box and run the command. The variable AGEHOH should now be added to the end of the data. Have a look at the new variable. You should see ages set for some cases only. Let's check AGEHOH by moving it in the data matrix to the column after RELTOHOH so that we can see what happened more clearly.
First we must make a space in the data matrix by inserting a new variable. Find RELTOHOH by either scrolling through the DATA EDITOR window or by selecting UTILITIES and VARIABLES, selecting RELTOHOH from the source list and then clicking on GO TO and CLOSE. Now click on any cell of the variable that is immediately to the right of RELTOHOH (this variable should be sex). Then select DATA and then INSERT VARIABLE. Alternatively, you can click on the INSERT VARIABLE tool (which is the sixth button from the right). Now, a blank column headed var00001 containing system-missing values (dots) is inserted before the selected variable. Move AGEHOH to this column by single-clicking on AGEHOH to highlight the column and then selecting EDIT and CUT. To paste it in the desired location, single-click on the head of the blank column (var00001) and select EDIT and then PASTE. Look at the values in the DATA EDITOR window. Do all heads of household have AGEHOH set? If not, what might be the reason? (Hint: Look at the variable that AGEHOH is derived from!) What value is set for cases who are not heads of household?
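The logic of the conditional compute can be sketched as follows. This is a hedged illustration in Python: the three cases are invented, the code 1 for a non-head of household is an assumption, and None stands in for the SPSS system-missing value.

```python
# Hypothetical cases; None stands in for SPSS's system-missing value (shown as a dot).
cases = [
    {"age": 45, "reltohoh": 0},    # head of household
    {"age": 43, "reltohoh": 1},    # not head of household (code 1 is assumed)
    {"age": None, "reltohoh": 0},  # head of household, age not recorded
]

for case in cases:
    # AGEHOH is set only when the condition RELTOHOH = 0 holds; otherwise it stays missing.
    case["agehoh"] = case["age"] if case["reltohoh"] == 0 else None

print([case["agehoh"] for case in cases])
```

The third case shows one way a head of household can end up without a value: the variable that the new one is derived from is itself missing.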
Re-coding Values
The RECODE command in SPSS is very powerful and efficient but it can be a little tricky to set up due to the number of clicks required. We shall recode BMI into a new variable BMIGRP, which takes the following values:
Value   Range                Interpretation
1       bmi < 25.0           Okay
2       25.0 ≤ bmi < 30.0    Overweight
3       bmi ≥ 30.0           Obese
Select TRANSFORM and then RECODE and INTO DIFFERENT VARIABLES. Select BMI from the source list into the central INPUT VARIABLE -> OUTPUT VARIABLE box. Enter BMIGRP into the Name box and click on Change to complete the INPUT VARIABLE -> OUTPUT VARIABLE box. Also enter a suitable variable label for BMIGRP in the LABEL box (e.g., categorical body mass index). To set up the recoding, click on OLD and NEW VALUES. We build up the recode specification for the third category of BMIGRP first. In the OLD VALUE box, select RANGE and THROUGH HIGHEST and enter 30.0 in the box before THROUGH HIGHEST. In the NEW VALUE section, enter 3 into the VALUE box. Then click on ADD to copy the specification 30.0 THROUGH HIGHEST = 3 to the OLD -> NEW box. Build up the other two specifications, in the order 25.0 THROUGH 30.0 = 2 and then LOWEST THROUGH 25.0 = 1. The order matters because the ranges share their end points: a value is recoded by the first specification it matches, so exactly 30.0 becomes 3 and exactly 25.0 becomes 2, as the ranges above require. Now run the completed command.
To finish, double-click on BMIGRP in the Data Editor window, and define suitable value labels (i.e., 1= okay, 2 = overweight, 3 = obese). Are the values of BMIGRP correct for the first ten cases?
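The recode specification, read in the order it was built up, can be sketched in Python. The function name bmigrp is only illustrative, and the sketch assumes that a value is recoded by the first rule it matches.

```python
def bmigrp(bmi):
    """Recode BMI into 1 (okay), 2 (overweight) or 3 (obese); the first matching rule wins."""
    if bmi is None:
        return None      # a missing value stays missing
    if bmi >= 30.0:      # 30.0 THROUGH HIGHEST = 3
        return 3
    if bmi >= 25.0:      # 25.0 THROUGH 30.0 = 2 (30.0 itself was caught by the rule above)
        return 2
    return 1             # LOWEST THROUGH 25.0 = 1 (25.0 itself was caught by the rule above)

# Boundary values are the ones worth checking by hand in the Data Editor.
print([bmigrp(v) for v in (18.5, 25.0, 29.9, 30.0, None)])
```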
Filtering Cases
In this example, we shall filter cases. The filtering option allows you to exclude certain cases from further analysis temporarily. Before filtering, generate a two-way frequency table for ownrent by typaccm by selecting ANALYZE, then DESCRIPTIVE STATISTICS and then CROSSTABS, and selecting ownrent for Row(s) and typaccm for Column(s). Run the command and look at the table in the output.
1. What exactly does the frequency count in the first cell of the second table refer to? 6 what?
We shall filter using the variable PERSNO, which is the number of persons in the household.
2. What will be the effect of selecting cases satisfying the condition persno = 1? What is the impact on households?
Now, select DATA and SELECT CASES and then IF CONDITION IS SATISFIED, and make sure that UNSELECTED CASES are FILTERED (this is very important, as the alternative is DELETED, which we want to avoid for now!). Select IF... and build up the condition persno = 1 in the large box. Run the completed command. Find persno in the Data Editor window.
3. What appears in the status bar when filtering is in effect? (The status bar is at the bottom of the window.)
4. What has happened to the case numbers of cases where persno is not 1?
Rerun the CROSSTABS command (via Analyze, Descriptive Statistics) and look at the new table in the output.
5. What exactly does the frequency count in the first cell refer to now? 3 what?
Go to the Data Editor window and save the filtered data as familyf.sav. Then select DATA, SELECT CASES and then ALL CASES. Run the command.
6. What happens to the status bar and the case numbers?
Deleting Cases
Instead of filtering cases we shall delete unselected cases, without doing any harm to data stored in system files on disk. Select DATA, SELECT CASES, IF CONDITION IS SATISFIED, which picks up the previous condition persno = 1. Then select UNSELECTED CASES are DELETED. Run the command and have a look at the Data Editor window.
1. How many cases are left?
2. What are the values of PERSNO?
3. What are the values of HSEMO? What does that successfully show?
Now, rerun the CROSSTABS command from the previous section and look at the output.
4. Do the results agree with those obtained when cases are filtered?
Return to the Data Editor window and save the selected cases to a NEW system file named familyd.sav (after deleting cases you should do this as soon as possible to avoid overwriting your complete data file by accident). Finally, re-open familyf.sav, the filtered file you saved in the previous section.
5. Is filtering still on?
Exit from SPSS, saving the contents of the output window into output3.spo. Open up family.sav that you saved to your survey folder.
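The difference between filtering and deleting can be pictured with a short Python sketch. The three cases are invented and the selected flag is only an analogy for the way SPSS marks filtered cases; it is not how SPSS stores them.

```python
# Hypothetical cases; persno is the number of persons in the household.
cases = [{"id": 1, "persno": 1}, {"id": 2, "persno": 3}, {"id": 3, "persno": 1}]

# Filtering: every case is kept, but each carries a flag saying whether analyses should use it.
filtered = [dict(c, selected=(c["persno"] == 1)) for c in cases]

# Deleting: unselected cases are dropped from the working copy altogether.
deleted = [c for c in cases if c["persno"] == 1]

used_when_filtered = [c["id"] for c in filtered if c["selected"]]
print(len(filtered), len(deleted), used_when_filtered)
```

Both routes analyse the same cases, which is why the two CROSSTABS tables should agree; only the filtered copy can be switched back to all cases.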
WEEK 3: October 17th T-Tests
Section I: Parametric T-tests (related & unrelated)

This practical will show you how to run a t-test so that you can look at the difference between the means of two sets of scores. Experimental designs can be of two basic types: within subject (dependent or related) and between subject (independent or unrelated). The former is when all subjects are subjected to all conditions (e.g., testing reaction times before and after receiving a drug). Between subject designs are when you divide subjects into independent groups, such as on the basis of gender, or into one group that receives a drug and a second that receives a placebo.
DEPENDENT OR RELATED SAMPLES T-TEST
First, a quick review of the test layouts.
1. Related Samples - two variables, one for each condition of the experiment. Each subject has two scores, as a result. Variable 1 is the first set of scores for the subjects (e.g. reaction time before taking the drug) and Variable 2 is the second set of scores for the subjects (e.g. reaction time after taking the drug):
Sub. No.   Variable 1   Variable 2
1          10           30
2          11           31
3          12           32
4          10           30
5          9            29
2. Independent or Unrelated Samples - two variables, the first tells SPSS what condition EACH subject belongs to, the second is the actual score for that subject:
Variable 1 is the condition each subject belongs to (e.g. group 1 are the controls, group 2 receive the drug) and Variable 2 is the actual score (e.g. each subject's reaction time):
Sub. No.           Variable 1 (condition)   Variable 2 (score)
1 (control)        1                        subject 1 score
2 (control)        1                        subject 2 score
3 (experimental)   2                        etc.
4 (experimental)   2                        etc.
T-Test for Related Sample

This is the parametric comparison of two related groups, for example, when you want to compare mean scores for subjects at some task before and after taking a drug. Each set of subject scores for the related t-test must be entered as an individual variable in SPSS. So, in the above example, all the individuals' scores for the task before taking the drug would be in one column and all the scores after taking the drug in another. First, open family.sav. The next step is to add a variable to the data file, so that we can run the related t-test. In this case, the comparison will be between the subjects' height/weight ratio before they were put on a 4-week diet/exercise plan and after. The variable already in the data set, HWRATIO, is the measure before. At the end of the data file, add the variable HWRATIO2 to represent their measurements after the plan. Using what you learned in the first lesson about entering data, create the new variable using the information below:
Variable Name: HWRATIO2
Variable Label: Height/Weight Ratio after plan
Data: see Table 1 below
To run the procedure, go ANALYZE, COMPARE MEANS and then PAIRED-SAMPLES T-TEST. The usual dialogue box appears, with the two-column format. The only difference is that you must select pairs of variables and move them across, rather than just one variable at a time. To do this, you have to click on one variable, then locate the other variable and click on it. The two variables that you have requested should appear in the current selection box. After clicking on both, you then press the arrow button to move the pair across. SPSS will analyse each pair to determine if their means are significantly different statistically. In this case, select the variables HWRATIO and HWRATIO2 and move them across, then press the OK button.
Table 1: Data for Height/Weight Ratio after a 4-week diet/exercise plan (a dot denotes a missing value)
Subject Number   HWRATIO2 score
1                .44
2                .52
3                .46
4                .
5                .44
6                .42
7                .33
8                .74
9                .80
10               .32
11               .60
12               .65
13               .40
14               .50
15               .57
16               .41
17               .60
18               .55
19               .49
20               .60
OUTPUT
The results appear in three sections:
The first section gives you a table called Paired Samples Statistics with the mean scores, standard deviations and standard error of the mean for the two variables.
The second section is a table called Paired Samples Correlations showing the correlation between the two variables and the level of significance.
The third section is more important. The table called Paired Samples Test indicates the significance of the results. This includes the t-value, degrees of freedom (d.f.) and the two-tailed significance level.
What is the t-value for the comparison between the height to weight ratio scores? Is there a significant difference between the scores before and after the diet/exercise plan? If so, which is the greater height/weight ratio?
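To see where the t-value and degrees of freedom in the Paired Samples Test table come from, the calculation can be sketched by hand in Python. This is an illustration under stated assumptions: the function name paired_t is hypothetical, the reaction times are invented rather than taken from family.sav, and pairs with a missing score are simply skipped.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, df) for a related-samples t-test, skipping pairs with a missing score."""
    diffs = [b - a for b, a in zip(before, after) if b is not None and a is not None]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # mean difference over its standard error
    return t, n - 1

# Hypothetical reaction times before and after a drug.
before = [10, 11, 12, 10, 9]
after = [12, 14, 13, 13, 12]
t, df = paired_t(before, after)
print(round(t, 2), df)
```

The sign of t only reflects which variable was subtracted from which; the significance level still has to be read from the SPSS output or a t table.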
T-Test for Independent Samples

This is the parametric t-test for two independent samples - a between-subjects design where, for example, subjects are randomly assigned to two separate test conditions (e.g. drug and control) and the mean scores (e.g. reaction time) are compared to determine if they are significantly different from each other. In this case, you want to test whether there is a statistical difference in height to weight ratios between the male and female subjects. The format for variables to be used in the independent t-test is different from that used in the related t-test. Instead of the scores being placed in two separate columns (variables), all of the scores are placed in a single column (variable). A second variable identifies for SPSS which of the two groups each score belongs to. So, in this case, there is the variable HWRATIO2 as the dependent variable and NSEX as the independent variable. To run the analysis, go to ANALYZE, COMPARE MEANS and then INDEPENDENT-SAMPLES T-TEST. As usual, the left column lists all the variables in your data file. On the right, there are two boxes: The test variable(s) box is where you move the dependent variable(s) (e.g., HWRATIO2).
The grouping variable box is where you move the variable that distinguishes between the two independent groups (e.g. the variable NSEX).
First, select the dependent variable HWRATIO2 and move it over to the test variable(s) section. Next move NSEX over into the grouping variable section and press the DEFINE GROUPS button. Values from the grouping variable must be entered into the two boxes. In the case of the variable sex, where only two levels are recorded, you would just enter 1 in the top box for male subjects, and 2 in the lower one for female subjects. Hit the CONTINUE button, then hit the OK button. [Note: There may be times where you have a larger range of values, such as five different education levels, but only want to look at the difference between two of them. You would enter the two values you wish to compare.] OUTPUT There are two sections: The first section of the output gives you a table called Group Statistics which indicates the number of cases and the mean scores etc. for each condition. The second section provides a table called Independent Samples Test and starts with Levene's Test for Equality of Variances. If the variances are unequal, which is indicated by a significant result on Levene's test, then when you look at the results of the t-test in the final table, you use the line starting with Equal variances not assumed. If it isn't significant, you look at the line starting with Equal variances assumed. The final table gives you t-values, degrees of freedom and the two-tailed significance levels.
In this case, Levene's test is not significant (0.137), so we look at the Equal variances assumed line. The t-test on that line is not significant (two-tailed significance of .478), so we cannot reject the null hypothesis of no difference: there is no evidence of a difference between males and females in their height to weight ratios.
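The rule for choosing a row, and the t-value on the Equal variances assumed row, can be sketched in Python. Treat this as a minimal illustration: the function names and the two small groups of scores are hypothetical, and Levene's test itself is not computed here, only its reported significance is used.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group1, group2):
    """Return (t, df) for an independent-samples t-test with equal variances assumed."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def row_to_read(levene_p, alpha=0.05):
    """Pick the output row from the significance of Levene's test."""
    return "Equal variances not assumed" if levene_p < alpha else "Equal variances assumed"

# Hypothetical scores for two groups (not from family.sav).
males = [0.50, 0.55, 0.45, 0.60]
females = [0.40, 0.50, 0.45, 0.45]
t, df = pooled_t(males, females)
print(round(t, 2), df, row_to_read(0.137))
```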
Section II: Non-Parametric T-tests (Wilcoxon - related & Mann-Whitney - unrelated)

All of the tests today can be found under ANALYZE, NONPARAMETRIC TESTS
Mann-Whitney - Unrelated
This is the non-parametric t-test for two independent samples - a between-subjects design. To run the analysis, choose: ANALYZE, NONPARAMETRIC TESTS, and 2 INDEPENDENT SAMPLES
As usual, the left column lists all the variables in your data file. On the right, there are two boxes: the test variable(s) box is where you move the dependent variable(s); the grouping variable box is where you move the variable that distinguishes between the two independent groups (e.g. the variable sex).
So, move HWRATIO2 into the test variable box, and move NSEX into the grouping variable box. Now, click the Define Groups button. Values from the grouping variable must be entered into the two boxes. In the case of the variable NSEX, you enter 1 in the top box for male subjects, and 2 in the lower one for female subjects. Hit the Continue button, then hit the OK button. OUTPUT SPSS divides the entire set of subjects into three groups: those with a score of 1 (male), those with a score of 2 (female), and cases with missing data (which are excluded from the analysis).
The first section gives the mean ranks for the two conditions that are included, as well as the sums of the ranks and the numbers of cases. The second section gives the Z score and p-value for the test. Is there a difference between males and females? How do the results from this week compare to last week's?
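The mean ranks in the first section can be reproduced by hand. The Python sketch below ranks the pooled scores and derives the Mann-Whitney U statistic from the rank sums; the function names and the scores are hypothetical, and the Z score and p-value that SPSS reports are not computed.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def mann_whitney_u(group1, group2):
    """Return (U, mean rank of group1, mean rank of group2)."""
    ranks = average_ranks(group1 + group2)  # rank all scores together
    n1, n2 = len(group1), len(group2)
    r1 = sum(ranks[:n1])                    # rank sum for the first group
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2), r1 / n1, sum(ranks[n1:]) / n2

# Hypothetical scores (not from family.sav).
males = [0.50, 0.55, 0.45, 0.60]
females = [0.40, 0.52, 0.44, 0.46]
u, mean_rank_1, mean_rank_2 = mann_whitney_u(males, females)
print(u, mean_rank_1, mean_rank_2)
```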
Wilcoxon - Related
This is the non-parametric repeated measures T-test, in a within subjects design. Like the parametric equivalent, we'll be running a comparison of height to weight ratios for the sample population before and after a four-week exercise/diet program. To run the analysis, choose: ANALYZE, NONPARAMETRIC TESTS, and 2 RELATED SAMPLES. The dialogue box has the two-column format. The only difference is that you must select pairs of variables and move them across. SPSS will analyse each pair to determine if their mean ranks are significantly different statistically. For this analysis, select the two variables HWRATIO and HWRATIO2, then click the OK button. OUTPUT The output for this procedure is quite different from the parametric test. The first section gives you information about how many rank scores for one condition are
less than (LT), greater than (GT) or equal to (EQ) the rank scores for the other condition. The mean ranks for each of these three levels are given, as well as the sums of the ranks for each and the number of cases that fall under each level. The main results are underneath this table, where the Z value and the p value are given. The usual standard for levels of significance is used (the result is significant if p is less than 0.05). How many cases are there where HWRATIO is greater than HWRATIO2? Is there a significant difference between ranked height/weight ratios before and after the exercise/diet program?
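The rank sums in the first section can be sketched in Python as follows. The function name and the scores are hypothetical; the sketch ranks the sizes of the non-zero differences and adds them up separately for negative and positive differences, leaving the Z value to SPSS.

```python
def signed_rank_sums(first, second):
    """Return (negative rank sum, positive rank sum, ties) for second minus first."""
    diffs = [b - a for a, b in zip(first, second) if a is not None and b is not None]
    nonzero = [d for d in diffs if d != 0]  # tied pairs (EQ) are counted but not ranked
    sizes = sorted(abs(d) for d in nonzero)
    # Rank the absolute differences, averaging the ranks of equal sizes.
    rank = {s: sizes.index(s) + (sizes.count(s) + 1) / 2 for s in set(sizes)}
    negative = sum(rank[abs(d)] for d in nonzero if d < 0)
    positive = sum(rank[abs(d)] for d in nonzero if d > 0)
    return negative, positive, len(diffs) - len(nonzero)

# Hypothetical scores before and after a programme.
before = [10, 12, 9, 11, 14, 10]
after = [12, 11, 9, 15, 17, 15]
result = signed_rank_sums(before, after)
print(result)
```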
WEEK 4: October 24th ANOVAS

This practical will familiarise students with the analysis of variance (ANOVA). The ANOVAs used in this practical apply when you want to determine whether there is a significant difference between three or more groups and you have only a single independent variable (a one-way design).
One-way ANOVA for Independent Samples

In this case, we want to determine if there is a significant difference in the height to weight ratio between the three age groups in the sample in family.sav - children, adults and elderly. We also want to carry out Tukey's post-hoc test to identify where those differences lie, if any. The procedure is remarkably similar to carrying out an unrelated samples t-test. Go: ANALYZE, COMPARE MEANS, ONE-WAY ANOVA. As you can see, the layout of the dialogue box is basically the same as the one for unrelated t-tests from last week. First select your dependent variable(s) - in this case move the variable HWRATIO into the dependent list section. Your factor (independent variable) is the variable AGEGRP. Before running the analysis, press the Post-hoc button and turn on Tukey's test. Now press the Continue and OK buttons and the analysis will be carried out. OUTPUT There are two sections to the results for the one-way ANOVA. 1. The first section indicates whether any significant differences exist between the different levels of the independent variable. The between groups and within groups sums of squares are listed, along with the degrees of freedom, the F-ratio and the F-probability score (significance level). It is this last part that indicates significance. If the F-prob. is less than 0.05 then a significant difference exists. In this case, the F-prob. is shown as 0.000 (that is, p is less than 0.0005), so we can say that there is a statistically significant difference in height to weight ratios between the three age groups. 2. The post-hoc test identifies where exactly those differences lie. The final part of the second section is a small table with the levels of the independent variable listed down the side. Looking at the comparisons between these levels we see that children have a significantly higher mean height to weight ratio than adults and the elderly (this is also indicated by the asterisks). For the meantime, ignore the third table of the output.
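How the sums of squares, degrees of freedom and F-ratio in the first section relate to one another can be sketched in Python. The function name and the three small groups are hypothetical and chosen so that the arithmetic is easy to follow by hand; the significance level still has to come from SPSS or an F table.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way independent-samples ANOVA."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    # Between groups: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within groups: how far each score lies from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b = len(groups) - 1
    df_w = len(scores) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# Hypothetical scores for three groups.
f, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f, 2), df_b, df_w)
```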
One-way ANOVA for Related Samples

The procedure for running this is very different from anything you've done before. The first step is easy enough - you need to add a third height to weight ratio variable, representing the ratios for the subjects some time after they stopped doing the exercise/diet plan. The data are below:
Variable Name: HWRATIO3
Variable Label: Height/Weight Ratio post-plan
Data: see table below
Subject Number   HWRATIO3 score
1                .42
2                .56
3                .42
4                .
5                .41
6                .40
7                .30
8                .78
9                .71
10               .30
11               .55
12               .64
13               .40
14               .49
15               .55
16               .39
17               .52
18               .54
19               .49
20               .60
The next step is to run a single factor ANOVA by going: ANALYZE, GENERAL LINEAR MODEL, REPEATED MEASURES. The dialogue box is different from the usual format. You first give a name to the factor being analysed, basically the thing the three variables have in common. All three variables cover height to weight ratios, so: in the Within-Subject Factor Name box, type RATIO; in the Number of Levels box, type 3 (representing the three variables); press the Add button, then the Define button.
The next dialogue box is a bit more familiar. In the right-hand column, there are three question marks with a number beside each. Select each of the three variables to be included in the analysis, and move them across with the arrow button. Notice how each of the variables replaces one of the question marks, indicating to SPSS which
three variables represent the three levels of the factor RATIO. Then proceed by clicking on OK. OUTPUT Firstly, you can ignore the sections of the output titled Multivariate Tests and Mauchly's Test of Sphericity. You need to examine the section titled Tests of Within-Subjects Effects. This section indicates whether any significant differences exist between the different levels of the within subjects variable. The degrees of freedom and sums of squares are listed, as well as the F-score and its significance level. If the significance level is less than 0.05 then a significant difference exists. In this case, it is 0.001 (look at the row for sphericity assumed), so we can say that there is a statistically significant difference in height to weight ratios between the three times when measurements were taken. You can ignore the section titled Tests of Between-Subjects Effects. It is irrelevant here. To do a post-hoc test to identify where the differences lie, the SPSS for Windows Made Easy manual recommends doing Paired-Samples T-tests. In this case: HWRATIO & HWRATIO2, HWRATIO & HWRATIO3, and HWRATIO2 & HWRATIO3. From these three T-tests, you can determine which of the height to weight ratios are significantly different from each other.
Kruskal-Wallis ANOVA (KWANOVA Unrelated)

This is the non-parametric equivalent of the independent samples ANOVA, where ranks are used instead of the actual scores. We will run the analysis on the same variables, so go ANALYZE, NONPARAMETRIC TESTS, and K INDEPENDENT SAMPLES. As with the parametric test, move HWRATIO over to the test (dependent) variable list and AGEGRP over to the grouping (independent) variable box, and define the groups with a minimum of 1 and a maximum of 3. Click the OK button. Notice that the non-parametric ANOVA doesn't have a post-hoc test. If you run this ANOVA, you'll have to consult a statistics book as to how to do a post hoc on the results. One way would be to run a series of non-parametric t-tests (Mann-Whitney) on all the pairwise combinations of the conditions. OUTPUT The first section gives you the mean ranks and the number of cases for each level of the independent variable. The second section lists the Chi-Square value, degrees of freedom and significance of the test.
Is there a significant difference between the three groups (remember you can't say exactly what that difference is without a post hoc test)?
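The Chi-Square value reported for this test is the Kruskal-Wallis H statistic, which is built from the rank sums of the groups. The Python sketch below shows the calculation on invented scores; the function name is hypothetical and, as a simplifying assumption, no correction for tied scores is applied, so the result can differ slightly from SPSS when ties are present.

```python
def kruskal_wallis_h(groups):
    """Return (H, df) for the Kruskal-Wallis test, without a tie correction."""
    scores = [x for g in groups for x in g]
    ordered = sorted(scores)
    # Rank all scores together, averaging the ranks of equal scores.
    rank = {v: ordered.index(v) + (ordered.count(v) + 1) / 2 for v in set(scores)}
    n = len(scores)
    weighted = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    return h, len(groups) - 1

# Hypothetical scores for three groups.
h, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 2), df)
```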
Friedman's - Related ANOVAs

This is the non-parametric equivalent of the related samples ANOVA, where ranks are used instead of the actual scores. We will run the analysis on the same variables, so go ANALYZE, NONPARAMETRIC TESTS, and K RELATED SAMPLES. This is much easier to run - just move the three variables (HWRATIO, HWRATIO2 and HWRATIO3) over to the right column and click OK. OUTPUT There is the Chi-square score, the d.f. and whether it's significant (as usual, the significance level has to be less than 0.05). Again, for post-hoc tests, you'll probably have to consult a statistics book or possibly run three non-parametric related samples T-tests.
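For Friedman's test the scores are ranked within each subject rather than across the whole sample. The Python sketch below illustrates this on invented scores; the function name is hypothetical and no tie correction is applied, which is an assumption that may make the value differ from SPSS when a subject has equal scores.

```python
def friedman_chi_square(rows):
    """rows holds one list of k scores per subject. Returns (chi-square, df), no tie correction."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        ordered = sorted(row)
        for j, v in enumerate(row):
            # Rank within the subject, averaging the ranks of equal scores.
            rank_sums[j] += ordered.index(v) + (ordered.count(v) + 1) / 2
    chi2 = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    return chi2, k - 1

# Hypothetical scores: four subjects measured on three occasions.
chi2, df = friedman_chi_square([[3, 2, 1], [3, 1, 2], [3, 2, 1], [3, 2, 1]])
print(chi2, df)
```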
WEEK 5: 30th October Study Week
WEEK 6: November 6th No Practical
WEEK 7: November 14th QUALITATIVE RESEARCH: STUDENT SEMINAR PRESENTATION PREPARATION

Students should use this time to prepare work for their presentations. Dr. Alison will be available in his office for guidance if necessary.
WEEK 8: November 21st QUALITATIVE RESEARCH: STUDENT SEMINAR
WEEK 9: November 28th INTERVIEWING AND DISCOURSE ANALYSIS - conducting interviews etc.
This period should be used to conduct interviews in preparation for the session on content analysis. Students are expected to conduct interviews or sessions that result in naturally occurring language. It is important that this material is transcribed in preparation for week 11's session. Dr. Alison will be available for consultation.
WEEK 10: December 5th WORKING WITH NATURALLY OCCURRING LANGUAGE PREPARATION
Students will use this period to work with their material gathered in the previous sessions. They should use this time to prepare for presentations in the final practical session (12th December).
WEEK 11: December 12th WORKING WITH NATURALLY OCCURRING LANGUAGE: STUDENT SEMINAR
Students are expected to organise their own seminar presentations in this session on the results and methods employed regarding the content analysis of their material.
SECTION III EXTRA MATERIAL
For the benefit of students who wish to follow up other procedures in their own time, we have included the following section which gives you some opportunity to play with graphics packages and explore some issues associated with regression in preparation for next term. Try not to worry if this all sounds unfamiliar at first. This section is simply to give you a running start when it comes to your work after Christmas.
REGRESSION
Simple Regression
In simple regression, the values of one variable (the dependent variable, $y$ in this case) are estimated from those of another (the independent variable, $x$ in this case) by a linear (straight line) equation of the general form $\hat{y} = b_0 + b_1 x$, where $\hat{y}$ is the estimated value of $y$, $b_1$ is the slope (known as the regression coefficient) and $b_0$ is the intercept (known as the regression constant).
Multiple Regression
In multiple regression the values of one variable (the dependent variable, $y$) are estimated from those of two or more variables (the independent variables $x_1, x_2, \ldots, x_n$). This is achieved by the construction of a linear equation of the general form $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$, where the parameters $b_1, b_2, \ldots, b_n$ are the partial regression coefficients and the intercept $b_0$ is the regression constant.
Residuals
When a regression equation is used to estimate the values of a variable ($y$) from those of one or more independent variables ($x$), the estimates ($\hat{y}$) will not be totally accurate (i.e., the data points will not fall precisely on the straight line). The discrepancies between $y$ (the actual values) and $\hat{y}$ (the estimated values) are known as residuals and are used as a measure of the accuracy of the estimates and of the extent to which the regression model gives a good account of the data in question.
The multiple correlation coefficient
One measure of the efficacy of regression for the prediction of $y$ is the Pearson correlation between the true values of the target variable $y$ and the estimates $\hat{y}$ obtained by substituting the corresponding values of $x$ into the regression equation. The correlation between $y$ and $\hat{y}$ is known as the multiple correlation coefficient $R$ (as opposed to $r$, which is Pearson's correlation between the target variable and any one independent variable). In simple regression $R$ takes the absolute value of $r$ between the target variable and the independent variable (so if $r = -0.87$ then $R = 0.87$). Running Simple Regression Using the family.sav file we want to look at how accurately we can estimate height to weight ratios (HWRATIO) using the subjects' age (AGE). To run a simple regression, choose ANALYZE, REGRESSION and LINEAR. As usual, the left column lists all the variables in your data file. There are two sections for variables on the right. The Dependent box is where you move the dependent variable. Move HWRATIO there. The Independent(s) box is where you move AGE. Next click the STATISTICS button, and turn on the Descriptives option. As already stated, a residual is the difference between the actual value of the dependent variable and its predicted value using the regression equation. Analysis of the residuals gives a measure of how good the prediction is and whether there are any cases that should be considered outliers and therefore dropped from the analysis. Click on Casewise diagnostics to obtain a listing of any exceptionally large residuals. Now click on CONTINUE. Now click on the PLOTS button. Since systematic patterns between the predicted values and the residuals can indicate possible violations of the assumption of linearity, you should plot the standardised residuals against the standardised predicted values. To do this transfer *ZRESID into the Y: box and *ZPRED into the X: box and then click CONTINUE. Now click OK.
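The ideas of a fitted line, residuals and the multiple correlation coefficient can be tied together with a short Python sketch. It is only an illustration: the function names are hypothetical, the ages and ratios are invented rather than taken from family.sav, and the line is fitted by ordinary least squares.

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for estimating y from x."""
    mx, my = mean(x), mean(y)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b1 * mx, b1

def pearson_r(a, b):
    """Pearson correlation between two lists of equal length."""
    ma, mb = mean(a), mean(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return num / sqrt(sum((ai - ma) ** 2 for ai in a) * sum((bi - mb) ** 2 for bi in b))

# Hypothetical ages and height to weight ratios.
age = [5, 12, 25, 40, 60]
ratio = [0.62, 0.55, 0.50, 0.47, 0.40]

b0, b1 = fit_line(age, ratio)
estimates = [b0 + b1 * a for a in age]               # estimated values of y
residuals = [y - e for y, e in zip(ratio, estimates)]  # actual minus estimated

r = pearson_r(age, ratio)            # correlation between x and y (negative here)
big_r = pearson_r(ratio, estimates)  # multiple correlation: actual against estimated y
print(round(r, 3), round(big_r, 3))
```

With a single independent variable the two printed values differ only in sign, which is the relationship between r and R described above.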
Output The first thing to consider is whether your data contain any outliers. There are no outliers in this data. If there were, this would be indicated in a table labelled Casewise Diagnostics and the cases that corresponded to these outliers would have to be removed from your data file using the filter option you learned previously. With that out of the way, the first table to look at (Descriptive Statistics) is right at the top. The first part gives the means and standard deviations for the two variables (e.g. the mean age is 31.77). The next table contains the correlation (Pearson's) for the two
variables, just as if you had run the correlation procedure. The coefficient is -0.57, so it is fairly high and is negative (as one goes up, the other decreases). For the meantime, ignore the table labelled Variables Entered/Removed. The next important table is Model Summary. The R and R-squared values are given for the equation (0.571, as above, and 0.325). Don't worry too much about the other values in this table. The next table contains the regression ANOVA. This test indicates how good the model is - whether there is some overall relationship between the dependent and independent variable(s). The key element is the F score. For this regression, the F score has an associated p value of 0.017, well below the .05 cut-off. This indicates that there is a linear relationship. It should be noted however that only an examination of the scatter plot of the variables can confirm that the relationship between two variables is linear. The next table contains some really important information! The table is labelled Coefficients and contains the regression equation. The regression coefficient and constant are given in column B of the table. The equation therefore is: Predicted height to weight ratio = -.00368(AGE) + .602. The t value indicates whether each independent variable has a significant individual impact on the regression equation. In simple regression, there is only one independent variable, and, for this one, it has a significant influence (a t score with an associated p value of 0.0168 - notice it's the same as the ANOVA score). The next section begins with Residuals Statistics. This gives means, SDs and other information about the unstandardised and standardised predicted and residual scores in the regression. You could follow up the regression by drawing a scatter plot. Look at your scatter plot.
Basically, all you need to know is that if the plot shows no obvious pattern then this confirms that the assumptions of linearity and homogeneity of variance have been met. Where you get into trouble is if the points form a crescent or funnel shape. If this is the case, further screening of your data is necessary.
Multiple Regression
Often, it is too simplistic to assume that a single independent variable is all that is required to predict the scores on a dependent variable. This is where you have to run multiple regression. For now, the regression will look at the impact of age (AGE), height to weight ratio post-plan (HWRATIO2) and height to weight ratio long after the plan (HWRATIO3) on the dependent variable, the subject's initial height to weight ratio (HWRATIO). To run the analysis, choose: ANALYSE, REGRESSION and then LINEAR.
As before, move HWRATIO to the Dependent box. The Independent(s) box is where you move AGE, HWRATIO2 and HWRATIO3. The rest is as before: Click the STATISTICS button, and turn on the Descriptive option. Click on Casewise diagnostics to obtain a listing of any exceptionally large residuals. Now click on CONTINUE. Now click on the PLOTS button. Since systematic patterns between the predicted values and the residuals can indicate a violation of the assumption of linearity, you should plot the standardised residuals against the standardised predicted values. To do this, transfer *ZRESID into the Y: box and *ZPRED into the X: box and then click CONTINUE. Now click OK. Note: we are only doing a general, all-inclusive multiple regression. There is a box located directly beneath the Independent(s) box called Method, which offers a series of additional methods for running the statistics: stepwise, remove, forward and backward.
Output
Again, the first thing to look for is outliers. Again, there are none. With that out of the way, the next section to look at is at the top. Everything that follows is laid out as for the simple regression. The first part gives the means and standard deviations for the four variables (e.g. the mean HWRATIO3 is .526). The next part gives the Pearson correlations for all of the variables. You can see that HWRATIO is strongly correlated with the two other height-to-weight ratio variables (i.e., both over .9). The next section is under the heading Model Summary. The R and R-squared values are given for the equation (.98 and .967). An ANOVA is carried out that indicates how good the model is, that is, whether there is some overall relationship between the dependent variable and all of the independent variables. The key element is the F score. The F score is significant (p is reported as 0.00, meaning it is too small to show at the printed precision), so there is a strong overall relationship. The next table (Coefficients) contains information that indicates the individual role of each independent variable.
The values in the column labelled B give the scores to put into the regression equation:
y = b1(x1) + b2(x2) + b3(x3) + b0
For this regression, then, the regression equation is
HWRATIO = -.0009(AGE) + .99(HWRATIO2) - .135(HWRATIO3) + 0.063
Note that since the B score for HWRATIO3 is negative, the plus sign before that term turns into a minus. The t-test indicates that AGE, as before, is a significant predictor, as is HWRATIO2, but that HWRATIO3 as a single predictor has no significant influence (p>0.05). The next section is labelled Residual Statistics. This gives means, SDs and other information about the unstandardised and standardised predicted and residual scores in the regression. You should have been taught what, if anything, to do with them.
Scatter plots and Regression Lines
A regression line can easily be added to a scatter plot. As before, to create a scatterplot go to GRAPH and SCATTER.
You want to keep the graph layout simple, so just click the DEFINE button. Move HEIGHT into the X-axis box. Move WEIGHT into the Y-axis box. Now, click the TITLE button. You can now put in a title in the Line 1 box. You can add an additional title line and sub-title lines if you want. Now press the CONTINUE button and then click the OK button. The graph should now appear. The window where all the graphs are stored is called the Chart Carousel, and can be saved as a separate file. The extension for chart files is always .cht. What is the line of best fit and what does the value of R-squared tell you?
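As a rough illustration of how the regression equations reported in the two Coefficients tables are used, the Python sketch below plugs values into them. Python is not part of this course; the function names and the example inputs for the multiple regression (an age of 30 and ratios of 0.5) are invented for illustration, while the coefficients are the ones quoted in the text.

```python
# Coefficients copied from the Coefficients tables described in the text.
# Function names and the multiple-regression inputs are hypothetical.

def predict_simple(age):
    """Predicted initial height to weight ratio from AGE alone."""
    return -0.00368 * age + 0.602


def predict_multiple(age, hwratio2, hwratio3):
    """Predicted initial ratio from AGE, HWRATIO2 and HWRATIO3."""
    return -0.0009 * age + 0.99 * hwratio2 - 0.135 * hwratio3 + 0.063


# Prediction for someone at the mean age given in Descriptive Statistics.
print(round(predict_simple(31.77), 3))

# Prediction for an invented person (values are for illustration only).
print(predict_multiple(30, 0.5, 0.5))

# R-squared is simply R multiplied by itself.
r = 0.571
r_squared = r ** 2
print(round(r_squared, 3))
```

For the simple regression this gives a predicted ratio of about 0.485 at the mean age. Squaring R gives about 0.326, which agrees with the 0.325 in Model Summary once you allow for R having been rounded.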
Chi-Square
There are two ways to run Chi-square. The first is when looking at differences in frequencies across the levels of one variable. In this case, we want to see if there are differences in the frequencies for the three levels of the variable AGEGRP (age groups): child, adult and elderly. You do this through: Analyze, Nonparametric Tests, Chi-Square. To run a basic Chi-Square, just move the variable(s) to analyse across and click OK. In this case, move the variable AGEGRP over and run the analysis. [NOTE: If you're interested in the various options, information about them can be found by pressing the Help button when you are in a dialogue box.]
OUTPUT
The results present the observed and expected frequencies for each of the three levels, as well as the Chi-Square value, the degrees of freedom (d.f.) and the significance level. Is there a difference between the three groups in terms of their observed frequencies?
The second way to run a Chi-Square is when carrying out a crosstab. The only change is that before running the crosstab, you have to turn the Chi-Square option on. So, go to Analyze, Descriptive Statistics, Crosstabs. Move the variable NSEX into the column box and NCARS into the row box. Turn on the Chi-Square option by clicking the Statistics button and ticking Chi-Square. Press the Continue button and then OK to run the analysis.
OUTPUT
The crosstabs table is displayed, along with a variety of results. The one to be concerned with is the significance level for the Pearson Chi-Square value.
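To see what the one-variable Chi-Square is doing behind the dialogue box, the Python sketch below works through the arithmetic by hand. It is only an illustration: the counts are invented (they are not the AGEGRP data), and it assumes that every level is expected to occur equally often.

```python
# Hypothetical observed counts for the three AGEGRP levels (not the course data).
observed = {"child": 20, "adult": 30, "elderly": 10}

total = sum(observed.values())
# Assumption: equal expected frequencies, so each level expects the same share.
expected = total / len(observed)

# Chi-square: sum of (observed - expected) squared, divided by expected.
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())

# Degrees of freedom: number of levels minus one.
df = len(observed) - 1

# Critical value for 2 degrees of freedom at the .05 level,
# taken from a standard chi-square table.
CRITICAL_05_DF2 = 5.991

print("Chi-square =", round(chi_square, 2), "with", df, "d.f.")
print("Significant at .05:", chi_square > CRITICAL_05_DF2)
```

With these invented counts the expected frequency is 20 per level and the Chi-Square value is 10 with 2 d.f., which exceeds the tabled critical value, so the three frequencies would be judged to differ.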
Microsoft Word Exercises

This exercise shows you how to copy and format a document. To save time, here's one we prepared earlier. A cast list is given below. Your task is to format the document (top of page 80) into an organised piece of work (bottom of page 80). As you do this, note the different techniques you use, as they will come in handy as the course progresses. These are hands-on sessions, meaning that you should be discovering what to do yourself. Of course, if you have any difficulties then we are here to assist you. Good luck, and remember the Help facility.
The Help Facility
Normally you will want to go to the Help menu, then choose Contents and Index. Click on Index and type in a relevant key word.
The Opening Screen
Word offers a number of ways of viewing the document. The most usual is Normal. So, go to the View menu, and select Normal. Alternatively, use the shortcut button at the bottom left of the screen. If you are not sure what a particular button does, hold the pointer arrow over the button for a second or two without pressing anything. Word will then give a short description of the button. The other view often used is Page Layout, which shows how the page will be printed. Using Zoom from the View menu will allow you to enlarge the screen.
Opening Files
We're going to be e-mailing you two documents entitled play.doc and actone.txt. Open up the e-mail, and then, one at a time, click the Word icons once with your right mouse button. Now save the documents by clicking on Save. Find the Msoffice icon and click on it. Now save your documents in the Msoffice folder under suitable names (e.g. play.doc and actone.txt). Now go into Word. To open the file you just saved go to File, Open, click on your M: drive and find your Msoffice folder. Click on the Msoffice folder and find play.doc. Double click on play.doc.
Hidden Codes
Certain characters or text in Word are hidden. That is to say, they appear on the screen but not in the final printed version.
To turn this option on and off, click on the reversed P button on the toolbar. That marker is the paragraph marker, denoting a new line (hard return). Turn the hidden codes off.
Now show the hidden codes. Can you spot the deliberate mistake? Yes, it's one of those errors in conversion. Double click on {PRIVATE} and the whole word is selected. (This is a handy trick worth noting.) Delete this word. Correct any other deliberate mistakes.
Page Layout
The original document has margins of 2 inches. Make sure the measurement units are in inches by going to Tools, Options, General, and selecting inches in the box called measurement units if that isn't already set. Now go to File, Page Setup and increase the left and right margins to 2 inches from the Margins option. If it asks you if you want your margins fixed, respond with yes. Also note that under Paper Size you can change the orientation of the paper. Briefly, portrait is upright (for text mainly) and landscape is horizontal (for graphs and pictures). To change the justification, select everything by going to Edit, Select All or by dragging the mouse over the whole document (only if it's a small document). Now, click the right mouse button over any part of the selected area. Choose Paragraph from the menu that appears and choose Justified from the Alignment option. You can also do this from the toolbar. Centre alignment is useful for headings. Change The Play to centre alignment.
Formatting
Italicise What The Butler Saw by clicking on What and dragging over the other three words. Now use the toolbar to italicise by hitting the I button. Highlight all of the text and change the font size to 12 pts. Type in the other characters, leaving a space between the character and actor names. Similarly, change the characters' names to small caps by selecting the name and using the right mouse button. From Font choose the Small caps option. For the other character names, simply select the name and go to Edit, Repeat. Select the cast and add a tab into the ruler at two inches by double clicking on the ruler at the two inch mark. Place your cursor before Stanley.
Go to Format and Tabs and add the leader option 2 (i.e. lots of full stops). Press OK and then press the tab key. Do this for each cast member. For the director and designer, the tab is set at 1.5 inches with no leader. Separate the pieces of text with two hard returns and don't forget to save your work.
The Play
The first London performance of What The Butler Saw was given at the Queen=s Theatre by Lewnstein-Delfont Ltd and H.M. Tennnant on 5th March, 1969, with the following cast in order of appearance. Dr Prentice Stanley Baxter Geraldine Barclay Julia Foster Mrs Prentice Coral Browne Directed by Robert Chetwyn Designed by Hutchinson Scott
The final version should look something like this:
The Play
The first London performance of What The Butler Saw was given at the Queen's Theatre by Lewnstein-Delfont Ltd and H.M. Tennent Ltd. on 5th March, 1969, with the following cast in order of appearance:
DR PRENTICE ............. Stanley Baxter
GERALDINE BARCLAY ....... Julia Foster
MRS PRENTICE ............ Coral Browne
NICHOLAS BECKETT ........ Hayward Morse
DR RANGE ................ Ralph Richardson
SERGEANT MATCH .......... Peter Bayliss
Directed by     Robert Chetwyn
Designed by     Hutchinson Scott
Styles, Headers/Footers and Page Numbering

SET AUTOSAVE ON OR LOSE YOUR DOCUMENT! Do this by going to Tools, Options, Save. Set it to save every 5 minutes or so.
Styles in Word can save you a great deal of time that you might otherwise spend formatting. Hence it is highly advisable to use them. Basically, a style is a shortcut way to format large chunks of text. All you have to do is type in the text and then apply the style of your choice.
Page Control and Inserting Files
To demonstrate this feature, open your copy of the cast list. Put the insertion point (the flashing vertical line) at the end of the cast list. Press return. From Insert, choose Break and then Continuous Section Break. Now Insert a Page Break. This splits the text onto different pages and allows different page layouts to be used. (It might be an idea to turn hidden codes on so you can see the break.) If you had just inserted a Page Break, then there would be a new page but the same formatting. This is an important difference and worth noting. Put the insertion point after the break. Now you need to discover where the files in Msoffice are kept, since you will be inserting one. To do this, choose Insert and File. Scan around the M: drive, and ask for text file types (i.e. *.txt). Now open the file actone.txt. (This was saved as a generic text file.) The reason we are inserting rather than opening the file is so that the text is continuous and not in two separate documents. Now save the new big document as a Word 6 document by going to File, then Save As ... Make sure under File Type that Word Document is selected. Name it something new. If you can spot any deliberate mistakes, delete them. Change all the margins to 2 inches for the second section, similar to how you did last week (File, Page Setup, Margins). See how the different sections have different formatting?
Creating Style(s)
Select the paragraph starting Dr Prentice enters briskly. (The quick way to select a paragraph is to double click to the left of the paragraph or treble click anywhere in the paragraph.) Now go to Format, Style. Click on New and name the new style Directions.
Format the new style (by choosing the format option in this box) as having a font of italics and size 12, a paragraph indentation of 0.5 from the left and
the right, plus spacing before and after the paragraph of 6 pt. Click OK, and Apply to see how the new style is applied to the text. Now select the spoken parts, such as the bit starting Prentice (turning at the desk). Name this style Spoken, with a paragraph format of a Hanging Indent (from Special) at 0.25. Don't forget to use the Help facility if you don't understand anything, such as the last sentence. Now apply the styles to the rest of the appropriate text by selecting the text and then going to Format, Style and Apply for the relevant style. The quick way is to click on the drop down menu above the ruler to the left, which probably says Normal, Directions or Spoken. If it asks you whether you want to redefine the style, then the answer is no. When you type your essays, it may be an idea to define styles for Headings (centre aligned, big font, underlined), Quotes (indented, smaller font size, paragraph spacing before and after), Text (first line indented, double spaced), etc.
Headers, Footers and Page Numbers
Making sure you are in the second section of the text, go to View, Header/Footer. The header/footer toolbar appears. Discover what each button does by holding the pointer over the button with the mouse. You may need to enlarge the view from View, Zoom ... In the Header, type What The Butler Saw using centre alignment. Close the view down. From Insert, Page Numbers, choose centre alignment at the bottom of the page (Footer). Don't show the page number on the first page.
Auto Correct
Another useful feature in Word is Auto-Correct, which corrects spelling mistakes as you type them. With Auto-Correct, you can also type shortcut words and Word will then type them in full. For example, every time I type behr Word changes it to behaviour, hence saving me valuable time. To make this possible, go to Tools, Auto-Correct. In the Replace box, put an abbreviation for a word that you use often and in the With box put the full word. Do this for as many words as you like.
For this exercise, since things such as DR PRENTICE are typed quite often, and changing the font to small caps is a pain, first of all find DR PRENTICE from the previous week. Make sure the space after the E of PRENTICE is in normal font. Select the two words and go to Tools, Auto-Correct. Dr Prentice is already in the With box. Click the formatted text option to keep the formatting change, and in the Replace box type something like DP. Click OK and now every time you type DP it is replaced with DR PRENTICE.
Act One A room in a private clinic. Morning. Doors lead to the wards, the dispensary and the hall. French windows open on to pleasant gardens and shrubberies. Sink. Desk. Consulting couch with curtains. Dr Prentice enters briskly. Geraldine Barclay follows him. She carries a small cardboard box. Prentice (turning at the desk). Take a seat. Is this your first hob? Geraldine (sitting). Yes, doctor. Dr Prentice puts on a pair of spectacles, stares at her. He opens a drawer in the desk, takes out a notebook. Prentice (picking up a pencil and notebook). I'm going to ask you a few questions. (Hands her the pencil and notebook). Who was your father? Geraldine puts the cardboard box she is carrying to one side, crossed her legs, rests the notebook upon her knee and makes a note. Geraldine. I've no idea who my father was. Dr Prentice is perturbed by her reply, though gives no indication. Prentice. I'd better be frank, Miss Barclay. I can't employ you if you're in any way miraculous.
Inserting Tables
To cut and paste material between applications is normally quite easy. We are going to insert the results of an Excel spreadsheet into a Word document, and then format the data in a table.
Multi-tasking
This means doing more than one thing at once, such as drinking and smoking. In computing terms it basically means having more than one application open at the same time. Open Microsoft Word. Push the Alt and Tab keys together. See how you cycle through the open applications? Let the keys go and you will switch to that application. Now open Microsoft Excel at the same time and open the spreadsheet that was sent to you over email. Select the results of the analysis (the columns from O to (O-E)^2/E) and copy them. Ensure that the results are to two decimal places. Switch to Word and paste the results. See how easy it is to copy between applications? Note how the default formatting is a table. Also, note how the values of the cells are pasted, not the formulae.
Turning the Tables
Make sure the hidden codes are on, and if you can't see the table gridlines go to Table, Show Gridlines.
Go to Edit, Replace and change O into Observed and then E into Expected. Make sure you click on More and check that the Find Whole Words Only box is ticked. Note that if you had not selected the table, Word would have changed everything in the document. As you can see, the columns are no longer wide enough. Select the table and go to Table, Cell Height and Width and change the width to something suitable. Try at least one inch. Now try formatting the table. Just as with Excel, you must select the line you want and then add the bordering. The slow way to do that is to go to Format, Borders and Shading each time. The fast way is to use the borders icon in the toolbar at the top of your screen. Now select the top row. Point to the left of the word OBSERVED, so that the insertion point turns into an arrow. Click, and the row is selected. Put a point line on the bottom of the row, and change the line width option to a 2 point line for the top of the row. (By the way, to put in special characters go to Insert, Symbol and then choose what you want. While we're on the subject, to put an equation into Word, go to Insert, Object, Microsoft Equation 3.0. Play with it, if you want.) Returning to the table: on the bottom row, put in a 2 point line. To insert a column, select the OBSERVED column by pointing above the word OBSERVED until the pointer changes to a black arrow, then click. Now using the right mouse button, insert a column. Enter the information, as given on the spreadsheet. It is also possible to vertically centre text, such as the table, by going to File, Page Setup, Layout and changing vertical alignment to centre. You might need to go to File, Print Preview or Page Layout view in order to see the full effects. If the table sticks out over the edge of the page, select the table and autofit it from Table, Cell Height and Width, Column. As you can see, the titles are now too big. You can get round that by shrinking the font size or abbreviating the titles. Improvise.
Graphics and Arrows
Well, actually just arrows. If you need to do a few graphic bits in Word, such as arrows and boxes, your best bet is to use the Drawing facility. Turn the Drawing toolbar on in a similar way to turning on the Borders toolbar. Press the return key a few times under the table. This now allows us to enter freeform text using the Drawing facility. Click on text box, towards the left of the toolbar. Now drag an area under the table and type the text into the box. The size and font of the text in the box is formatted in the normal way. Make it 14 point. Double click on the edge of the text box and increase the box to one point and the edge to dashed. Choose fill colour and choose light gray.
Now draw an arrow pointing to the last column. Double click on the arrow to change options such as its width, size of arrowhead etc. Make the arrowhead wider and longer, and make the line 2 points.
SOURCE   OBSERVED   EXPECTED   OBSERVED-EXPECTED   (OBSERVED-EXPECTED)^2   (OBSERVED-EXPECTED)^2/EXPECTED
OPY      13         8.80       4.20                17.60                   2.00
OPN      14         18.20      -4.20               17.60                   0.97
OFY      2          6.20       -4.20               17.60                   2.84
OFN      17         12.80      4.20                17.60                   1.37
Chi-Square Table
Creating Tables in Word

To create tables in Word, go to the point in the document where you want the table inserted and go to Table, Insert Table. You will then be asked how many columns and rows you want. Select the number of each that you want, then press the AutoFormat button. This is basically a step by step guide to creating the layout of a table. Select the options you want and then click the Continue button for each dialogue box, until you come to the end. You should then have a table you can add rows/columns to and do all the editing described in the lesson above.
Creating Templates
So far we have seen how you can format styles to make typing large bodies of text both pretty and functional. But as you have no doubt noticed by now, when you leave a document you also leave the formatting with it. How can you change that? Templates When you open a new document, Word defaults to the normal template. The normal template basically is a document with no formatting and with everything set to default. To make things a little more exciting we have to define a template. Go to File, New and choose template. Go to File, Save As and see how it selects save as document template? Cancel that last command. Now, set up some styles that will be useful for when you eventually come to type up your next few pieces of work.
Think back to the session where you created those styles. In this template, do the same. You don't actually need any text, though it may help if you type in a few lines to show what the different styles actually do. The whole point of doing this is that when you come to do future work, anything saved in the template will be present. So, if you want a standardised title page then create it now, obviously leaving gaps for titles and dates, etc. When you are happy with your template, go to File, Save As. Call it something like Essays or Coursework. All the styles and any standard text will now be saved as the template. Close the template down, and close Word. Now, when you re-enter Word it still gives you the default Normal template. Go to File, New and choose the template name and OK. As if by magic, the formatting and styles appear. When you open a new document, several other templates are offered to you. Try opening Elegant Letter.dot. Note the blanks you have to fill in to create a quick letter?
Other Bits
Made a mistake? In Word we undo mistakes by going to Edit, Undo. You can also redo things by going to Edit, Redo. If you had a long list of variables on consecutive lines and wanted to sort them alphabetically (make sure there is a hard return after each variable, using Edit, Replace if necessary to replace a space with a hard return from Special Characters), you can select what you want sorted and go to Table, Sort. You can also split the document into two windows, for those fancy multi-task operations, by using Window, Split and then clicking where you want to have the split. Get rid of it by moving the split into the toolbar. Now, from Tools, choose Spelling and run a spell check. You can also run a Grammar check. Open a document. With that document open, do a word count, from Tools, Word Count.
MS Excel Exercises
A spreadsheet is a cross between a calculator and a word processor, with a bit of statistical package thrown in. Its main use is in automating repetitive number crunching tasks or for formatting and cleaning data files. A spreadsheet consists of a matrix of rows and columns, each identified by a cell name (its address). As you move around the sheet, you are selecting different cells, and you can type in data and formulae.
Entering Data and Using Formulae

Your mission today is to copy in the spreadsheet (exactly, including bold print, larger font, border lines, etc.) and learn how to use formulae and names to demonstrate different coefficients.
Entering Text
Most text is entered just as in Word. Note that if a line is too long for a cell then it will run over into the neighbouring cells. This usually causes no problems, though it may seem counter-intuitive. Copy the top half of the spreadsheet, and don't forget to use formatting. Change the print layout to landscape from File, Page Setup, Landscape. This is very similar to Word. Watch out for the borders - look on the toolbar towards the right hand side. Don't forget to use Edit, Undo if it goes pear-shaped. Don't bother entering the numbers in the second part of the sheet (where it starts with Phi) - these will change later. Do note that the formatting for the coefficients is different - select these four cells and go to Format, Cells, Number and then switch the decimal places to 2. Most of the sheet is text, but some cells are named or contain formulae. This is where the fun begins.
Names and Formulae
Basically, a named cell is one which has a pseudonym. Hence, when referring to a named cell, rather than type in cell addresses, which are not meaningful, we can use names, which are. In the Data Entry Area, name cell D23 as PP by going to Insert, Name, Define and calling it PP, meaning variable 1 Present and variable 2 Present. Name E23 as PA etc. See how, just under Arial, the cell address changes to the name.
Formulae are used in cells to perform calculations. If you type in the formulae, the coefficients will change automatically as you change the data entry area. Hence you can see, for example, how joint non-occurrence is not used by Jaccard's, even if the figure is huge compared to the rest. A formula always starts with an = sign. So, in cell G24 (named Sum) the line should be =PP+AP+PA+AA. If you enter data into the named cells in the data entry area, you will see that the Sum automatically changes.
The Formulae
If we weren't using names, the formula for Jaccard's would be =D23/(D23+D24+E23). Note the use of brackets - this is highly important. The cell addresses, however, are not very meaningful. Using the names created earlier, you can now type into cell C33 the formula =PP/(PP+AP+PA). This is more meaningful, and emphasises the simplicity of Jaccard's as well as the fact that it ignores joint non-occurrence (i.e. AA). Provided there are values in the data entry area, Jaccard's coefficient will be calculated instantly. Try entering the formulae for the other coefficients. These are:
\[ \text{Phi} = \frac{AA \cdot PP - AP \cdot PA}{\sqrt{(AA + AP)(AA + PA)(PP + AP)(PP + PA)}} \]
(Looks vaguely familiar? It's Pearson's dichotomised. The square root is achieved by raising to the power of 0.5, i.e. ^0.5.) Don't forget: be careful with those brackets.
Testing the Formulae
Enter different values into the data entry area and see how the different coefficients give wildly different results. Moral of the story: make sure the coefficient you use is appropriate or risk completely invalid results. Other related questions: What are the effects of the value (e.g. 0, 10 and 1000) of the AA cell on different coefficients?
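The effect of the AA cell on the two coefficients can also be checked outside Excel. The Python sketch below is only an illustration: Python is not used on this course, the function names are made up, the cell counts are invented, and phi is assumed to take its standard form (AA*PP minus AP*PA, divided by the square root of the product of the four marginal totals).

```python
import math


def jaccard(pp, pa, ap):
    """Jaccard's coefficient: joint presence over all cases with any presence."""
    return pp / (pp + pa + ap)


def phi(pp, pa, ap, aa):
    """Phi coefficient, assuming the standard form for a 2x2 table."""
    numerator = aa * pp - ap * pa
    denominator = math.sqrt((aa + ap) * (aa + pa) * (pp + ap) * (pp + pa))
    return numerator / denominator


# Invented counts: PP = 10, PA = 5, AP = 5, with AA varied as in the question.
for aa in (0, 10, 1000):
    print(aa, round(jaccard(10, 5, 5), 2), round(phi(10, 5, 5, aa), 2))
```

With these invented counts Jaccard's stays at 0.5 whatever AA is, because AA never enters its formula, while phi moves from about -0.33 (AA = 0) to about 0.33 (AA = 10) to about 0.66 (AA = 1000).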
Doing Calculations in Excel

Now we take this a stage further and show how formulae are used in more advanced calculations. Since much of this has been done before, the instructions in this exercise will be more brief.
We're going to investigate the past record of MSc candidates on the exam using the variables result (pass/fail) and bribed examiner (yes/no). To do this we will use chi square.
Contingency Tables
Type in the contingency table, then select it all and copy it further down the page to make the observed table. Be careful with the formatting. Numbers should be changed by Format, Cells, Number, and setting decimal places to 2. Borders are Format, Cells, Border (or you can use the borders icon from the tool bar). Note that in the observed table, the values are given various names. Remember, that's Insert, Name, Define. Use formulae to calculate the totals, for example the Yes column total is =OPY+OFY, with O standing for observed, P for pass, F for fail and Y for yes. Got the picture? As you are all aware, the chi squared test compares the observed (i.e. empirically obtained) results with the statistically expected set of results. If there is a significant difference between observed and expected, then some mechanism is hypothesised to be causing this difference, and hence we can conclude that the observed results are unlikely to be due to chance factors alone. Chi squared is given by:
\[ \chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \]
To calculate the statistically expected results, we use probability theory. If you are comfortable with this, then carry on. If not, then check with the textbooks. Note the formulae used to obtain the expected results. For example, expected for pass and yes is the pass total multiplied by the yes total, all divided by the number of observations, i.e. =(Yes*Pass)/Sum. Do this for all expected results. Under the values part, put =OPY under the O column (where the number 13 is), and =EPY under the E column (where the number 8.80 is). Now type the rest of the numbers into the O and E columns. Under the O-E column, first type '=', then click the cell two to the left, which holds the value of OPY. Now type '-', and click the cell one to the left, which holds the value of EPY. Press return. See how it automatically enters the cell address? Select the cell you just typed the formula into and drag down over the three cells below it. Go to Edit, Fill, Down and the formula will be copied down. Note how the formula changes automatically, relative to where the values are? Do the same for squaring the result and for dividing it by E.
Manual Chi
Contingency Table:
                     Bribed
                     Yes      No       Total
Result    Pass       OPY      OPN      Pass
          Fail       OFY      OFN      Fail
          Total      Yes      No       Sum

Observed:
                     Bribed
                     Yes      No       Total
Result    Pass       13       14       27
          Fail       2        17       19
          Total      15       31       46

Expected:
                     Bribed
                     Yes      No
Result    Pass       8.8      18.2
          Fail       6.2      12.8

Values:
O      E        O-E      (O-E)^2    (O-E)^2/E
13     8.80     4.20     17.60      2.00
14     18.20    -4.20    17.64      0.97
2      6.20     -4.20    17.64      2.85
17     12.80    4.20     17.64      1.38
                         Chisq.=    7.19
For our last trick - to sum automatically, click on the cell where you want the chi square value to go and then on the summation symbol (Σ) on the toolbar. Drag over the last column and press return to obtain the sum. The chi square value can be checked against a statistics table to judge its significance. Degrees of freedom are given by (r-1)(c-1), i.e. the number of rows in the contingency table minus one multiplied by the number of columns in the table minus one.
Some Questions For You To Consider
What happens if the sample size drops below 45? What if it drops below 5 in any cell?
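The spreadsheet calculation can be cross-checked with a few lines of Python. This is a sketch only: Python is not part of the course and the variable names are made up, but the observed counts are the ones from the exercise (13, 14, 2 and 17) and the steps follow the expected-value and chi square formulae given in the text.

```python
# Observed counts from the exercise.
# Rows: result (Pass, Fail); columns: bribed examiner (Yes, No).
observed = [[13, 14],
            [2, 17]]

row_totals = [sum(row) for row in observed]        # Pass and Fail totals
col_totals = [sum(col) for col in zip(*observed)]  # Yes and No totals
n = sum(row_totals)                                # number of observations

# Expected count for each cell = row total * column total / number of observations.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi square = sum over all cells of (O - E) squared, divided by E.
chi_square = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

# Degrees of freedom = (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Critical value for 1 degree of freedom at the .05 level,
# taken from a standard chi-square table.
CRITICAL_05_DF1 = 3.841

print("Expected:", [[round(e, 2) for e in row] for row in expected])
print("Chi square =", round(chi_square, 2), "with", df, "d.f.")
print("Significant at .05:", chi_square > CRITICAL_05_DF1)
```

Working from unrounded expected values, the result comes out at about 7.18, marginally below the 7.19 on the worked sheet, which appears to round the intermediate values. Either way it is well above the tabled critical value for 1 d.f.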
APPENDIX BASIC STATISTICS
Basic Statistics
A statistic is a structured piece of data, carrying meaningful information. Research begins when we start to investigate these statistics systematically, i.e. to analyse them. Broadly, there are two sorts of statistical analysis: Descriptive statistics: this involves analysing the parameters of the sample and assessing the characteristics of the population. Inferential statistics: most often this involves hypothesis testing using a sample to test differences in a population. To begin with, we will concern ourselves with descriptive statistics.
Data: Its Collection and Its Disposal

So, how do we obtain data? To convert information into data requires some sort of coding framework consisting of variables, or attributes, relating to the domain of concern. People (cases) are sampled representatively so that clues as to the nature of the population can be calculated.

[Table: empty data matrix, with Case 1 to Case 6 along one axis and Attributes (Variables) 1 to 5 along the other]

Issues in Coding Data
Issues involved in the choice of content categories:
exhaustivity - ensure all cases are covered by the options or levels in the category
exclusivity - each case must have only one possible option or level in the category
relevance - the item must pertain to the domain of concern
adequate coverage of domain - the domain must be adequately covered
specificity - the definition of each category must be precise to allow for consistent coding (this also relates to inter-rater reliability - more later).
This data can be displayed in a number of ways: as percentages, ratios, in a bar chart, cumulative frequency chart, pie chart, etc. Each of these describes the characteristics of the sample, hence the term descriptive statistics. Often these data sources are noisy and hence we must take care in the construction of coding frameworks and we must determine which forms of analysis are feasible. We
must be extremely aware of issues such as the level of data used and characteristics of the population, namely distributional assumptions.
Levels of Data
Data conforms to one of four levels:
Nominal (categorical) - the value is either present or not
Ordinal - the value is ranked relative to others
Interval - the value is scored absolute to others
Ratio - the value is scored absolute to others and to a meaningful zero
An example: Consider three horses in a race. Coding the race times under a nominal level will tell us if any particular horse won the race or not (e.g. Guttman's Folly did not win). Coding under an ordinal level, we can tell where a given horse came in relative to the others (e.g. Guttman's Folly came in second). Coding under an interval level, we know where a given horse came absolute to the others (e.g. Guttman's Folly was 1.5 seconds faster than Galloping Galton, but 2.3 seconds slower than Cattell's Chance). Coding under a ratio level, we would know where a given horse came absolute to the others and to a meaningful zero point common to all of them (e.g. Guttman's Folly came home in 67.5 seconds, Galloping Galton in 69.0 seconds, and Cattell's Chance in 65.2 seconds).

Sometimes we use dichotomies. A dichotomy is a variable that can only take one of two values, either present (1) or absent (0). Its level therefore is nominal.

Descriptive Statistics

Measures of Central Tendency
There are three measures that give an indication of the average value of a data set:
Mode - this is the most common value in the data set (most appropriate for nominal level data)
Mean - this is the arithmetic average, the one most familiar to people (most appropriate for interval and ratio level data)
Median - this is the middle value in the data set (most appropriate for ordinal level data)
As an example, the following are the numbers of children for seven families: 0, 0, 0, 1, 1, 5, 7
The mode (most common value) is 0.
The mean is calculated as 14 (sum of all seven scores) divided by seven (number of cases), which equals 2.
The median is 1 (the middle case of the seven cases).
Measures of Dispersion
Some measures of dispersion include:
Range - this is simply the difference between the highest and lowest scores, e.g. in the above example of families the range would be 7 minus 0, which is 7.
Standard Deviation - this represents, essentially, an average deviation from the mean. In this case, it is 2.6 when the squared deviations are averaged over all seven cases; dividing by n - 1 instead, as SPSS does, gives 2.8. This measure of dispersion is normally calculated through SPSS.
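As a quick check on the figures above, the same measures can be computed with Python's standard statistics module. This is a sketch for illustration only (the handbook itself uses SPSS), and the variable names are assumptions. It shows both versions of the standard deviation, since the 2.6 quoted above corresponds to dividing by the number of cases rather than by n - 1.

```python
import statistics

# Number of children in the seven families from the example above.
children = [0, 0, 0, 1, 1, 5, 7]

mode = statistics.mode(children)        # most common value
mean = statistics.mean(children)        # arithmetic average
median = statistics.median(children)    # middle value of the sorted data
value_range = max(children) - min(children)

# Two versions of the standard deviation:
# pstdev divides the summed squared deviations by n,
# stdev divides by n - 1 (the usual sample estimate).
sd_divide_by_n = statistics.pstdev(children)
sd_divide_by_n_minus_1 = statistics.stdev(children)

print(mode, mean, median, value_range)
print(round(sd_divide_by_n, 1), round(sd_divide_by_n_minus_1, 1))
```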
Normal (and Abnormal) Distributions
A normal distribution is a reflection of a naturally occurring distribution of values, also known as the bell curve, where the mean, median and mode are all equal, e.g. IQ scores. If this is the case, then the researcher is able to make certain assumptions about the population parameters. This assumption enables specific methods of analysis to be used.
[Figure: a normal distribution]
However, normal distributions are something of an ideal. For example, upon examining the earlier data on number of children, we see that the mean, median and mode are not equal. Therefore we cannot make assumptions about the population parameters. In other words, non-parametric methods of analysis must be used. Typically, the measures used to represent these kinds of distributions are the median and range, as opposed to mean and standard deviation. Often, the data you will be using will not allow you to make assumptions about the population parameters, so non-parametric methods must be used (more on this later).

Frequencies
Frequencies represent the number of occurrences for values of a given variable. If, for example, ten participants in an experiment were made up of five males and five females, then the frequencies for the values of 1 (male) and 2 (female) would both be 50%. A frequency score for a given value is the percentage of all the subjects/cases/participants that have that value as a score. There are different forms of frequency counts in SPSS, all of which are detailed in the SPSS section on descriptive statistics.

Inferential Statistics
Parametric Statistics
Briefly, traditional statistics are used after conducting an experiment testing a research hypothesis. This hypothesis is about a relationship between the independent and dependent variables. Inferential (i.e. hypothesis testing) statistical methods do this by applying the findings of descriptive statistics discussed earlier. Thus we can infer an aspect of the characteristics of the population from the samples we take of these populations.

Take an easy example. Let us try to determine whether or not our group of MSc students are normal with respect to the population of postgraduate students in the UK. We hypothesise that you are not. We obtain scores for normality from the files and observations of your behaviour, and calculate the central tendency and variation of the group. The N rating for this group is 40 with a standard deviation of 10. We know that other postgraduate students have an N rating of 60 with a standard deviation of 20. In terms of probability, calculations would show that the chance of the MScs coming from the normal population is 2%, therefore you are statistically unlikely to come from that population.

However, the fun begins when we:
1. try to set up the experimental designs that allow the independent variable to be manipulated to cause a change in the dependent variable, and/or
2. have to estimate the population parameters from the sample(s) because we don't know enough about the population.

So, looking at (2) first, estimating population parameters when they are unknown is done using the sample itself. Aha, you think, surely that's got to be wrong because we are testing the sample against a population which is estimated using the sample. That's where (1) comes in - by performing certain experimental manipulations (random selection, large sample size, etc.) we can ensure that the sample provides an unbiased estimate of the population. If we do this then the error in the sample is minimised, though never eliminated - hence the need for p values.
Shall we look at this in more detail? Ways to allow the experimental design to overcome the difficulties in estimating population parameters include, most importantly, random assignment of subjects and random assignment of treatments. This includes levels of treatments as well. The use of experimental controls is also important to ensure that the participants are not biasing the sample in any way, i.e. independent or between groups. Alternatively, and preferably, subjects may act as their own controls, i.e. dependent or within groups. Placebos are an important way to reduce subject error, and experimenter bias. Blind and double-blind experiments are those that take this into consideration. If
such things are fully accounted for, then the parameters of the population (i.e. central tendency and variation in scores) are estimated using the sample, and the accuracy of this sample is given by statistical levels of probability.

Effect Size and Other Concerns
So, we've designed the experiment adequately and we've gathered the data bearing in mind all those things described above. Now we must check that the statistics we perform on the data are capable of rejecting or accepting the hypotheses we proposed. This is the effect size of the manipulation. The risk of falsely rejecting the null hypothesis when it is in fact true is traditionally set at 5%, i.e. α = 0.05. Traditional statistics is very conservative, and has a morbid fear of rejecting the null hypothesis when it is in fact true. In more applied settings, the ability of a test to be sensitive enough to reject the null hypothesis when it is in fact not true is also important. The risk of a test failing to do so is not usually mentioned, but is implicitly assumed to be 20%, i.e. β = 0.20. These levels play a strong part in the effect size, as does the direction of the hypothesised relationship being one or two-tailed. Other influences on effect size include sample size. Often, this is limited by the number of subjects available, though ideally the sample size should be determined by the desired effect size. The other determining influence is the statistical test used. For more details on the theories underlying statistics, consult a statistics book, such as Howell.

Fundamentals of Statistical Testing
All parametric statistical tests have one basic idea in common: each produces a test statistic (t, F, etc.) that is associated with a significance value given the size of the sample. This statistic is a summary of the following ratio:
test statistic = (amount of systematic variation) / (amount of error variation)
Systematic variation comes from the (desirable) effect of the manipulation of the independent variable and error variation comes from the (undesirable) effect of error-ridden noise. Hence the larger the error is in sampling, the more powerful the manipulation of the independent variable must be to create a significant effect. A sensible way to obtain a good test statistic is to reduce the error in the sample (the denominator in the equation), though many psychologists prefer to have HUGE samples and increase the systematic variation (the numerator in the equation).

Which Statistical Test?
Parametric inferential tests can be divided based on the design of the experiment, the number of conditions being tested and the number of levels of study. Designs can be of two types: between subject and within subject. The former is when you divide subjects into independent groups, such as on the basis of gender, or into one group that receives a drug, and a second that receives a placebo. Within subject
designs are when all subjects are subjected to all conditions, e.g. testing reaction times before and after receiving a drug. The number of conditions is merely how many tests you administer for an independent variable. So, in the above example, the between subjects would have two conditions (drug and placebo). The within subjects would also have two (before and after drug). For two conditions, you run a t-test. For three or more, you run an ANOVA. Finally, the design can have multiple levels, e.g. two independent variables of drug and placebo and participant gender, creating four combinations. Different levels can also result in mixed designs. An example could have a between subjects independent variable (gender) and a within subjects IV (the test-retest of reaction times).
Data Level                               Design
                                         (Between Subjects)    (Within Subjects)
Nominal                                  Chi Squared           Sign
Ordinal                                  Mann-Whitney          Wilcoxon
Ratio/Interval (2 conditions)            Unrelated T           Related T
Ratio/Interval (3 or more conditions)    Unrelated ANOVA       Related ANOVA
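The table above works as a simple lookup, which can be sketched as a small Python function. The function name choose_test and the spellings accepted for its arguments are assumptions made for this sketch. It encodes only what the table says and is no substitute for checking a test's assumptions.

```python
def choose_test(level, design, conditions=2):
    """Return the test the table suggests for a data level and design.

    level:      "nominal", "ordinal", "interval" or "ratio"
    design:     "between" or "within"
    conditions: number of conditions (only used for interval/ratio data)
    """
    level = level.lower()
    design = design.lower()
    if design not in ("between", "within"):
        raise ValueError("design must be 'between' or 'within'")
    if level == "nominal":
        return "Chi Squared" if design == "between" else "Sign"
    if level == "ordinal":
        return "Mann-Whitney" if design == "between" else "Wilcoxon"
    if level in ("interval", "ratio"):
        if conditions < 2:
            raise ValueError("at least two conditions are needed")
        prefix = "Unrelated" if design == "between" else "Related"
        # Two conditions call for a t-test, three or more for an ANOVA
        return f"{prefix} T" if conditions == 2 else f"{prefix} ANOVA"
    raise ValueError("unknown data level")


print(choose_test("ordinal", "within"))
print(choose_test("ratio", "between", conditions=3))
```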
Nonparametric Statistics
Unlike parametric statistics, which (as mentioned before) test hypotheses about specific population parameters, nonparametric statistics test hypotheses about such things as similarity of distributions or the measures of central tendency. It is important to note that the assumptions for these tests are weaker than those for parametric tests, so the results are not as powerful. On the other hand, there are a lot of analyses where parametric tests are not particularly appropriate, e.g. situations with very unequal sample sizes. In Investigative Psychology, significant amounts of your data will not be of a nature that lends itself to parametric tests. Data quality and experimental control are not among our strong points, but this is not a weakness in our research as long as we are aware of the limitations and act accordingly. Nonparametric tests are one of the ways in which we try to deal with our problematic data. Referring back to the previous table, there are 3 basic tests listed (we won't go into Sign here, it'll be in most statistics books) - Chi-square, Mann-Whitney and Wilcoxon. In addition, there are ANOVAs for nonparametric testing of more than two conditions.
Data Level        Design
                  (Between Subjects)    (Within Subjects)
Nominal           Chi Square            Sign
Ordinal           Mann-Whitney          Wilcoxon
Ratio/Interval    Unrelated T           Related T
Chi-squared tests look at associations between variables, while the nonparametric t-tests and ANOVAs examine differences in shape or location of the populations.

Chi-square Tests
Essentially, the Chi-square (CS) test uses frequency scores for a variable or variables to determine whether the actual observed frequencies (those that are recorded) are different from those we would expect if there were no differences between the values, in a between-subjects design. The closer the observed frequencies are to the expected ones, the lower the value of the Chi-square. If the two are similar enough, this indicates that no significant difference exists between the values.

Using the Chi-square with Crosstab
Often, the test is used in conjunction with doing a crosstab, which indicates the frequency counts for each combination of values for the two variables. In the table below, there are two variables, both with two values (present/not present). The frequencies of occurrence for each of the four possible combinations of values are listed (e.g. blindfolding and threats to not report co-occurred 10 times).
                                     Threat - No Report
                                     Present    Absent/not recorded
Blindfold    Present                 10         5
             Absent/not recorded     5          5
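As a rough illustration of how such a crosstab is tallied from coded cases, the sketch below counts each combination of values. The case records are hypothetical, constructed only so that the counts match the table above, and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical case records, one (blindfold, threat) pair per case,
# coded 1 = present and 0 = absent/not recorded.
cases = [(1, 1)] * 10 + [(1, 0)] * 5 + [(0, 1)] * 5 + [(0, 0)] * 5

# Count how often each combination of values occurs
counts = Counter(cases)

# Rows: blindfold present, blindfold absent.
# Columns: threat present, threat absent.
crosstab = [[counts[(1, 1)], counts[(1, 0)]],
            [counts[(0, 1)], counts[(0, 0)]]]

row_totals = [sum(row) for row in crosstab]
col_totals = [sum(col) for col in zip(*crosstab)]

print(crosstab)
print(row_totals, col_totals)
```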
A Chi-square might reveal that there is a significant difference between the cells, and examining the table would suggest that the difference lies in how often these behaviours co-occur versus when they occur alone or when they both don't occur.

The Mann-Whitney
This is the nonparametric equivalent of the independent samples t-test. The major difference is that this test looks at the ranks of the scores, regardless of which value they belong to, for the two distributions, rather than the actual scores. Ranking is a pretty straightforward concept. Looking at the table below, we see that there are four scores for age, one of which would appear to be an extreme outlier and so skews the distribution (making it far from normal). If we rank the scores (listed in brackets
beside the actual scores) the rank of age 2 is 4. The scores for age shift, by ranking, from interval to ordinal data and the effect of the extreme outlier is eliminated.

         Age 1     Age 2     Age 3     Age 4
Score    24 (1)    78 (4)    28 (3)    27 (2)
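The ranking step can be sketched in Python as follows. The function name rank_scores is an assumption for this illustration, and giving tied scores the average of their ranks is a common convention adopted here rather than something stated in the text.

```python
def rank_scores(scores):
    """Rank scores from lowest (1) to highest; ties share the average rank."""
    ordered = sorted(scores)
    ranks = []
    for score in scores:
        first = ordered.index(score) + 1   # rank of the first occurrence
        ties = ordered.count(score)        # how many scores share this value
        ranks.append(first + (ties - 1) / 2)
    return ranks


ages = [24, 78, 28, 27]
print(rank_scores(ages))

# Making the outlier even more extreme leaves the ranks unchanged
print(rank_scores([24, 780, 28, 27]))
```

Both calls give the ranks 1, 4, 3, 2, which is the sense in which ranking removes the effect of the extreme outlier.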
In the case of the Mann-Whitney, all the scores for both samples are listed together and ranked. If there is a difference between the two distributions, then there will be some sort of significant ordering effect in the ranking (i.e. a significant portion of one of the two samples will make up the lower ranks, rather than a random mix). The null hypothesis of no differences between the two samples will be accepted if there is no significant difference. The actual results will depend on such things as sample sizes, but SPSS will adjust itself accordingly.

The Wilcoxon T-test
This, unsurprisingly, is the nonparametric equivalent of the dependent samples t-test. Ranks in this case are calculated based on the differences between the two scores for each subject over the two conditions, e.g. if one subject scored 3 acts of aggression before taking speed and 6 after, the difference score would be -3. These differences are then ranked, ignoring the sign, and then the statistics are carried out to identify whether the two conditions differ.

Kruskal-Wallis One-Way ANOVA
Used, as with the parametric ANOVA, when a variable has more than two levels (independent of each other), the KWANOVA tests for differences using the ranks of the scores rather than the actual scores. Like the ANOVA, the KWANOVA is a test for differences in the averages of the values, but these averages are drawn from the relative ranking, rather than the actual scores. Again, a significant result indicates that differences do exist. As far as I can tell, SPSS does not have a post-hoc test option for KWANOVA, which means you'll have to do it by hand. Just find yourself a good statistics book and the information you need should lie within. This ANOVA, and its equivalent measure for related samples, are described in the SPSS section below.
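The ranking of difference scores described for the Wilcoxon test can be sketched as below. The data are hypothetical (only the first subject, 3 before and 6 after, comes from the text), the function name signed_rank_sums is an assumption, and dropping zero differences and averaging tied ranks are common conventions adopted for this sketch. It stops at the rank sums and does not compute a significance value, which SPSS would report.

```python
def signed_rank_sums(before, after):
    """Return (sum of ranks for positive differences, sum for negative ones)."""
    # Difference score for each subject; zero differences are dropped
    diffs = [b - a for b, a in zip(before, after) if b != a]
    sizes = sorted(abs(d) for d in diffs)

    def rank(size):
        # Rank ignoring the sign; tied sizes share the average rank
        first = sizes.index(size) + 1
        ties = sizes.count(size)
        return first + (ties - 1) / 2

    positive = sum(rank(abs(d)) for d in diffs if d > 0)
    negative = sum(rank(abs(d)) for d in diffs if d < 0)
    return positive, negative


# Hypothetical acts of aggression for six subjects, before and after
before = [3, 5, 2, 4, 6, 1]
after = [6, 4, 4, 4, 10, 2]

positive, negative = signed_rank_sums(before, after)
print(positive, negative)
```

With these made-up scores the negative differences carry nearly all of the rank total, which is the kind of imbalance the test looks for.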
Correlations and Associations
When trying to determine the relationship between two variables, graphing each case using the scores as x and y co-ordinates can give you something of an initial impression of what associations may be occurring. However, to statistically test the relationship - to see how strong it is, in a sense - you need to determine how correlated they are. In a nutshell, the results will show to what degree the scores of two variables relate to one another. The more they coincide, the stronger the degree of association between the two. In general, the correlation coefficients you will use are appropriate in situations where there is, to some degree, a linear relationship between the two variables. If the relationship is strongly curvilinear (i.e. if you plotted the two variables and the line
did a crazy zigzag pattern across the graph), then there are alternatives, which we won't go into here. For most purposes, you will use one of two correlation coefficients - Pearson's Product Moment and Spearman's Rank Order. Deciding between the two is fairly easy. If you are using an Ordinal scale, Spearman's is the one to use. If the variables are interval, and the actual plot of the variables is weakly curvilinear (not a straight line, but generally when x goes up, y goes up, just to varying degrees), you use Spearman's. If the variables are interval, and the graph is linear, then you use Pearson's. You can easily run both at the same time, so you might as well. However, it's important to understand which one of the two is more appropriate for your analysis, so that you include the right one in your assignments or dissertations. Pearson's for ratio data, Spearman's for the rest.

With all the correlations, you will end up with a score between +1.00 and -1.00. The best way to think of what it means is to split it into two parts. The sign of the coefficient indicates whether the relationship is positive (+) or negative (-). The former means that as x increases in value, so does y. The latter means that as the value of x increases, the value of y decreases (or, as y goes up, x goes down). The size of the coefficient, ignoring the sign, represents how powerful the relationship is. A score of 1.00 (with a + or - sign) would represent a perfect correlation, while a value of, say, +0.85 would be very strong - as x increases, so does y to a roughly equivalent degree. A value of 0.00 would indicate that there is no relationship between the two.

Some warnings
1. Remember that the coefficients indicate a degree of association, but not causality. Unless you have strong theoretical reasons to indicate such, you cannot clearly state that x influences y. It could be the case that it is y influencing x, or even a third variable could exist, z, that is influencing both.
2. A number of factors can influence a non-ranking correlation: the use of extreme groups, the presence of outliers, combining different groups and curvilinearity. Each of these can lead to inaccurate findings.
Pearson's Product Moment Correlation Coefficient

I won't bore you with detailed descriptions of covariance and such. As usual, if you want a deeper understanding of the inner workings of this procedure, you'll have to find a book on it. The value that is of importance to you is the squared correlation coefficient (r^2). This indicates how much of the variance in y can be accounted for by x - their common variance, as it were. Note that since r is between 1.00 and -1.00, r^2 is never larger than the size of r ignoring its sign.
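For those who do want a glimpse of the inner workings, here is a minimal Python sketch of r and r^2. The function name pearson_r and the paired scores are hypothetical, chosen only to keep the arithmetic easy to follow by hand.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product moment correlation for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired scores for five cases
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2   # proportion of the variance in y accounted for by x
print(round(r, 2), round(r_squared, 2))
```

For these made-up scores r is about 0.77 and r^2 is 0.60, so x accounts for roughly 60% of the variance in y.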
Spearman's Rank Order Correlation Coefficient

This is basically the nonparametric version of Pearson's, by way of ranking the scores for the two variables, rather than using the raw scores themselves. Interpretation of the results is the same.
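The idea of ranking first and then correlating can be sketched in Python as below. The ages are the four scores from the ranking example earlier in this appendix, while the second variable and all function names are hypothetical. Tied scores are given the average of their ranks, which is an assumption of this sketch.

```python
from math import sqrt


def ranks(scores):
    """Rank scores from lowest (1) to highest; ties share the average rank."""
    ordered = sorted(scores)
    return [ordered.index(s) + 1 + (ordered.count(s) - 1) / 2 for s in scores]


def pearson_r(x, y):
    """Pearson's product moment correlation for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank order correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


ages = [24, 78, 28, 27]    # ages from the ranking example
scores = [3, 9, 4, 6]      # hypothetical second variable

rho = spearman_rho(ages, scores)
print(round(rho, 2))
```

Here the ranks are 1, 4, 3, 2 and 1, 4, 2, 3, giving a coefficient of 0.80.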
Other Measures of Association

1) Dichotomous Data:

Jaccard's
Jaccard's is the appropriate measure of association to use for dichotomous variables where mutual non-occurrence does not indicate anything about the relationship between the two variables. This is typically the case in content analysis, e.g. using police records.

Yule's Q/Guttman's Mu
This is the best measure to use if you do know that mutual non-occurrence does indicate something about the relationship between the two variables.

There was a tendency last year to automatically run SSAs using Jaccard's. This was appropriate most of the time, but there were times when Mu could have been used instead. Keep in mind that Jaccard's is the weakest of all possible measures of association - it is used for the type of variables with the least information (dichotomous) and then does not use all the information available from the variables. If you are using materials that aren't subject to the problems police records suffer, e.g. if you are doing analyses on a drug abuser's personal diaries and you know whether the variables are present or not, then use Guttman's Mu.

2) Ordinal Data:

Kendall's Tau/Guttman's Mu
Use both of these for non-metric analyses (e.g. non-metric SSA). Use the former when you have equal numbers of categories between the two variables, and the latter, which is weaker, when you have unequal numbers.

3) Interval data:

Pearson's for parametric analyses (see above). Alternatively, in SSA you can use Guttman's Mu for non-parametric analyses of interval data.
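To make the difference between the dichotomous measures concrete, the sketch below computes Jaccard's coefficient and Yule's Q from the four cells of a 2 x 2 table, using the blindfold and threat counts from the crosstab earlier in this appendix. The function names and the cell labels a, b, c, d are assumptions for this illustration, and Guttman's Mu itself is not computed.

```python
def jaccard(a, b, c):
    """Jaccard's coefficient: co-occurrences over cases where either is present.

    a = both present, b = only the first present, c = only the second present.
    Cases where both are absent (d) are deliberately left out.
    """
    return a / (a + b + c)


def yules_q(a, b, c, d):
    """Yule's Q, which also uses d, the cases where both are absent."""
    return (a * d - b * c) / (a * d + b * c)


# Cell counts from the blindfold / threat crosstab
a, b, c, d = 10, 5, 5, 5

j = jaccard(a, b, c)
q = yules_q(a, b, c, d)
print(round(j, 2), round(q, 2))
```

Jaccard's comes out at 0.50 and Yule's Q at about 0.33 for these counts. Only Q changes if the number of joint absences changes, which is why the choice between them depends on whether mutual non-occurrence is informative.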
Introductory Module: Statistics & Methods Timetable
Module Code: PSYC640
Module Level: MSc
No. of Credits: 30
Course Convenor: Dr. Laurence Alison
Teaching Staff: Dr. Alison, Prof. Canter, Dr. Lovie, Dr. Downes, Dr. Stott

This module is designed to refresh students' memories of basic principles of research design and to introduce some basic concepts and ideas to students unfamiliar with such material. It will also prepare students for some of the more complex material to be covered in semester 2 (advanced statistics) as well as introduce them to the range and breadth of qualitative approaches available for research. The central aims of this module are to demonstrate different approaches and perspectives on research design, statistics and qualitative methods and illustrate the benefits of an awareness of these different perspectives.

Students should be aware that the majority of their learning should occur outside of the lectures, where approximately 1 hour of lecturing equates to 6 hours independent study time. During these periods students will be required to work on projects, collect materials for research and prepare reading for subsequent lectures. Students should consult with their respective supervisors with regard to project proposals and the onus is on the student to develop a project in consultation with their chosen supervisor. Supervisors available include Dr. Alison, Dr. Lovie and Dr. Stott, although students are encouraged to team up with potential PhD supervisors and other members of staff. Dr. Alison should be the first point of contact in this regard.

Lectures start at 11.00 and finish at 12.00 on Wednesday mornings and begin on 2nd October, 2002. The last lecture in this module will be on 11th December. Most lectures are followed up by a practical. Practical sessions occur on a Thursday afternoon at 14.00 and last approximately 2 hours.
For many of these sessions you will be given data that you can work with outside of the practical session so that you may practise analysing the material and familiarise yourself with SPSS. The purpose of these sessions is to give you hands-on experience of working with data, in preparation for your first assignment and as a means by which you can gain some confidence to handle more sophisticated statistical procedures in the advanced statistics module. Assessments (handing in of the assignment and taking the short question and answer exam) will occur in the first two weeks of term, thereby giving students plenty of time for revision and for completion of their assignments. A short statistics handbook will be provided at the start of the course and students should also take advantage of the web-based statistics helper "statistics for the terrified" available through the computer
unit on the network. The Dancey and Reidy and Bryman books will be particularly useful as guides to quantitative and qualitative research.

The aims of the module are as follows:
Introduce students to basic principles of experimental design.
Introduce the notion of hypothesis testing & behavioural sampling.
Familiarise students with descriptive statistics, correlations and associations, nonparametric tests and parametric tests.
Introduce the analysis of naturally occurring language and biographical methods.
Introduce unobtrusive measurement, content analysis and ethnographic research.
Introduce open, semi structured and structured interviewing as well as focus groups.
Students will gain an understanding of the principles underlying both quantitative and qualitative research. They will appreciate how these approaches are not in conflict and how they are associated with issues of gathering material for analysis. They will understand how the approaches may be used in combination or isolation and will have garnered a basic understanding of behavioural observation, ethnography, interviewing skills and content analysis. They will also understand and appreciate the importance of research ethics and issues of confidentiality and debriefing.

The module relies on lectures and practical work. Students will be supported by close supervision on a research assignment in which they will be expected to collect their own data (for example, through interviewing, observational methods or analysis of archival records), analyse it and write it up as a written report. Detailed verbal and written feedback will be given on this report. Sessions will also involve the students in the dialogue, by requiring them to prepare material in advance of each lecture and present their perspectives in class.

Assessment will be based on an examination in early January (25% of the mark for this module) and an assignment (75% of the mark for this module), due in by the 16th of January. If either or both components are failed, students may resubmit or retake the exam within 2 weeks of original notification of failure.

The key text books for the module are:
Bryman, A. (2001). Social Research Methods. Oxford University Press.
Dancey, C. & Reidy, J. (1999). Statistics Without Maths for Psychology Using SPSS for Windows. Prentice Hall: Essex.
Howell, D.C. (2002). Statistical Methods for Psychology. (5th ed). Duxbury.
Robson, C. (1993). Real World Research: A Resource for Social Scientists and Practitioner-Researchers. Blackwell: Oxford.
WEEK 1 (2nd Oct) STRATEGIES AND TACTICS FOR RESEARCH Professor Canter
Research in a professional context has special demands.
Relevance
Timeliness
Within Resources, especially time available
Robust
Explicability

So researchers need to be aware of the full range of strategies and tactics on which they can draw in order to cope with those demands. Each strategy has its own rules and its own way of being good or bad. They also carry with them implications for the form of analysis that is most appropriate.

Strategies for research are the overall plan of how the research is organised or designed.
Case Studies
Relational Studies
Natural Experiments
Controlled Experiments
None of these is to be confused with Consultancy or Action Research, which can draw on all of them.

Tactics for research are the particular modes of intervention with the subjects of the research. It is here that issues of reliability and validity are most crucial.
Qualitative Approaches
Qualitative Scales
Structured Questionnaires
Performance Measures

References
Canter, D. (1994) Psychology in Action: Selected Writings of David Canter. Dartmouth. Especially Chapter 2, The Holistic Organic Researcher.
Canter, D. (2000) Seven assumptions for an investigative environmental psychology. In S. Wapner, J. Demick, T. Yamamoto, & H. Minami (eds) Theoretical Perspectives in Environment-Behavior Research: Underlying assumptions, research problems, and methodologies. New York: Plenum, pp. 191-206.
Robson, C. (1993). Real World Research: A Resource for Social Scientists and Practitioner-Researchers. Blackwell: Oxford.

PRACTICAL (3rd October) Signing on to the network, entering data, basic descriptive statistics. (See handbook)
WEEK 2 (9th Oct) DIFFERENCES IN DATA I (WILCOXON AND T-TESTS) Dr. Lovie
A basic introduction to the experimental approach, with particular emphasis on exploring differences in data. Two sample tests for ordinal (rank) and interval data and for unrelated and related samples (Wilcoxon and t-tests).
References
Dancey, C. & Reidy, J. (1999). Statistics Without Maths for Psychology Using SPSS for Windows. Prentice Hall: Essex.
Howell, D.C. (2002). Statistical Methods for Psychology. (5th ed). Duxbury.
PRACTICAL (10th October) Descriptive stats. (See handbook)

WEEK 3 (16th Oct) DIFFERENCES IN DATA II (KRUSKAL WALLIS, FRIEDMAN AND ANOVAS) Dr. Lovie
Analysis for single factor designs for ordinal (rank) and interval data and for unrelated and related samples (Kruskal-Wallis, Friedman and ANOVAs for independent and repeated measure designs). This session concerns the analysis of differences in data.
References
Dancey, C. & Reidy, J. (1999). Statistics Without Maths for Psychology Using SPSS for Windows. Prentice Hall: Essex.
Howell, D.C. (2002). Statistical Methods for Psychology. (5th ed). Duxbury.
PRACTICAL (17th October) T-tests, Wilcoxon and Mann Whitney. (See handbook)

WEEK 4 (23rd Oct) DIFFERENCES IN DATA III Dr. Lovie
ANOVAs for independent and repeated measure designs. Analyses for multi factor designs for interval data and for unrelated and related samples (ANOVAs for independent and repeated measure designs). This session considers the analysis of differences in data.
References
Dancey, C. & Reidy, J. (1999). Statistics Without Maths for Psychology Using SPSS for Windows. Prentice Hall: Essex.
Howell, D.C. (2002). Statistical Methods for Psychology. (5th ed). Duxbury.
PRACTICAL (24th October) ANOVAS. (See handbook)

WEEK 5 (30th Oct) STUDY WEEK
This time should be devoted to considering assignments and speaking to research supervisors. Students should also use the opportunity to catch up on reading and preparing materials for subsequent weeks' lectures.

WEEK 6 (6th Nov) CASE STUDY DESIGN Dr. D. Montaldi
This session will consider the principles involved in case study design. The session will discuss ethical, practical and theoretical problems associated with the analysis of the materials.
Reference
Bryman, A. (2001). Social Research Methods. Oxford University Press.
No Practical this week

WEEK 7 (13th Nov) QUALITATIVE RESEARCH: AN OVERVIEW Dr. Alison
An introduction to qualitative research. This session will outline a brief historical account of qualitative research, the main players in the field and the extent to which it either contributes to or stands in opposition to quantitative measurement. This lecture will also identify the methodological framework and highlight the limits and benefits of unobtrusive or non-reactive measurement. Students will explore how different types of information are used by research psychologists in a way similar to that advocated by methods popular with researchers in the 1960s (for example, Webb, Campbell, Schwartz and Sechrest, 1966), which included the examination of physical traces, archival material and simple observation. Included in this exploration will be an assessment and examination of a variety of studies employing similar methods. The lecture will illustrate how the work developed by Webb et al. (1966) echoes the limitations and benefits within such research activity.
Students will be asked to
Think of the range of materials available for research
Think of the limitations of such materials
Consider the distortions that such material may be subject to
Over the course of the day, work on material in preparation for an in-house seminar
There will be additional time allocated in Thursday's practical session.
Objectives
To introduce students to the concept of unobtrusive measurement.
To highlight the benefits and limitations of such material.
To demonstrate how physical trace, archival and observational studies can help inform studies of human behaviour.
References
Campbell, D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin, 54, 297-312.
Webb, E. J., Campbell, D. T., Schwartz, R. D. & Sechrest, L. (1966). Unobtrusive Measures: Nonreactive Research in the Social Sciences. Chicago: Rand McNally and Company.
Bryman, A. (2001). Social Research Methods. Oxford University Press.
PRACTICAL (14th November) Short seminar preparation.
Students will be asked to run an in-house seminar and present their work and observations of their chosen non-reactive approaches to research. This practical session should be used as a basis for preparing the seminar. (See handbook)

WEEK 8 (20th Nov) MULTIVARIATE ANALYSES, MDS & SORTING TASKS Professor David Canter
An introduction to multivariate analyses, with a particular emphasis on sorting tasks.
Reference
Canter, D. (1994). Psychology in Action: Selected Writings of David Canter. Especially the chapter by Canter and Groat.
PRACTICAL (21st November) Student Seminar.
Students will conduct their seminar on Qualitative Research: An Overview. (See handbook)

WEEK 9 (27th November) INTERVIEWING AND DISCOURSE ANALYSIS
An overview of interviewing methods (structured, semi-structured, open, focus groups etc.). Dr. Alison will discuss the strengths and weaknesses of each approach through examples, highlight the practical and ethical considerations in conducting interviews and illustrate the potential richness of interview material for discourse analysis.
Reference
Dartmouth.
Bryman, A. (2001). Social Research Methods. Oxford University Press.
PRACTICAL (28th November) Dr. Alison
Students will be expected to conduct an interview (of approximately 30 minutes) with a participant of their choice on a topic of their choice (to be agreed in advance with Dr. Alison). Students will use this material as the basis for the session on content analysis (week 11), so it is important that the interview is transcribed for analysis in that session. Dr. Alison will be available for consultation.

WEEK 10 (4th December) ETHNOGRAPHIC RESEARCH Dr. Stott
This lecture will focus on the range of methods available to researchers when adopting an ethnographic approach to their research. It will highlight the complex, unpredictable and sometimes dangerous nature of this methodology and discuss the various strategies that are available for researchers to overcome the difficulties of accessing populations and data. The lecture will use existing and ongoing research to relate conceptual issues to practices in the field and demonstrate how 'taking sides' and adopting positions of 'neutrality' in respect to criminal activity are sometimes a necessary component of the research process.
Reference
Drury, J. & Stott, C. (2001). Bias as a research strategy in participant observation: The case of intergroup conflict. Field Methods, 13, 47-67.
PRACTICAL (5th December) Dr. Alison
Students will use this time to work with their interview material in preparation for week 11.

WEEK 11 (11th December) CONTENT ANALYSIS Dr. Alison
Dr. Alison will introduce students to the concept of developing coding frameworks for converting complex, naturally occurring information into data for analysis. Students will have been asked previously to collect some naturally occurring examples of language and discourse and will be asked to construct content analysis dictionaries to examine the material.
Issues such as clarity in definitional systems, reliability and theoretical meaning will be considered in this session.
Reference
Bryman, A. (2001). Social Research Methods. Oxford University Press.
PRACTICAL (12th December) Dr. Alison
Students will present their findings from the interview material in class.

EXAMS & ASSIGNMENTS
Assignments to be handed in 16th January. Exams will take place 15th January.
