
James A. Condor, Manatee Community College
Deanna L. Voehl, Indian River State College

to accompany

Sixth Edition by

?? Johnson

?? State University

Contents

Preface
1 Introduction
2 Organization and Description of Data
3 Descriptive Study of Bivariate Data
4 Probability
5 Probability Distributions
6 The Normal Distribution
7 Variation in Repeated Samples - Sampling Distributions
8 Drawing Inferences from Large Samples
9 Small Sample Inferences for Normal Populations
10 Comparing Two Treatments
11 Regression Analysis I; Simple Linear Regression
12 Regression Analysis II; Multiple Linear Regression and Other Topics

Preface

Statistics is an important field of study, now more so than ever. We are surrounded by statistical information at work and in our everyday lives. Many of us work in professions that require us to understand statistical summaries, and some of us work in areas that require us to produce statistical information. The art of teaching statistics has changed dramatically in recent years, with computational software eliminating the need for many previously taught techniques. With today's calculators performing most of the work in elementary statistical analysis, students reach the answers to many complex computations more easily and quickly. The challenge for the instructor is to help students gain a deeper understanding of what they are doing, while the calculator tends to the details of the computations. The TI-84 Plus calculator by Texas Instruments is a leading example of progress in statistical technology: an affordable device that is capable of powerful statistical work and is still easy to use. This text walks through the statistical capabilities of the TI-84 Plus calculator. It follows the order of topics in Johnson's Statistics, sixth edition, published by John Wiley & Sons, Inc., but should also prove useful with other texts. It does not explain the underlying statistics; instead, it focuses on how best to use the TI-84 Plus calculator to compute them.

Chapter 1

Introduction

Use of Technology

Statistics is a field that deals with sets of data. After the data is collected, it needs to be organized and interpreted. There is a limit to how much of this work can be done effectively without the help of some type of technology. Technology, such as a calculator with enhanced statistical functions, can take care of most of the details so that we can spend more time focusing on what we are doing and how to interpret the results. Technology helps us not only to store and manipulate data, but also to visualize what the data is telling us. As we work with a calculator, we will be able to:

- Enter, revise, and store data.
- Perform statistical computations on stored data or on entered statistics.
- Draw pictures, based on the data, that help us understand what useful information can be inferred from it.

Advantages of Using a Calculator

There are many good statistical software packages available, such as MINITAB, SAS, and SPSS. Excel also contains many built-in statistical functions and supports plug-ins for statistical work. Still, for the student starting to learn statistics, it's hard to beat the advantages of a powerful hand-held calculator:

- It is portable and easy to use in many different work environments.
- Its battery lasts far longer than that of a laptop computer.
- It is less expensive than a computer.
- It is less expensive than a statistical software package.

Advantages of Using the TI-84 Plus

This calculator manual focuses on how to get the most out of the TI-84 Plus calculator by Texas Instruments. The TI-83 was first released in 1996, improving upon its predecessors (the TI-81 and TI-82) with the addition of many advanced statistical and financial functions. The TI-83 Plus and the TI-84 Plus have essentially the same features as the TI-83, but with increased memory capacity and a few extra statistical features. They are powerful calculators with advanced functions, yet they are easy to use:

- Most complicated statistical computations are handled through menus that prompt you for the necessary input.
- Data entry and revision are handled through a Statistical List Editor that works much like a spreadsheet.
- Statistical graphs are handled through menus, and important parts of a graph can be read by tracing along it with the arrow keys.
- The calculators are sturdily built and can withstand many falls off student desks.

This chapter focuses on getting numbers into your calculator and storing them for the organization, interpretation, and analysis parts of statistics. When you are not given the statistics needed for a calculation, you will need to enter the data into the calculator to generate them. We will learn how to do statistical calculations with the calculator in later chapters.

Using the Statistical List Editor

The Statistical List Editor in the TI-84 Plus calculator provides a convenient way to enter numbers and review them. Numbers from a data set can be stored in a list in the calculator, so that related numbers are kept together.

Example: Calories Consumed

An individual is modifying eating habits and has kept track of calories consumed for the last 10 days, as follows:

1474, 1633, 1686, 1748, 1326, 1112, 1245, 1539, 1220, 1561

If we want to do any sort of analysis on these numbers, we will need to get them into the calculator and keep them together as a group.

To enter data or to change an existing set of data values, use the Edit function (item 1 in the EDIT menu). Press the STAT key, then press the 1 key, or press the ENTER key if 1:Edit is highlighted.

Resetting the Statistical List Editor

If the Statistical List Editor does not show the columns labeled L1, L2, and L3, you can reset the Editor to its default settings by selecting SetUpEditor as follows: Press the STAT key. Press the 5 key. Press the ENTER key.

Once the Editor is set up, return to the Edit function.

Entering Data in the Statistical List Editor

Type in the ten calorie counts under the column labeled L1. Press the ENTER key when you are done with one number and ready to move on to the next number.

Type in 1474 and press the ENTER key. Type in 1633 and press the ENTER key. Type in 1686 and press the ENTER key. Continue until all the data values have been entered.

Use the up and down arrow keys to move back and forth between the numbers. Try changing one of the entries by typing in a new calorie count.

Clearing a List of Data Values

After a list of data values is no longer needed, you can delete the values by using one of the following methods:

- Highlight each data value and press the DEL key. This method is slow because it clears the list one data value at a time.
- Highlight the list name at the top of the column (for example, L1), press the CLEAR key, and then press the ENTER key.
- Use ClrList, item 4 in the EDIT menu: Press the STAT key. Press the number 4 key. Press the 2nd key and then the 1 key to get L1. Press the ENTER key.

Entering Lists Directly from the Home Screen

The home screen is where you do most of your calculator work that doesn't involve menus. Wherever you are on your calculator, you can always get back to the home screen by pressing the 2nd key and then the MODE key to access the QUIT function. From the home screen, you can enter data into a list by typing it between a pair of braces { and }, separating the numbers by commas: {1474, 1633, 1686, 1748, 1326, 1112, 1245, 1539, 1220, 1561}

Once you've typed the numbers in, you will want to save them. Press the store key STO► followed by L1, L2, or any other list. (L1 through L6 are above the 1 through 6 keys and are reached with the 2nd key.) When you press the STO► key, the screen displays an arrow pointing to the right. Once you've stored a list, you can view it by typing its name. For example, if you stored the calories in L1, typing L1 (the 2nd key, then the 1 key) on the home screen displays the list's contents. (Use the left and right arrow keys to see all of the list's contents.)

Entering Lists Directly to a Name

You can also store data to a name that you create. From the home screen, type the list in braces, press the STO► key, and then type the name (for example, CAL) using the ALPHA key.

Moving the Named List to the Statistical List Editor

The data is stored under the name CAL, but the list does not yet appear in the Statistical List Editor. To display CAL in the Editor, insert it as follows: Press the STAT key. Press the number 1 key. Press the up arrow key to highlight the name at the top of one of the columns. Press the 2nd key and then the DEL key to get to the INS (insert) function. Type the name of the list, CAL (A-LOCK is turned on, so you do not need to press the ALPHA key before each letter). Press the ENTER key. The numbers that you stored in CAL should now appear in the list.

Create a Named List Within the Statistical List Editor

The lists L1 through L6 are good places to work with data you do not need to keep. If you may need the data later and do not want to accidentally overwrite it, you can give the data a name. A list name can have 1 to 5 characters. The first character must be a letter A-Z or the symbol θ (theta). The other characters can be letters, θ, or digits 0-9. To type letters, press the ALPHA key before each letter; the letters appear above and to the right of most of the keys. If you are typing several letters in a row, press A-LOCK (the 2nd key, then the ALPHA key), type the letters, and then press the ALPHA key again to release the lock.

To create a new list named BURN within the Statistical List Editor: Press the STAT key. Press the number 1 key. Press the up arrow key to highlight the name at the top of one of the columns. Press the 2nd key and then the DEL key to get to the INS (insert) function. Type the name of your new list (A-LOCK is already turned on for you, so you don't need to press the ALPHA key before each letter). Press the ENTER key. BURN should now appear at the top of a column. The individual also kept track of the number of calories burned by exercising on each of the last 10 days. Enter the following data into the newly named list, BURN.

{128, 37, 440, 128, 258, 486, 325, 171, 0, 529}

Getting the Names of Lists

Some calculator commands require that you type in the name of a list. If the list is one of L1 through L6, you can type its name quickly from the keyboard, above the 1 through 6 keys. If the list has a name of its own, you cannot just type the name from the keyboard using the ALPHA key: list names on the TI-84 Plus calculator are distinguished from the names of other variables by a small L to the left of the name. Instead, select LIST (2nd STAT), use the arrow keys to choose one of the list names, and press the ENTER key.

Choosing Lists to Edit

There are several ways to control which lists are displayed in the Statistical List Editor. Press the STAT key and then the number 5 key to get to SetUpEditor, and type in the desired list names, separated by commas. For example, SetUpEditor CAL,BURN configures the Statistical List Editor to display only lists CAL and BURN; lists L1 through L6 will not be displayed. To get back to the default list names, L1 through L6, use the command SetUpEditor without any list names.

Remove a Named List Within the Statistical List Editor

In the Statistical List Editor, use the arrow keys to move to the name of the list to be removed, and press the DEL key. The list disappears from the Editor, but its contents have not been deleted. To erase the contents of a list instead, highlight the name of the list and press the CLEAR key and then the ENTER key. This leaves the list name in the Editor and clears its entries.

Deleting Lists

If you store many lists, programs, etc. on your calculator, you may run out of memory. Go to the MEM menu (the 2nd key, then the + key) and select 2:Mem Mgmt/Del... Select 4:List... to see all of the current lists. Move the cursor to the list that you want to delete, press the DEL key, and then choose Yes.

The TI-84 can perform various mathematical operations on data stored in lists. Enter the following data into lists L1 and L2.

m: 2 5 7 10 14 17 20 25 30
f: 4 2 9 12 10 7 8 3 1

Press: STAT > 1:Edit > ENTER. Enter the m values into list L1. Enter the f values into list L2.

Finding the Sum of a List

The TI-84 can quickly calculate the sum of a list.

Press: 2nd > MODE (QUIT) to return to the home screen. To find Σm, find the sum of list L1. Press: 2nd > STAT (LIST). Use the right arrow key to move the cursor to MATH. Press: 5:sum(. Press: 2nd > 1 (L1) > ) and press ENTER. Repeat the above steps, using L2, to find Σf.

Find Σmf by creating a new list from existing lists.

The TI-84 can create a new list from one or more existing lists. One way to calculate Σmf is to create a new list by multiplying the corresponding values of m and f. Press: STAT > 1:Edit and press ENTER. Use the right arrow key to move the cursor to the L3 column. Use the up arrow key to move the cursor to the column header, L3. Press: 2nd > 1 (L1) > * > 2nd > 2 (L2) and press ENTER. List L3 now contains the products mf. To calculate Σmf, return to the home screen (2nd > MODE) and use the sum( command as follows: Press: 2nd > STAT (LIST) > MATH > 5:sum( and press ENTER. Press: 2nd > 3 (L3) > ) and press ENTER.

Find Σmf by using the sum( command and mathematical operations.

An alternative to creating a new list is to multiply m and f within the sum( command itself. At the home screen, press: 2nd > STAT > MATH > 5:sum( and press ENTER. Press: 2nd > 1 > * > 2nd > 2 > ) and press ENTER.

Find Σf² by using the sum( command and mathematical operations.

In a similar fashion, press: 2nd > STAT > MATH > 5:sum( and press ENTER. Press: 2nd > 2 > x² > ) and press ENTER.
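For readers who want to check the calculator's list arithmetic on a computer, here is a minimal Python sketch of the same four sums. It is not a TI-84 feature; the Python variable names are only stand-ins for lists L1, L2, and L3.

```python
# Lists from the example: m is stored in L1 and f in L2 on the calculator.
m = [2, 5, 7, 10, 14, 17, 20, 25, 30]
f = [4, 2, 9, 12, 10, 7, 8, 3, 1]

sum_m = sum(m)                              # sum(L1)
sum_f = sum(f)                              # sum(L2)
mf = [mi * fi for mi, fi in zip(m, f)]      # L1*L2, stored in L3 on the calculator
sum_mf = sum(mf)                            # sum(L3), same as sum(L1*L2)
sum_f_sq = sum(fi ** 2 for fi in f)         # sum(L2^2)

print(sum_m, sum_f, sum_mf, sum_f_sq)  # 130 56 725 468
```

Multiplying element by element with zip mirrors what L1*L2 does on the calculator: the first value of one list is paired with the first value of the other, and so on.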

Chapter 2

Organization and Description of Data

One of the tasks of a statistician is to make sense of data by organizing it. In today's world of technology, you are presented with tables and graphical displays of data on a daily basis. This chapter focuses on ways of organizing data on the TI-84 Plus calculator.

Frequency Distributions

One way to organize data is to group similar values together and count the number of values in each group. To help us see how the values are distributed, we split them into intervals, or classes, all of the same width, and count how many values fall in each class. Listing the classes and their frequencies gives us a frequency distribution. The calculator does not create a frequency distribution table, but it will construct a frequency histogram that displays the information needed to create one.

Example: Hours of Sleep

Example 5, page 32, gives the number of hours of sleep the previous night for 59 students at a large Midwest university. Enter the hours of sleep into a list named HOURS.

Create a Frequency Histogram

The data is grouped into 5 classes beginning at 4.3 and ending at 10.3, so the class width is (10.3 - 4.3)/5 = 1.2. Press the WINDOW key. Set Xmin to 4.3 and Xmax to 10.3. Set Xscl to the class width, 1.2.

Now that we have told the calculator how to organize the data into classes, we are ready to set up the frequency histogram. Press the 2nd key. Press the Y= key to get to STAT PLOT. Press the ENTER key. With the flashing cursor over On, Press the ENTER key to turn the plot On. Use the arrow keys to select the histogram. Enter the name of the list HOURS for Xlist (use the LIST menu). Leave Freq at 1, since each data value represents only one point. Now we are ready to draw the frequency histogram.

Press the GRAPH key. Press the TRACE key. Use the arrow keys to move from one bar to the next. For each class, the screen displays the range of values in the class (for example, min = 6.7 and max < 7.9) and the frequency, n (here, n = 20). The histogram provides the following information:

Class         Frequency
4.3 - 5.5     5
5.5 - 6.7     15
6.7 - 7.9     20
7.9 - 9.1     16
9.1 - 10.3    3

Summary of How to Create a Histogram

1. Enter the data into a list.
2. Determine the range of values for your data, as well as your desired class width.
3. Press the WINDOW key and set Xmin, Xmax, and Xscl to the range of values and the class width. Set Ymin to 0 and Ymax to a value large enough for the tallest bar in the histogram. (You may need to revise this.)
4. Press the 2nd key and then the Y= key to get to STAT PLOT. Select 4:PlotsOff and press ENTER. This turns off all of the plots.
5. Return to STAT PLOT and select a plot. Turn the plot On by pressing the ENTER key, and select the third figure in the first row (the histogram). Enter the name of your data list in Xlist, and leave Freq as 1.
6. Press the GRAPH key. Press the TRACE key to display the information needed to create the frequency distribution table.

Enter the Frequency Distribution Table into the Statistical Editor

Since class intervals cannot be entered into a list, the midpoint of each class is used instead. For the hours-of-sleep example, the frequency distribution looks as follows:

Midpoint   Frequency
4.9        5
6.1        15
7.3        20
8.5        16
9.7        3
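The binning rule the histogram uses (a bar covers min ≤ x < max) and the midpoint calculation can be sketched in Python as follows. The 59 sleep values from Example 5 are not reproduced in this manual, so the sample data below is hypothetical, and the functions class_boundaries, histogram_counts, and class_midpoints are illustrative names, not calculator commands.

```python
def class_boundaries(xmin, width, nclasses):
    """Return the nclasses + 1 class boundaries starting at xmin.

    Rounding to 10 decimals removes floating-point noise, so that,
    for example, 4.3 + 1.2 compares equal to 5.5.
    """
    return [round(xmin + k * width, 10) for k in range(nclasses + 1)]


def histogram_counts(data, xmin, width, nclasses):
    """Count values per class, using the rule min <= x < max for each bar.

    Values outside [xmin, xmin + nclasses * width) are skipped.
    """
    bounds = class_boundaries(xmin, width, nclasses)
    counts = [0] * nclasses
    for x in data:
        for k in range(nclasses):
            if bounds[k] <= x < bounds[k + 1]:
                counts[k] += 1
                break
    return counts


def class_midpoints(xmin, width, nclasses):
    """Midpoint of each class: (lower boundary + upper boundary) / 2."""
    bounds = class_boundaries(xmin, width, nclasses)
    return [round((bounds[k] + bounds[k + 1]) / 2, 10) for k in range(nclasses)]


# Hypothetical sample data (NOT the 59 values from Example 5).
sample = [4.5, 5.5, 6.0, 6.7, 7.0, 7.9, 8.0, 10.0]
print(histogram_counts(sample, 4.3, 1.2, 5))  # [1, 2, 2, 2, 1]
print(class_midpoints(4.3, 1.2, 5))           # [4.9, 6.1, 7.3, 8.5, 9.7]
```

Note that a value sitting exactly on a boundary, such as 5.5, is counted in the class that starts there, which matches the "max <" shown by TRACE.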

Create two new lists, MIDPT and FREQ. Enter the above table into these lists, as shown.

Create a Polygon

The steps to create a polygon are very similar to those needed to create a histogram. We will use the data stored in the lists MIDPT and FREQ. Press the 2nd key and then the Y= key to get to STAT PLOT. Press the number 1 key. Turn Plot1 On. Highlight the xyLine in Type (the 2nd item in the 1st row). Enter MIDPT for Xlist, enter FREQ for Ylist, and select the square in Mark. The same WINDOW that was used for the histogram also works for the polygon. Another option is to let the calculator determine a suitable window by using the Zoom feature: Press the ZOOM key. Press the number 9 key to choose ZoomStat.

Press the GRAPH Key. Press the TRACE Key to see the data points.

A Relative Frequency column can be generated from the Frequency column. Create a new list named RELFR. With the cursor still highlighting the name RELFR, press 2nd STAT (LIST) and select the list named FREQ. Press the ÷ key. Press 2nd STAT (LIST) and move the cursor to MATH. Select 5:sum( and press ENTER. Press 2nd STAT (LIST) and select the list named FREQ. Press the ) key and press ENTER. This sequence of commands calculates each value in the RELFR column by dividing the corresponding value in the FREQ column by the sum of all of the values in the FREQ column.

A Percentage column can be generated from the Relative Frequency column. Create a new list named PERC. With the cursor still highlighting the name PERC, press 2nd STAT (LIST) and select the list named RELFR. Press the × (multiplication) key. Type in 100 and press ENTER. This sequence of commands calculates each value in the PERC column by multiplying the corresponding value in the RELFR column by 100.

A Cumulative Frequency Column can be generated from the Frequency Column.

Create a new list named CUMFR. With the cursor still highlighting the name CUMFR, press 2nd STAT (LIST) and move the cursor to OPS. Select 6:cumSum( and press ENTER. Press 2nd STAT (LIST) and select the list named FREQ. Press the ) key and press ENTER. This sequence of commands calculates each value in the CUMFR column by adding the corresponding value in the FREQ column to the sum of all previous values. The table at the right was displayed by using SetUpEditor to display just these 3 lists.
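The three list formulas above (FREQ ÷ sum(FREQ), RELFR × 100, and cumSum(FREQ)) can be checked with this short Python sketch, using the frequencies from the hours-of-sleep table. Python's itertools.accumulate plays the role of cumSum( here; this is an analogy, not a calculator feature.

```python
from itertools import accumulate

freq = [5, 15, 20, 16, 3]              # FREQ: hours-of-sleep frequencies

total = sum(freq)                      # sum(FREQ) -> 59
relfr = [f / total for f in freq]      # RELFR = FREQ / sum(FREQ)
perc = [r * 100 for r in relfr]        # PERC = RELFR * 100
cumfr = list(accumulate(freq))         # CUMFR = cumSum(FREQ)

print("FREQ  RELFR   PERC   CUMFR")
for f, r, p, c in zip(freq, relfr, perc, cumfr):
    print(f"{f:4d}  {r:.4f}  {p:5.2f}  {c:5d}")
```

The last cumulative frequency always equals the total number of observations (here 59), which is a quick sanity check on the calculator's CUMFR column.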

The steps to create an Ogive are very similar to those needed to create a Polygon. We will begin by using the data stored under the labels MIDPT, FREQ, and CUMFR. The Ogive uses the first lower and all of the upper boundaries rather than the midpoint. Thus it does require an additional data value at the beginning of the list. Insert a new row of data: Move the cursor to the top value in the MIDPT list (the 4.9). Press 2nd DEL (INS). A 0 will appear. Repeat the above steps for the FREQ and CUMFR columns.

Overwrite the values in the MIDPT column with the first lower boundary and all of the upper boundaries (4.3, 5.5, 6.7, 7.9, 9.1, 10.3), as shown at the right.

Press the 2nd key and then the Y= key to get to STAT PLOT. Press the number 1 key. Turn Plot1 On. Highlight the xyLine in Type (the 2nd item in the 1st row).

Type MIDPT for the Xlist: Type CUMFR for the Ylist: Select the square in Mark: Some modifications will be needed for the WINDOW to incorporate the lower boundary and the increased y-values.

Press the GRAPH Key. Press the TRACE Key to see the data points.
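The ogive's points pair each class boundary with the cumulative frequency up to that boundary, starting from (first lower boundary, 0). Here is a minimal Python sketch, assuming the class setup of the hours-of-sleep example (Xmin 4.3, width 1.2, frequencies 5, 15, 20, 16, 3); ogive_points is a hypothetical helper name.

```python
from itertools import accumulate


def ogive_points(xmin, width, freq):
    """Return (boundary, cumulative frequency) pairs for an ogive.

    The first pair is (first lower boundary, 0), matching the extra row
    of zeros inserted at the top of the lists; each later pair uses an
    upper class boundary and the running total of the frequencies.
    """
    bounds = [round(xmin + k * width, 10) for k in range(len(freq) + 1)]
    cumfr = [0] + list(accumulate(freq))
    return list(zip(bounds, cumfr))


for x, y in ogive_points(4.3, 1.2, [5, 15, 20, 16, 3]):
    print(x, y)
```

These (x, y) pairs are exactly what the MIDPT and CUMFR lists hold after the extra row is inserted and the boundaries are typed in.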

Chapter 3

Descriptive Study of Bivariate Data

Simple Linear Regression Models

A simple linear regression model is an equation describing how to use one variable, x, to predict another variable, y, based on the relationship in the sample data. Since predictions made from sample data may differ from the actual values in the population, the symbol y' is used for the predicted value of y. The simplest possible model is a linear one: y' = a + bx. Its graph is a line, where b is the slope of the line and a is the y-coordinate of the y-intercept.

Constructing a Scatter Diagram

In order to use the TI-84 Plus to find the linear regression model, the sample data must be entered into the STAT editor. The relationship between the two quantitative variables can be viewed in a scatter diagram. You can create a scatter diagram by using the scatterplot option under STAT PLOT, which is located above the Y= key. The steps to create the scatter diagram are similar to those needed to create a histogram.

Example: The following table gives the English and Math scores for nine randomly selected students.

Student   English   Math
A         77        68
B         90        86
C         85        78
D         62        73
E         71        75
F         88        78
G         95        85
H         67        78
I         75        81

Before you begin, it is recommended that you clear any equations from the Y= editor. Enter the data into the STAT editor: Press STAT > 1 to get to the STAT editor. Type the English scores in L1. Type the Math scores in L2. Press 2nd > Y= to get to STAT PLOT. Press 1 to get to Plot1.

Highlight ON and press ENTER. Highlight the scatterplot (1st diagram) under Type and press ENTER. Type L1 in for the Xlist: Type L2 for the Ylist: Select any one of the symbols under Mark: (the square tends to show up the best) Press Zoom > 9 to get to ZoomStat. Press the TRACE key and use the left and right arrow keys to move among the points and see the coordinates of the marks. Note: if any equations are stored on the Y = page, they may also appear in your diagram. Delete them if they are in the way.

Creating a Linear Regression Model

The TI-84 Plus calculator has two built-in functions, LinReg(ax+b) and LinReg(a+bx), to compute a simple linear regression model. Both are located on the STAT page in the CALC list. They are two forms of the same function: one writes the equation as ax+b and the other writes it as a+bx. We will use the latter form, a+bx, but either is fine. We will create a linear regression model for the English and Math scores from the previous example, which are stored in lists L1 and L2. Press STAT > CALC > 8:LinReg(a+bx). Type: L1 > , > L2 > , Press VARS > Y-VARS > 1 > 1 to get Y1. Press ENTER.

The LinReg output shows:

y=a+bx (the general model)
a=53.26159596 (the y-intercept)
b=.3135854034 (the slope)
r²=.3883069253 (the coefficient of determination)
r=.6231427808 (the correlation coefficient)
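To see where these numbers come from, here is a minimal Python sketch of the least squares formulas b = Sxy/Sxx, a = ȳ - b·x̄, and r = Sxy/√(Sxx·Syy), applied to the English (x) and Math (y) scores. The function name linreg_a_bx is hypothetical; it should reproduce the calculator's a, b, r², and r up to rounding.

```python
import math

english = [77, 90, 85, 62, 71, 88, 95, 67, 75]      # L1 (x)
math_scores = [68, 86, 78, 73, 75, 78, 85, 78, 81]  # L2 (y)


def linreg_a_bx(x, y):
    """Least squares fit y' = a + b*x; returns (a, b, r, r squared)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                   # slope
    a = ybar - b * xbar             # y-intercept
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return a, b, r, r * r


a, b, r, r2 = linreg_a_bx(english, math_scores)
print(f"a={a:.8f}  b={b:.10f}  r2={r2:.10f}  r={r:.10f}")
```

The list is named math_scores rather than math so that it does not shadow Python's math module.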

The equation of the linear regression model for the English and Math scores is y' = 53.2616 + 0.3136x.

Coefficient of Determination and Correlation Coefficient

Note: In order to see the coefficient of determination, r², and the correlation coefficient, r, in the LinReg output, diagnostics must be turned on. Use the TI-84 DiagnosticOn command to turn them on. The command only needs to be executed once; from then on, r² and r will be displayed every time you compute a regression model. Press 2nd > 0 to get to CATALOG. Use the down arrow key until you find the command DiagnosticOn, and then press ENTER twice.

Alternative: Pressing the first letter of the command you are looking for saves time getting to it. The ALPHA key is engaged automatically when you go into CATALOG, so to jump to the letter D, press the x⁻¹ key. Use the down arrow key to continue until you find DiagnosticOn, and then press ENTER twice. Enter the LinReg command again, and the output should appear as shown on the left.

Graphing the Linear Regression Line

The LinReg command we entered stored the equation of the linear regression model (the least squares line) in Y1.

Draw the scatter diagram again (for example, press ZOOM > 9:ZoomStat), and the least squares line will appear on the scatter diagram.

Chapter 4

Probability

Generating Random Numbers

When working with probabilities, there is sometimes a need to generate numbers that you can't predict but that still follow some standard rules. Computer simulations are a common example of the need for random occurrences within a structured setting. Such numbers are called pseudo-random numbers, since they are not truly random. Your calculator can generate them: the numbers that appear on your calculator screen are hard to predict, but you can attach probabilities to them. For each kind of pseudo-random number, we can say what the probability is that a given value will occur next.

Generating Random Numbers Between 0 and 1

Suppose that you would like to generate a number between 0 and 1. You want the number to be unpredictable, but you want every number between 0 and 1 to be equally likely. The numbers generated by the random number function on your calculator are very similar to those found in the random number table of an elementary statistics textbook. The function is on the MATH page and can be found in the PRB list. Press the MATH key. Press the right arrow key until PRB is highlighted. Select 1:rand and press ENTER. Press ENTER again. Each time you continue to press the ENTER key, you will generate a different random number between zero and one.

Generating Random Numbers Between Any Two Values The TI-84 Plus does not have a built-in function to generate random real numbers that are equally likely to occur and fall within a specified range of values, but they can be generated by using the rand function with some additional commands. The following command will generate random real numbers between 1 and 100.

Press the MATH key. Press the right arrow key until PRB is highlighted. Select 1:rand and press ENTER. Type: * (100 - 1) + 1 and press ENTER. Each time you continue to press the ENTER key, you will generate a different random number between 1 and 100. In general, the command used to generate a real number between values m and n, where n is the larger number, is: rand * (n - m) + m. For example, rand * (10 - 1) + 1 will generate a real number between 1 and 10, and rand * (900 - 500) + 500 will generate a real number between 500 and 900.
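The rand * (n - m) + m recipe is just a rescaling of a uniform number from [0, 1): multiplying by (n - m) stretches it to the right width, and adding m shifts it to start at m. Here is a minimal Python analogue, assuming Python's random.random() as a stand-in for rand (the two generators produce different sequences, so the values will not match the calculator's); rand_between is a hypothetical name.

```python
import random


def rand_between(m, n):
    """Uniform real number between m and n, built like rand * (n - m) + m.

    random.random() returns a float in [0, 1), standing in for rand.
    """
    return random.random() * (n - m) + m


random.seed(2024)  # fixed seed only to make this sketch repeatable
print(rand_between(1, 100))
print(rand_between(500, 900))
```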

Generating Random Integer Values Between Any Two Numbers To generate random integer numbers (no decimals) that are equally likely to occur and fall within a specified range of values, the TI-84 Plus has the built-in function randInt. The following sequence will generate 20 random integer values between 1 and 100. Select MATH > PRB. Select 5:randInt( . Type: 1, 100, 20) and press ENTER. Use the right arrow key to see the remaining numbers. Press ENTER to generate 20 more such random integers.

In general, the command used to generate n integers between values j and k is randInt(j, k, n). If n = 1, you do not need to enter it. For example, randInt(10, 50, 15) will generate 15 random integers between 10 and 50, and randInt(10, 50) will generate 1 random integer between 10 and 50.

Store Random Numbers in a List

The random numbers generated can be stored in a list to be used with other statistical procedures. The command randInt(10,50,15)→L1, shown in the screen shot on the right, will generate 15 random integer values between 10 and 50 and store them in list L1. (The → arrow is entered with the STO► key.)
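A rough Python analogue of randInt(j, k, n), including storing the results in a list, is sketched below. It assumes Python's random.randint, which, like randInt(, includes both endpoints; rand_int and the variable list_l1 are hypothetical names, and the random sequence will differ from the calculator's.

```python
import random


def rand_int(j, k, n=None):
    """Random integers from j to k inclusive, like randInt(j, k, n).

    With n omitted, a single integer is returned (like randInt(j, k));
    otherwise a list of n integers is returned.
    """
    if n is None:
        return random.randint(j, k)
    return [random.randint(j, k) for _ in range(n)]


list_l1 = rand_int(10, 50, 15)   # analogous to randInt(10,50,15)→L1
print(list_l1)
print(rand_int(10, 50))          # analogous to randInt(10,50)
```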

Chapter 5

Probability Distributions

Mean and Standard Deviation of a Discrete Random Variable

Computing the mean and standard deviation of a discrete random variable is slightly different from computing the mean and standard deviation of a set of data values. In a data set, each data value carries equal weight in the computation. For a discrete random variable, however, the possible values are given along with the probability of each value occurring on any single trial. As with a set of data values, the TI-84 calculator can be used to calculate the mean and standard deviation of a discrete random variable either by applying the formulas manually or by using a built-in function. We will begin by applying the formulas manually.

Example: Number of Heads

Table 1 gives the probability distribution of the number of heads in three tosses of a fair coin. Enter the number of heads into a list named X, and the probabilities into a list named PROBX.

Mean of a Discrete Random Variable

The formula to calculate the mean of a discrete probability distribution is $\mu = \sum x\,P(x)$.

Move the cursor to highlight the name of the empty List next to List PROBX. Type: List X * List PROBX

Press ENTER.

The formula now says to sum this list of values. Go to the home screen (2nd > MODE). Select 2nd > STAT > MATH > 5:sum(. Type: L1) and press ENTER, where L1 is the list just created. The result is μ = 1.5 heads in three tosses of a coin.

Standard Deviation of a Discrete Random Variable

The formula to calculate the standard deviation of a discrete probability distribution is $\sigma = \sqrt{\sum x^2 P(x) - \mu^2}$.

Move the cursor to highlight the name of the empty List next to List PROBX. Type: List X ^ 2 * List PROBX Press ENTER.

The formula now says to take the square root of the difference between the sum of this list of values and the square of the mean. Go to the home screen (2nd > MODE). Press the √ key (2nd > x²). Select 2nd > STAT > MATH > 5:sum(. Type: L1) - 1.5 ^ 2) and press ENTER. The result is σ ≈ 0.866 heads in three tosses of a coin.
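Both calculations, the mean Σ x·P(x) and the standard deviation √(Σ x²·P(x) - μ²), can be checked with a short Python sketch. It assumes the standard distribution for three tosses of a fair coin, P(0) = 1/8, P(1) = 3/8, P(2) = 3/8, P(3) = 1/8 (Table 1 itself is not reproduced here); the lists x and probx stand in for X and PROBX.

```python
import math

x = [0, 1, 2, 3]                  # X: number of heads
probx = [1/8, 3/8, 3/8, 1/8]      # PROBX: assumed fair-coin probabilities

# Mean: sum(X * PROBX)
mu = sum(xi * p for xi, p in zip(x, probx))

# Standard deviation: square root of (sum(X^2 * PROBX) - mu^2)
sigma = math.sqrt(sum(xi ** 2 * p for xi, p in zip(x, probx)) - mu ** 2)

print(mu, round(sigma, 3))  # 1.5 0.866
```

Here Σ x²·P(x) = 3, so σ = √(3 - 1.5²) = √0.75, which agrees with the calculator's 0.866.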

The TI-84 Plus built-in function 1-Var Stats will also calculate the numerical descriptive statistics for a Discrete Probability Distribution. We will use the same probability distribution as above, which we stored in Lists X and PROBX.

Press STAT > CALC > 1:1-Var Stats. Then select 2nd > STAT (LIST) > X, type a comma, and select 2nd > STAT (LIST) > PROBX.

Press ENTER. The screen will display the descriptive statistics, which includes the population mean and standard deviation.

Factorials

A common function needed to compute dependent probabilities is the factorial function. The notation for the factorial of n is n!. The ! function is found on the MATH page under the PRB list. To find the number of ways six people could be arranged in six different chairs, you would calculate six factorial (6!).

Type: 6 > MATH > PRB > 4: ! and press ENTER. 6! = 720. Calculate 10! and 0!.
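The factorial is just a running product, n! = n × (n - 1) × ... × 2 × 1, with 0! defined as 1. A minimal Python sketch, with factorial as an illustrative helper checked against the standard library's math.factorial:

```python
import math


def factorial(n):
    """n! = n * (n - 1) * ... * 2 * 1, with 0! defined as 1."""
    result = 1
    for k in range(2, n + 1):   # empty loop for n = 0 or 1, leaving 1
        result *= k
    return result


print(factorial(6))                          # 720
print(factorial(10), factorial(0))           # compare with your calculator
print(factorial(10) == math.factorial(10))   # True
```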

Combinations

The combination formula can also be used to compute dependent probabilities. The notation for the number of combinations is nCr, where n is the total number of elements and r is the number being selected. Combinations are used when selecting a few elements from a larger number of distinct elements, without regard to order.

Example: Ice Cream

An ice cream parlor offers 6 flavors of ice cream. Kristen would like to purchase 2 flavors of ice cream. In how many ways can Kristen choose 2 flavors out of the 6 flavors? To find the number of ways of choosing two flavors out of six, we need to calculate 6C2. Type 6 > MATH > PRB > 3:nCr and press ENTER. Press the number 2 key and press ENTER. There are 15 different combinations of two flavors of ice cream. Calculate 6C3 and 8C3.
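Behind nCr is the formula nCr = n! / (r!(n - r)!). A minimal Python sketch (n_c_r is a hypothetical helper; math.comb, available since Python 3.8, gives the same result):

```python
import math


def n_c_r(n, r):
    """Number of combinations (order ignored): n! / (r! * (n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


print(n_c_r(6, 2))                # 15 ways to choose 2 of 6 flavors
print(n_c_r(6, 3), n_c_r(8, 3))   # the two practice problems
```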

Permutations

The permutation formula can be used to compute dependent probabilities. The notation for the number of permutations is nPr, where n is the total number of elements and r is the number being selected. Permutations are used to find all possible arrangements of elements taken from a larger selection. Arrangements involve putting the elements in a particular order. If Kristen's story changes as below, then permutations apply rather than combinations. An ice cream parlor offers 6 flavors of ice cream. Kristen would like to purchase 2 flavors of ice cream and is concerned about which flavor is on the top and which is on the bottom (i.e., Kristen is concerned about the arrangement of the flavors). In how many ways can Kristen arrange 2 flavors out of the 6 flavors? To find the number of arrangements of two flavors out of six, we need to calculate 6P2.

Type 6 > MATH > PRB > 2: nPr and press ENTER. Press the number 2 key and press ENTER. There are 30 different arrangements of two flavors of ice cream. Calculate 6P3 and 8P3.
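Likewise, nPr = n!/(n − r)!. A minimal Python sketch (`n_p_r` is a hypothetical helper; `math.perm` is the standard-library equivalent in Python 3.8 or later). It also checks that each unordered pair of flavors can be stacked in 2! = 2 orders, so 6P2 = 6C2 × 2.

```python
import math

def n_p_r(n, r):
    """Ordered arrangements of r items from n: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

print(n_p_r(6, 2))                                          # 30 arrangements
print(n_p_r(6, 3), n_p_r(8, 3))                             # practice problems
print(n_p_r(6, 2) == math.comb(6, 2) * math.factorial(2))   # True
```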

Binomial Distribution

Randomly Generating the Number of Successes From a Binomial Distribution

There are many situations in statistics where you need to generate numbers from distributions in which the values are not equally likely to occur. One of the most commonly used distributions in statistics is the discrete Binomial distribution. The TI-84 Plus has a built-in function to generate random integers from a specific Binomial distribution. Each random integer represents an x value, the number of successes. Select MATH > PRB > 7: randBin(, type 3, 0.3, 5) and press ENTER. In the screen shot on the left, the Binomial experiment was repeated 5 times. Each time there were 3 trials with a probability of success of 0.3, and the numbers of successes were 2, 1, 0, 1, and 2, respectively. Generate 5 random numbers from a Binomial distribution with 3 trials and 0.9 probability of success. The syntax for the randBin( function is randBin(n, p, r). This will generate r random numbers representing x, the number of successes, from a binomial distribution with n trials and probability p of success on a given trial. Note: if r = 1, you may omit it.
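To see what randBin( is doing, the following Python sketch simulates it by counting successes in n independent trials. This is an illustration of the idea, not TI's algorithm: `rand_bin` is a hypothetical helper, and its random values will not match the calculator's.

```python
import random

def rand_bin(n, p, r=1, rng=random):
    """Hypothetical analogue of randBin(n, p, r): r draws of the number of
    successes in n independent trials, each succeeding with probability p."""
    return [sum(rng.random() < p for _ in range(n)) for _ in range(r)]

rng = random.Random(1)  # seeded so the run is repeatable
print(rand_bin(3, 0.3, 5, rng))  # five counts, each between 0 and 3
print(rand_bin(3, 0.9, 5, rng))  # with p = 0.9, mostly 3s
```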

Compute Binomial Probabilities

The command for computing the probability of exactly x successes for a discrete Binomial distribution is binompdf(. To find the probability of x successes out of n trials, each with probability p of success, type binompdf(n, p, x).

Example: VCRs

Suppose that 5% of all VCRs manufactured by an electronics company are defective. Three VCRs are selected at random. What is the probability that exactly one of them is defective, P(x = 1)? Select 2nd > VARS (DISTR) > A: binompdf( and press ENTER. Type: 3, 0.05, 1) and press ENTER. The result is 0.135375, or a 13.5% chance that exactly one of them is defective. Calculate the same probability with 8 VCRs selected at random, rather than 3. Now there is a 27.9% chance that exactly one of them is defective.
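The binompdf( result follows the binomial formula P(x) = nCx · p^x · (1 − p)^(n − x). A minimal Python sketch under that formula (`binom_pdf` and `binom_cdf` are hypothetical helper names); the second function adds the terms for 0 through x, which is what the cumulative command binomcdf( described next returns.

```python
import math

def binom_pdf(n, p, x):
    """P(X = x) for a Binomial(n, p) variable: C(n, x) p^x (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(n, p, x):
    """P(X <= x): the sum of binom_pdf over 0, 1, ..., x."""
    return sum(binom_pdf(n, p, k) for k in range(x + 1))

print(round(binom_pdf(3, 0.05, 1), 6))  # 0.135375
print(round(binom_pdf(8, 0.05, 1), 3))  # 0.279
print(round(binom_cdf(3, 0.05, 1), 5))  # 0.99275
print(round(binom_cdf(8, 0.05, 1), 3))  # 0.943
```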

Compute Cumulative Binomial Probabilities

The command for the cumulative probability of 0 to x successes, P(number of successes ≤ x), for a discrete Binomial distribution with n trials and probability p of success on any single trial is binomcdf(. Using the same Binomial distribution of 3 VCRs as above, what is the probability that zero or one of them is defective, P(x ≤ 1)? Select 2nd > VARS (DISTR) > B: binomcdf( and press ENTER. Type: 3, 0.05, 1) and press ENTER. The result is 0.99275, or a 99.3% chance that at most one of them is defective. Calculate the same probability with 8 VCRs selected at random, rather than 3. Now there is a 94.3% chance that at most one of them is defective.

Compute Poisson Probabilities

The command for computing the probability of exactly x occurrences within a given interval, P(number of occurrences = x), for a discrete Poisson distribution with mean number of occurrences μ is poissonpdf(μ, x).

Example: Telemarketing

Suppose that a household receives, on average, 9.5 telemarketing calls per week. Find the probability that the household receives 6 calls this week. Select 2nd > VARS (DISTR) > C: poissonpdf( and press ENTER. Type: 9.5, 6) and press ENTER. The result is 0.076420796, or a 7.6% chance that the household receives 6 calls this week.

Find the probability that the household receives 10 calls this week. There is a 12.4% chance that the household receives 10 calls this week.
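The poissonpdf( values come from the Poisson formula P(x) = e^(−μ) μ^x / x!. A minimal Python sketch of that formula; `poisson_pdf` is a hypothetical helper name.

```python
import math

def poisson_pdf(mu, x):
    """P(X = x) for a Poisson variable with mean mu: e^(-mu) mu^x / x!."""
    return math.exp(-mu) * mu**x / math.factorial(x)

print(round(poisson_pdf(9.5, 6), 6))   # 0.076421
print(round(poisson_pdf(9.5, 10), 4))  # 0.1235, about 12.4%
```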

Compute Cumulative Poisson Probabilities

The command for computing the probability of at most x occurrences (cumulative), P(number of occurrences ≤ x), within a given interval for a discrete Poisson distribution with mean number of occurrences μ is poissoncdf(μ, x). Using the same Poisson distribution of telemarketing calls as above, what is the probability that the household receives at most 6 calls this week, P(x ≤ 6)?

Select 2nd > VARS (DISTR) > D: poissoncdf( and press ENTER. Type: 9.5, 6) and press ENTER. There is a 16.5% chance that the household receives at most 6 calls this week.

Find the probability that the household receives at most 10 calls this week.

There is a 64.5% chance that the household receives at most 10 calls this week.
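The cumulative value is the sum of the individual Poisson probabilities from 0 up to x. A minimal Python sketch (`poisson_cdf` is a hypothetical helper); it builds each term from the previous one using P(k) = P(k − 1) · μ/k, which avoids computing large powers and factorials separately.

```python
import math

def poisson_cdf(mu, x):
    """P(X <= x) for a Poisson variable with mean mu, summing the pmf terms."""
    term = math.exp(-mu)  # P(X = 0)
    total = term
    for k in range(1, x + 1):
        term *= mu / k    # P(X = k) = P(X = k - 1) * mu / k
        total += term
    return total

print(round(poisson_cdf(9.5, 6), 3))   # 0.165
print(round(poisson_cdf(9.5, 10), 3))  # 0.645
```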

Geometric Probabilities

Your calculator can compute probabilities for a geometric random variable with probability of success p using the geometpdf( command, located on the DISTR page. To find the probability that the random variable takes the value x (the first success occurs on trial x), type geometpdf(p, x).

Example: Car Ignition

Suppose that a car with a bad starter can be started 90% of the time by turning on the ignition. What is the probability that it will take three tries to get the car started? Type geometpdf(0.9, 3); the answer is 0.009, or 0.9%.

Cumulative Geometric Probabilities

As with the binomial and Poisson probability functions, there is a cumulative version, geometcdf(. It can be used to find the probability that a geometric random variable takes a value of at most x by typing geometcdf(p, x).
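Both geometric results have closed forms: the first success on trial x needs x − 1 failures followed by a success, and "at most x tries" is the complement of x straight failures. A minimal Python sketch (`geomet_pdf` and `geomet_cdf` are hypothetical names chosen to mirror the calculator commands).

```python
def geomet_pdf(p, x):
    """P(first success on trial x) = (1 - p)^(x - 1) * p, for x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p

def geomet_cdf(p, x):
    """P(first success on or before trial x) = 1 - (1 - p)^x."""
    return 1 - (1 - p) ** x

print(round(geomet_pdf(0.9, 3), 4))  # 0.009, i.e. 0.9%
print(round(geomet_cdf(0.9, 3), 4))  # 0.999
```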

Chapter

6

The Normal Distribution

Continuous random variables are used to approximate probabilities where there are many possibilities or an infinite number of possibilities on a given trial. One of the most well-known continuous distributions used to approximate probabilities is the normal distribution. Traditionally normal distribution probabilities were figured using a normal distribution table. The table method is being replaced with calculators such as the TI-84 Plus. The calculator reduces the time needed to perform the calculations and reduces the rounding errors that occur because of the brevity of the tables in elementary statistics textbooks.

Normal Distribution

Randomly Generating a Number From a Normal Distribution

Just as the TI-84 Plus has a built-in function to generate random numbers from a Binomial distribution, it also has a built-in function to generate random real numbers from a specific Normal distribution with mean μ and standard deviation σ. The random real numbers represent x values. The general syntax is randNorm(μ, σ, n), where n is the number of random real numbers. The following command will generate 30 numbers from a Normal distribution with a mean of 45 and a standard deviation of 8 and store them in L2. Select MATH > PRB > 6: randNorm( and press ENTER. Type: 45, 8, 30) > STO > L2 and press ENTER.

Generate 200 numbers from a Normal distribution with μ = 100 and σ = 15 and store them in L3.

Generate a histogram of the 200 numbers in L3 and observe that the histogram is beginning to look like a normal distribution. Experiment with generating a larger number of data values.
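The same experiment can be sketched in Python, assuming `random.Random.gauss` as a stand-in for randNorm( (the values will not match the calculator's). `rand_norm` is a hypothetical helper; the text histogram uses bins one standard deviation wide and only covers 55 to 145, so the rare values beyond three standard deviations are not drawn.

```python
import random
import statistics

def rand_norm(mu, sigma, n, rng=random):
    """Hypothetical analogue of randNorm(mu, sigma, n)."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

rng = random.Random(7)  # seeded so the run is repeatable
l3 = rand_norm(100, 15, 200, rng)

# Crude text histogram with bins of width 15 (one standard deviation)
edges = [55 + 15 * i for i in range(7)]  # 55, 70, ..., 145
for lo, hi in zip(edges, edges[1:]):
    count = sum(lo <= v < hi for v in l3)
    print(f"{lo:>4}-{hi:<4} {'*' * count}")
print(round(statistics.mean(l3), 1), round(statistics.stdev(l3), 1))
```

Increasing the 200 to a few thousand makes the bell shape much clearer, as the text suggests.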

Computing Normal Distribution Probabilities The commands for the Normal distribution are normalpdf( , normalcdf( , and invNorm( . They are located on the DISTR page. DISTR appears above the VARS key.

Compute Cumulative Normal Probabilities

The normalcdf( function stands for normal cumulative distribution function and gives the probability of getting an x value that falls within an interval of values from the normal distribution. There are three possibilities: finding the probability that a number will fall between two values under the Normal distribution; finding the probability that a number will fall to the left of a value; and finding the probability that a number will fall to the right of a value. The syntax for the normalcdf( function is normalcdf(L, B, μ, σ), where L is the lower bound of the interval, B is the upper bound of the interval, μ is the mean, and σ is the standard deviation. The values for μ and σ may be omitted for the Standard Normal distribution (μ = 0, σ = 1).

Finding the Area Between Two Values

To find the area between two numbers a and b under the Standard Normal curve, use P(a < z < b) = normalcdf(a, b, 0, 1). Find the probability of getting a value between 1.04 and 1.82 under the Standard Normal curve. Select: 2nd > VARS > 2: normalcdf( and press ENTER. Type: 1.04, 1.82, 0, 1) and press ENTER. P(1.04 < z < 1.82) = 0.115. Find the probability of getting a value between 0 and 3 under the Standard Normal curve.

Find the probability of getting a value between 10 and 13 under the Normal curve with a mean of 10 and a standard deviation of 2.

P(10 < x < 13) = 0.43.

Find the probability of getting a value between 2 and 12 under the Normal curve with a mean of 10 and a standard deviation of 2.
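An area between two values is the cumulative probability at the upper bound minus the cumulative probability at the lower bound. A minimal Python sketch using the standard library's `statistics.NormalDist` (Python 3.8 or later); `normal_cdf` is a hypothetical helper that mirrors the argument order of normalcdf(.

```python
from statistics import NormalDist

def normal_cdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper,
    mirroring normalcdf(lower, upper, mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

print(round(normal_cdf(1.04, 1.82), 3))     # 0.115
print(round(normal_cdf(10, 13, 10, 2), 2))  # 0.43
```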

Finding the Area to the Left of a Value

To find the area to the left of b under the Normal curve, use P(x < b) = normalcdf(−∞, b, μ, σ). The problem is that the TI-84 calculator does not have a key for negative infinity (−∞). Instead, the value -1E99 is used, which represents a very large negative number (−1 × 10^99). The letter E stands for scientific notation and is located above the comma (,) key (2nd > ,). Thus, the command will look like: normalcdf(-1E99, b, μ, σ). Find the probability of getting a value less than 0 under the Standard Normal curve. Select: 2nd > VARS > 2: normalcdf( and press ENTER. Type: -1 > 2nd > , > 99, 0) and press ENTER (use the (−) key for the negative sign, not the subtraction key). P(z < 0) = 0.5. Find the probability of getting a value less than 32.45 under the Normal curve with mean 25 and standard deviation 6.

Finding the Area to the Right of a Value

To find the area to the right of a under the Normal curve, use P(x > a) = normalcdf(a, 1E99, μ, σ). Find the probability of getting a value greater than -1.08 under the Standard Normal curve. Select: 2nd > VARS > 2: normalcdf( and press ENTER. Type: -1.08, 1 > 2nd > , > 99) and press ENTER. P(z > -1.08) = 0.8599. Find the probability of getting a value greater than 15.3 under the Normal curve with mean 12 and standard deviation 4.
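The ±1E99 trick stands in for infinite bounds. In Python, `math.inf` can be used directly, and `statistics.NormalDist.cdf` returns 0 and 1 at the two infinities. A minimal sketch; `normal_cdf` is a hypothetical helper, not a library function.

```python
import math
from statistics import NormalDist

def normal_cdf(lower, upper, mu=0.0, sigma=1.0):
    """Area between lower and upper under N(mu, sigma); infinite bounds allowed."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Left tail: the calculator needs -1E99, Python can use -math.inf
print(normal_cdf(-math.inf, 0))                    # 0.5
print(round(normal_cdf(-1e99, 0), 4))              # same area using -1E99
print(round(normal_cdf(-math.inf, 32.45, 25, 6), 4))
# Right tail
print(round(normal_cdf(-1.08, math.inf), 4))       # 0.8599
print(round(normal_cdf(15.3, math.inf, 12, 4), 4))
```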

There are times in statistics when we have a probability and need the corresponding z-score or raw score. A problem of this type may look like: P(z > ?) = 0.8599. Such problems are known as inverse normal distribution problems. These computations can be performed using tables of normal probabilities, but the work is tedious and error-prone, and it often introduces rounding errors. Fortunately, the calculator has a function, invNorm(, that performs the calculation. We know from the previous section that the unknown in P(z > ?) = 0.8599 is -1.08. Select: 2nd > VARS > 3: invNorm( and press ENTER. Type: 0.8599) and press ENTER. The screen tells us that the answer is positive 1.08. The invNorm( function always works with the area to the left, so it returns the value with a cumulative probability of 0.8599 from −∞ to 1.08. Since the Normal distribution is symmetric, the same probability of 0.8599 applies from -1.08 to ∞, so the answer to P(z > ?) = 0.8599 is -1.08. Equivalently, invNorm(1 − 0.8599) = invNorm(0.1401) returns -1.08 directly. It is always advisable to draw the normal curve to help in visualizing this concept.

Graph the Normal Probability Density Function

The function normalpdf( stands for Normal probability density function and does not actually generate a probability, since it applies to a single x value in a continuous distribution, and the probability of any single value is always zero. The main use of this command is to draw the Normal curve. The syntax for the function is normalpdf(x, μ, σ), where μ is the mean and σ is the standard deviation. The following sequence of commands will draw the standard normal curve (μ = 0 and σ = 1). Select: Y = > 2nd > VARS > 1: normalpdf( and press ENTER. Type: x (using the X,T,θ,n key), 0, 1) > ZOOM > 9
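Python's `statistics.NormalDist` has counterparts for both commands: `inv_cdf` for invNorm( and `pdf` for normalpdf(. A minimal sketch showing the left-area convention described above; as with invNorm(, `inv_cdf` expects the area to the left.

```python
from statistics import NormalDist

std = NormalDist()  # mu = 0, sigma = 1

# invNorm(0.8599): the z with 0.8599 of the area to its LEFT
print(round(std.inv_cdf(0.8599), 2))      # 1.08
# For P(z > ?) = 0.8599, the area to the left of ? is 1 - 0.8599
print(round(std.inv_cdf(1 - 0.8599), 2))  # -1.08
# normalpdf( analogue: the height of the curve at x = 0, not a probability
print(round(std.pdf(0), 4))               # 0.3989
```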

This command may be used to draw any Normal distribution curve with any mean and standard deviation.

Shade the Normal Probability Density Function

When calculating the probability of an area under the Normal curve, it is often helpful to shade the area. The syntax for the TI-84 Plus command to do this is ShadeNorm(a, b, μ, σ).

Example: To shade the area under the Standard Normal curve for P(1.04 < z < 1.82) = 0.115, begin by turning off all other graphs (STATPLOT or Y =). Adjust the WINDOW to view the Standard Normal curve, as shown on the right. Select: 2nd > VARS > DRAW > 1: ShadeNorm( and press ENTER. Type: 1.04, 1.82, 0, 1) and press ENTER. Notice that the area of the shaded region is also shown on the graph, and it is the same value calculated by the normalcdf( command. Thus, ShadeNorm( is an alternative to normalcdf(, with the added benefit of shading the area.

Example: Find the probability of getting a value greater than 15.3 under the Normal curve with mean 12 and standard deviation 4. Adjust the WINDOW as shown on the right. Type: ShadeNorm(15.3, 1E99, 12, 4) and press ENTER.

Chapter

7

Variation in Repeated Samples - Sampling Distributions

A large part of statistics consists of analyzing the probability of getting a specific sample mean or sample proportion from a repeated number of samples drawn from the same population. Usually we focus on two kinds of statistics from those samples: the sample mean x̄, if the data are quantitative, or the sample proportion p̂, if the data are categorical. For large sample sizes, both x̄ and p̂ have approximately normal distributions, which have already been discussed. The normalcdf( function on the TI-84 Plus calculator will be used, with a slight modification. As before, the answers using the normalcdf( function will differ slightly from the answers found from a table of normal probabilities, since the latter involves rounding.

For a large sample size, the Central Limit Theorem states that the sampling distribution of x̄ is approximately normal with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The syntax normalcdf(a, b, μ, σ/√(n)) is used to find the probability that a < x̄ < b. The procedure is the same as finding the probability of x with a given mean and standard deviation. As before, if you are finding the area to the left of some value b, use -1E99 for a. If you are finding the area to the right of some value a, use 1E99 for b. The keystroke for E is 2nd > comma.

Assume that the weights of all packages of a certain brand of cookies are normally distributed with a mean of 32 ounces and a standard deviation of 0.3 ounces. Find the probability that the mean weight, x̄, of a random sample of 20 packages of this brand of cookies will be less than 31.8 ounces. The sample size here is not large, but the distribution of all such package weights is normal, so the sample mean will be normal as well. Select 2nd > VARS > 2: normalcdf( and press ENTER. Type: -1E99, 31.8, 32, 0.3/√(20)) and press ENTER. P(x̄ < 31.8) = 0.0014. An alternative approach would be to adjust the WINDOW as shown on the right and use the ShadeNorm( function. Type: ShadeNorm(-1E99, 31.8, 32, 0.3/√(20))
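The cookie calculation is a normal cdf with the standard deviation replaced by σ/√n. A minimal Python sketch with the standard library's `NormalDist`; it shows that the answer is a small left-tail area, because 31.8 lies about 3 standard errors below the mean.

```python
import math
from statistics import NormalDist

mu, sigma, n = 32, 0.3, 20
se = sigma / math.sqrt(n)         # standard deviation of x-bar
p = NormalDist(mu, se).cdf(31.8)  # P(x-bar < 31.8)
print(round(se, 4), round(p, 4))  # 0.0671 0.0014
```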

For a large sample size, the Central Limit Theorem states that the sampling distribution of p̂ is approximately normal with $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{pq/n}$, where q = 1 − p. To find the probability that a < p̂ < b on the calculator, use normalcdf(a, b, p, √(pq/n)). Note that it is more accurate to type √(pq/n) directly into the normalcdf( command than to compute it separately and type it in. Any time you find yourself typing in an intermediate result in a computation, you may be performing some unnecessary rounding.

Example: Voters

A candidate for mayor in a large city claims that she is favored by 53% of all eligible voters of that city. Assume that the claim is true. What is the probability that in a random sample of 400 registered voters taken from this city, between 49% and 51% will favor the candidate?

Select 2nd > VARS > 2: normalcdf( and press ENTER. Type: 0.49, 0.51, 0.53, √(0.53 * 0.47 / 400)) and press ENTER. P(0.49 < p̂ < 0.51) = 0.1570.

In other words, normalcdf(0.49, 0.51, 0.53, √(.53 * .47 / 400)) = 0.1570 = 15.7%.
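The same probability in Python, computing √(pq/n) inside the expression rather than typing a rounded intermediate value, which is the practice recommended above. A minimal sketch with `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

p, n = 0.53, 400
se = math.sqrt(p * (1 - p) / n)  # kept at full precision, no rounding
dist = NormalDist(p, se)
prob = dist.cdf(0.51) - dist.cdf(0.49)
print(round(prob, 3))  # 0.157
```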

Chapter

8

Drawing Inferences from Large Samples

In statistics, we collect samples to know more about a population. If the sample is representative of the population, the sample mean or proportion should be statistically close to the actual population mean or proportion. One way to judge how close the sample statistic may be is to create a confidence interval. This chapter will describe how to use the calculator to compute confidence intervals and tests of hypothesis for population means and proportions, drawn from large samples.

The function used to compute confidence intervals for the population mean, μ, is ZInterval when σ is known and the sample is large. It is found on the STAT page under TESTS.

Known Population Standard Deviation

If you are fortunate enough to know the population standard deviation σ, either from theory or from a pilot study, then you would use a Z-based confidence interval, ZInterval, to estimate the population mean, μ. There are two different syntaxes for the ZInterval command. If you know the statistical information (population standard deviation, sample mean, and sample size), then the syntax is ZInterval σ, x̄, n, confidence level. If you have the sample data stored in a list, then the syntax is ZInterval σ, List name, Frequency list, confidence level.

Example: Textbook Price

A publishing company has just published a new college textbook. Before the company decides the price at which to sell this textbook, it wants to know the average price of all such textbooks in the market. The research department at the company took a sample of 36 such textbooks and collected information on their prices. This information produced a mean of $70.50 for this sample. It is known that the standard deviation of the prices of all such textbooks is $4.50. Construct a 90% confidence interval for the mean price of all such college textbooks. We have the population standard deviation, σ, so we will use the ZInterval command. We do not have the data itself, so we will select Stats where it asks for input. We enter 4.5 for σ, 70.5 for x̄, 36 for n, and .90 for C-Level. Press STAT > TESTS > 7: ZInterval and press ENTER. Select Stats by moving the cursor over Stats and pressing ENTER. Type in 4.5 for σ. Type in 70.5 for x̄. Type in 36 for n. Type in .90 for C-Level. Highlight Calculate and then press ENTER.

The ZInterval output shows the 90% confidence interval, as well as the sample mean and sample size. The confidence interval is between $69.27 and $71.73. With 90% confidence, we believe that the true population mean price is between $69.27 and $71.73.
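ZInterval computes x̄ ± z*·σ/√n, where z* cuts off the middle C-Level of the standard normal curve. A minimal Python sketch of that formula; `z_interval` is a hypothetical helper, and z* comes from `NormalDist().inv_cdf`.

```python
import math
from statistics import NormalDist

def z_interval(sigma, xbar, n, level):
    """Confidence interval for mu when sigma is known (like ZInterval with Stats)."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.645 for 90%
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

lo, hi = z_interval(4.5, 70.5, 36, 0.90)
print(f"({lo:.2f}, {hi:.2f})")  # (69.27, 71.73)
```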

Example: Randomly Generated Sample Data Generate 50 random numbers from a Normal distribution with a mean of 45 and a standard deviation of 8 and store them in L1.

Select MATH > PRB > 6: randNorm( and press ENTER. Type: 45, 8, 50) > STO > L1 and press ENTER. We have the population standard deviation, σ = 8, so we will use the ZInterval command. We have the data, so we will select Data where it asks for input. We enter 8 for σ, L1 for List, 1 for Freq, and .90 for C-Level.

Press STAT > TESTS > 7: ZInterval and press ENTER. Select Data by moving the cursor over Data and pressing ENTER. Type: 8 for σ. Type: L1 for List. Type: 1 for Freq. Type: 0.90 for C-Level. Highlight Calculate and then press ENTER.

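The ZInterval output shows the 90% confidence interval, as well as the sample mean, sample standard deviation, and sample size. Because the data are randomly generated, your interval will differ from run to run. With σ = 8 and n = 50, the margin of error is 1.645 × 8/√50 ≈ 1.86, so the 90% confidence interval extends about 1.86 on either side of your sample mean x̄. With 90% confidence, we believe that the true population mean lies within this interval.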
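The Data version of ZInterval uses the same formula, with x̄ computed from the list. A minimal Python sketch, assuming `random.Random.gauss` as a stand-in for randNorm( (so the sample differs from the calculator's); whatever data are drawn, the margin of error depends only on σ, n, and the confidence level.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(2024)
l1 = [rng.gauss(45, 8) for _ in range(50)]  # like randNorm(45, 8, 50) -> L1

sigma, level = 8, 0.90
xbar = statistics.mean(l1)
z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
margin = z_star * sigma / math.sqrt(len(l1))
print(f"x-bar = {xbar:.3f}, interval = ({xbar - margin:.3f}, {xbar + margin:.3f})")
print(f"margin of error = {margin:.3f}")  # 1.861 for any sample of 50
```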

A hypothesis test about a population mean can be Z-based (if σ is known) or T-based (if σ is unknown and either the population is normal or the sample size is over 30). The TI-84 Plus calculator provides functions for both the Z-Test and the T-Test, which are located on the STAT page in the TESTS list. The menus for Z-Test and T-Test are very similar to the ones for ZInterval (described above) and TInterval (described in Chapter 9). The functions work with either the data or the summary statistics. Both functions ask for the null hypothesis and an alternative hypothesis. Both functions provide a p-value for comparison with the test's significance level.

Example: One-Sided Test; Known Sigma

H0: μ = 50; H1: μ > 50; n = 200; x̄ = 52.7; σ = 16.2; α = .05

Since sigma is known, Z-Test will be used. Select: STAT > TESTS > 1: Z-Test and press ENTER. Highlight Stats and press ENTER. Type: 50 for μ0. Type: 16.2 for σ. Type: 52.7 for x̄.

Type: 200 for n. Move the cursor over >μ0 and press ENTER. Move the cursor over Calculate and press ENTER.

The Z-Test output shows the alternative hypothesis: μ > 50; test statistic: z = 2.357022604; p-value: p = .0092110461; sample mean: x̄ = 52.7; sample size: n = 200.
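The Z-Test output can be reproduced from its formula: z = (x̄ − μ0)/(σ/√n), and for H1: μ > μ0 the p-value is the area to the right of z. A minimal Python sketch; `z_test_right` is a hypothetical helper covering only this one-sided case.

```python
import math
from statistics import NormalDist

def z_test_right(mu0, sigma, xbar, n):
    """One-sided Z-Test for H1: mu > mu0 (sigma known)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value

z, p = z_test_right(50, 16.2, 52.7, 200)
print(round(z, 6), round(p, 6))  # 2.357023 0.009211
print("reject H0" if p < 0.05 else "do not reject H0")
```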

The p-value is .0092110461, which is less than 5%. We reject H0 and conclude that the population mean is statistically significantly higher than 50.

Example: Two-Sided Test; Unknown Sigma

H0: μ = 112; H1: μ ≠ 112; n = 85; x̄ = 108.5; Sx = 12.4; α = .005

Since sigma is not known, T-Test will be used. Select: STAT > TESTS > 2: T-Test and press ENTER. Highlight Stats and press ENTER. Type: 112 for μ0. Type: 108.5 for x̄. Type: 12.4 for Sx. Type: 85 for n. Move the cursor over ≠μ0 and press ENTER. Move the cursor over Calculate and press ENTER.

The T-Test output shows the alternative hypothesis: μ ≠ 112; test statistic: t = -2.602290774; p-value: p = .0109440164; sample mean: x̄ = 108.5; sample size: n = 85.

The p-value is .0109440164, which is greater than 0.5%. We do not reject H0 and conclude that the population mean is not statistically significantly different from 112.
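For the T-Test, t = (x̄ − μ0)/(Sx/√n) with n − 1 degrees of freedom, and the two-sided p-value is the area in both tails beyond |t|. Python's standard library has no t distribution, so this sketch assumes a substitute: it integrates the t density numerically with Simpson's rule. That is a technique chosen for self-containment, not how the calculator works, and all function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def t_area_0_to(x, df, steps=2000):
    """Area under the t density from 0 to x, by Simpson's rule (steps even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3

def t_test_two_sided(mu0, xbar, sx, n):
    """Two-sided one-sample T-Test (H1: mu != mu0) from summary statistics."""
    df = n - 1
    t = (xbar - mu0) / (sx / math.sqrt(n))
    p_value = 1 - 2 * t_area_0_to(abs(t), df)  # both tails beyond |t|
    return t, p_value

t, p = t_test_two_sided(112, 108.5, 12.4, 85)
print(round(t, 4), round(p, 4))
print("reject H0" if p < 0.005 else "do not reject H0")
```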

In the above examples, the Calculate option for T-Test and Z-Test was chosen. If the Draw option is chosen, the calculator will draw the curve and state the z/t value and the p-value. Select: STAT > TESTS > 2: T-Test and press ENTER. The original information should still be there. Highlight Draw and press ENTER.

The function 1-PropZInt computes Z-based confidence intervals for a population proportion when the sample size is large enough, i.e., the number of successes, x, is greater than 5 and the number of failures, n − x, is greater than 5. 1-PropZInt is found on the STAT page under TESTS. To use 1-PropZInt, enter the number of successes as x, the sample size as n, and the confidence level as C-Level. Note: x must be a whole number. If you are finding x by multiplying p̂ by n, you will need to round to the nearest whole number.

Example: Graphing Calculators

A recent sample of 500 college students revealed that 82% of them owned a graphing calculator. Find a 95% confidence interval for the percentage of all college students who own a graphing calculator. In our sample of 500 college students, there were 410 successes (82% of 500) and 90 failures. We can use 1-PropZInt with x = 410, n = 500, and C-Level set at 0.95.

Press STAT > TESTS > A: 1-PropZInt and press ENTER. Type: 410 for x. Type: 500 for n. Type: 0.95 for C-Level. Highlight Calculate and press ENTER.

The 1-PropZInt output shows the 95% confidence interval, the sample proportion, and the sample size. The 95% confidence interval is from 0.786 to 0.854. With 95% confidence, we believe that the true population proportion is between 78.6% and 85.4%.
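1-PropZInt uses p̂ ± z*·√(p̂(1 − p̂)/n). A minimal Python sketch of that large-sample formula; `one_prop_z_int` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def one_prop_z_int(x, n, level):
    """Large-sample confidence interval for p (like 1-PropZInt)."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lo, hi = one_prop_z_int(410, 500, 0.95)
print(f"({lo:.3f}, {hi:.3f})")  # (0.786, 0.854)
```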

Hypothesis tests about proportions are Z-based when both np and nq are greater than 5. The TI-84 Plus has the function 1-PropZTest to compute a test statistic and p-value. The 1-PropZTest is located on the STAT page under the TESTS list. The menu for 1-PropZTest is very similar to the one for 1-PropZInt described above. Again you need to enter x, the number of successes, and n, the sample size. You also need to enter p0, the number that p is being compared with, and the alternative hypothesis.

Example: One-Sided Test

H0: p = .41; H1: p < .41; n = 300; x = 97

Select: STAT > TESTS > 5: 1-PropZTest and press ENTER. Type: .41 for p0. Type: 97 for x. Type: 300 for n. Highlight <p0 and press ENTER. Highlight Calculate and press ENTER.

The 1-PropZTest output shows the alternative hypothesis: prop < .41; test statistic: z = -3.052072083; p-value: p = .0011364066; sample proportion: p̂ = .323; sample size: n = 300.
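The 1-PropZTest statistic is z = (p̂ − p0)/√(p0(1 − p0)/n). Unlike the confidence interval, the standard error uses the hypothesized p0 rather than p̂. A minimal Python sketch for the left-tailed case; `one_prop_z_test_left` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def one_prop_z_test_left(p0, x, n):
    """1-PropZTest for H1: p < p0; the standard error uses p0, not p-hat."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, NormalDist().cdf(z)  # left-tail p-value

z, p = one_prop_z_test_left(0.41, 97, 300)
print(round(z, 4), round(p, 6))  # -3.0521 0.001136
```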

In the above example, the Calculate option for 1-PropZTest was chosen. If the Draw option is chosen, the calculator will draw the curve and state the z value and the p-value. Select: STAT > TESTS > 5: 1-PropZTest and press ENTER. The original information should still be there. Highlight Draw and press ENTER.

Chapter

9

Small Sample Inferences For Normal Populations

Confidence Interval for the Mean - Small Sample Size

The population standard deviation σ is usually not known. When this is the case, the standardized sample mean, (x̄ − μ)/(s/√n), has a t-distribution rather than a Normal distribution. Thus, a t-based confidence interval, TInterval, is used to estimate the population mean, μ. Another requirement is that either the population is normal or the sample size is larger than 30. Just as for the ZInterval command, there are two different syntaxes for the TInterval command. If you know the statistical information (sample mean, sample standard deviation, and sample size), then the syntax is TInterval x̄, Sx, n, confidence level. If you have the sample data stored in a list, then the syntax is TInterval List name, Frequency list, confidence level.

Example: Oranges Sold

A local orange grove sells oranges at Saturday's Downtown Farmers Market. They wanted to estimate the average number of oranges sold on a given Saturday. They took a sample of 35 Saturdays and found that the average number of oranges sold for this sample is 256 with a standard deviation of 40. Construct a 99% confidence interval for the population mean μ. We do not have the population standard deviation, σ, so the TInterval command will be used. Press STAT > TESTS > 8: TInterval and press ENTER.

We do not have the sample data, so we will select Stats and enter 256 for x̄, 40 for Sx, 35 for n, and .99 for the C-Level.

Select Stats for Inpt: and press ENTER. Type in 256 for x̄. Type in 40 for Sx. Type in 35 for n. Type in .99 for C-Level. Highlight Calculate and press ENTER.

The TInterval output shows the 99% confidence interval along with the sample mean, sample standard deviation, and sample size. The 99% confidence interval is from 237.55 to 274.45. With 99% confidence, we believe that the true population mean number of oranges sold is between 237.6 and 274.5 oranges.
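TInterval computes x̄ ± t*·Sx/√n with n − 1 degrees of freedom. The sketch below reproduces the orange example under stated assumptions: since Python's standard library has no t distribution, the critical value t* is found by bisection on a Simpson's-rule integral of the t density. This is a technique chosen for self-containment, not anything the calculator is documented to do, and all function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def t_area_0_to(x, df, steps=2000):
    """Area under the t density from 0 to x, by Simpson's rule (steps even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3

def t_critical(level, df):
    """t* with central area `level` between -t* and t*, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 2 * t_area_0_to(mid, df) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(xbar, sx, n, level):
    """Confidence interval for mu when sigma is unknown (like TInterval)."""
    margin = t_critical(level, n - 1) * sx / math.sqrt(n)
    return xbar - margin, xbar + margin

lo, hi = t_interval(256, 40, 35, 0.99)
print(f"({lo:.2f}, {hi:.2f})")  # (237.55, 274.45)
```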

A hypothesis test about a population mean is T-based if σ is unknown and either the population is normal or the sample size is over 30. The TI-84 Plus calculator provides the T-Test, which is located on the STAT page in the TESTS list. The T-Test function works with either the data or the summary statistics, requesting the null hypothesis and an alternative hypothesis. A p-value is provided for comparison with the test's significance level.

Example:

H0: μ = 112; H1: μ ≠ 112; n = 85; x̄ = 108.5; Sx = 12.4; α = .005

Since sigma is not known, T-Test will be used. Select: STAT > TESTS > 2: T-Test and press ENTER. Highlight Stats and press ENTER. Type: 112 for μ0. Type: 108.5 for x̄. Type: 12.4 for Sx. Type: 85 for n.

Move the cursor over ≠μ0 and press ENTER. Move the cursor over Calculate and press ENTER.

The T-Test output shows the alternative hypothesis: μ ≠ 112; test statistic: t = -2.602290774; p-value: p = .0109440164; sample mean: x̄ = 108.5; sample size: n = 85.

The p-value is .0109440164, which is greater than 0.5%. We do not reject H0 and conclude that the population mean is not statistically significantly different from 112.

In the above example, the Calculate option for T-Test was chosen. If the Draw option is chosen, the calculator will draw the curve and state the t value and the p-value. Select: STAT > TESTS > 2: T-Test and press ENTER. The original information should still be there. Highlight Draw and press ENTER.

Chi-Square Tests

Computing Chi-Square Distribution Probabilities

The computation commands for the Chi-square distribution are χ²pdf( and χ²cdf(. They are located on the DISTR page. DISTR appears above the VARS key.

Compute Cumulative Chi-square Probabilities

The χ²cdf( function stands for chi-square cumulative distribution function and gives the probability of getting an x value that falls within an interval of values from the chi-square distribution with the specified degrees of freedom. There are three possibilities: finding the probability that a number will fall between two values under the Chi-square distribution; finding the probability that a number will fall to the right of a value (in the right tail); and finding the probability that a number will fall to the left of a value (in the left tail). The syntax for the χ²cdf( function is χ²cdf(L, B, df), where L is the lower bound of the interval, B is the upper bound of the interval, and df is the degrees of freedom.

Finding the Area Between Two Values

To find the area between two numbers a and b under the Chi-square curve, use P(a < χ² < b) = χ²cdf(a, b, df). Find the probability of getting a value between 5.14 and 7.28 under the Chi-square curve with 8 degrees of freedom. Select: 2nd > VARS > 8: χ²cdf( and press ENTER. Type: 5.14, 7.28, 8) and press ENTER. P(5.14 < χ² < 7.28) = 0.236.

Finding the Area in the Left Tail

To find the area to the left of b (in the left tail) under the Chi-square curve, use P(χ² < b) = χ²cdf(−∞, b, df). The TI-84 calculator does not have a key for negative infinity (−∞), so the value -1E99 is used, which represents a very large negative number. The letter E stands for scientific notation and is located above the comma (,) key (2nd > ,). Thus, the command will look like: χ²cdf(-1E99, b, df). (Because a chi-square value is never negative, a lower bound of 0 gives the same result.) Find the probability of getting a value less than 20 under the Chi-square curve with 17 degrees of freedom. Select: 2nd > VARS > 8: χ²cdf( and press ENTER. Type: -1 > 2nd > , > 99, 20, 17) and press ENTER. P(χ² < 20) = 0.7258.

Finding the Area in the Right Tail

To find the area to the right of a (in the right tail) under the Chi-square curve, use P(χ² > a) = χ²cdf(a, 1E99, df). Find the probability of getting a value greater than 31.08 under the Chi-square curve with 25 degrees of freedom. Select: 2nd > VARS > 8: χ²cdf( and press ENTER. Type: 31.08, 1 > 2nd > , > 99, 25) and press ENTER.
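All three χ²cdf( cases can be computed from one cumulative function. The Python standard library has no chi-square distribution, so this sketch assumes a substitute: the chi-square cdf equals the regularized lower incomplete gamma function P(df/2, x/2), evaluated here with its power series. The helper names are hypothetical, and the series is meant for moderate x, not the 1E99 placeholder (which is why the right tail is taken as 1 minus the cdf).

```python
import math

def chi2_cdf(x, df):
    """P(chi-square <= x) via the series for the regularized lower
    incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    s, y = df / 2, x / 2
    term = math.exp(s * math.log(y) - y - math.lgamma(s + 1))  # n = 0 term
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= y / (s + n)  # ratio of consecutive series terms
        total += term
        n += 1
    return total

def chi2_area(lower, upper, df):
    """Area between lower and upper, like chi2cdf(lower, upper, df)."""
    return chi2_cdf(upper, df) - chi2_cdf(lower, df)

print(round(chi2_area(5.14, 7.28, 8), 3))  # 0.236
print(round(chi2_area(0, 20, 17), 4))      # left tail; compare with the text
print(round(1 - chi2_cdf(31.08, 25), 4))   # right tail
```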

Graph the Chi-square Probability Density Function

The function χ²pdf( stands for Chi-square probability density function and does not actually generate a probability, since it applies to a single x value in a continuous distribution, and the probability of any single value is always zero. The main use of this command is to draw the Chi-square curve. The syntax for the function is χ²pdf(x, df), where df is the degrees of freedom. The following sequence of commands will draw the chi-square curve with 7 degrees of freedom. Select: Y = > 2nd > VARS > 7: χ²pdf( and press ENTER. Type: x, 7) > ZOOM > 0

This command may be used to draw any Chi-square distribution curve with any degrees of freedom. It may be necessary to adjust the window. For example, changing Xmax to 30 gives a better Chi-square distribution curve (see below).

Shade the Chi-square Probability Density Function

When calculating the probability of an area under the Chi-square curve, it is often helpful to shade the area. The syntax for the TI-84 Plus command to do this is Shadeχ²(a, b, df). To shade the area under the Chi-square curve with 8 degrees of freedom for P(5.14 < χ² < 7.28) = 0.236, begin by turning off all other graphs (STATPLOT or Y =). Adjust the WINDOW to view the Chi-square curve with 8 degrees of freedom, as shown on the left. Select: 2nd > VARS > DRAW > 3: Shadeχ²( and press ENTER. Type: 5.14, 7.28, 8) and press ENTER. Notice that the area of the shaded region is also shown on the graph, and it is the same value calculated by the χ²cdf( command. Thus, Shadeχ²( is an alternative to χ²cdf(, with the added benefit of shading the area.

A Goodness-of-Fit Test

A goodness-of-fit test is used to make a test of hypothesis about experiments with more than two possible outcomes (or categories). These are called multinomial experiments. The frequencies of each possible outcome obtained from the experiment are called the observed frequencies. A goodness-of-fit test measures the difference between the observed frequencies and the expected frequencies (npi) with the statistic χ² = Σ(O − E)²/E. When the null hypothesis is true, this statistic approximately follows a chi-square distribution. The sample size should be large enough that the expected frequency for each category is at least 5. The TI-84 Plus command for the goodness-of-fit test is χ²GOF-Test. It tests whether sample data come from a population that conforms to a specified distribution (expected frequencies). The command requires that both the observed and expected frequencies be put in lists. The degrees of freedom, df, is the number of outcomes minus 1. The command is located on the STAT page in the TESTS list. Example 1: The following table lists the frequency distribution of 90 rolls of a die. Test at the 5% significance level the null hypothesis that the die is fair.

Enter the observed frequencies into List L1 and the expected frequencies into List L2. For a fair die, each expected frequency is 90 × 1/6 = 15. There are 6 possible outcomes, so the degrees of freedom, df, is 6 − 1 = 5. Select: STAT > TESTS > D: χ²GOF-Test and press ENTER. Type in 2nd > 1 for Observed. Type in 2nd > 2 for Expected. Type in 5 for df: Highlight Calculate and press ENTER. The output for the χ²GOF-Test shows the:
Chi-square value: χ² = 3.7333
p-value = 0.5884
Degrees of freedom: df = 5
CNTRB = {.066666667, 2.4, 0, .6, .066666667}
Note: CNTRB lists the contribution of each category to the overall value of χ². These are the values that would appear in the (O − E)²/E column of a hand-computed table. Notice that a roll of 2 had the largest contribution.

Since the p-value of 0.5884 is greater than 5%, we fail to reject the null hypothesis. There is not enough evidence to conclude that the die is unfair.
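The statistic behind χ²GOF-Test can be sketched in a few lines. Each expected frequency is n·pi, each category contributes (O − E)²/E (the CNTRB list), and df is the number of categories minus 1. This Python sketch is illustrative only. The function name and the warning for small expected counts are assumptions, not calculator behavior.

```python
import warnings


def gof_statistic(observed, probabilities):
    """Chi-square goodness-of-fit statistic, per-category contributions, and df.

    expected_i = n * p_i; contribution_i = (O_i - E_i)**2 / E_i (the CNTRB values).
    """
    if len(observed) != len(probabilities):
        raise ValueError("need one probability per category")
    n = sum(observed)
    expected = [n * p for p in probabilities]
    if min(expected) < 5:
        warnings.warn("an expected frequency is below 5; the chi-square approximation may be poor")
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return sum(contributions), contributions, len(observed) - 1
```

For the die example, probabilities would be [1/6] * 6 (an expected frequency of 15 per face), and observed would hold the six counts from the table. The returned total corresponds to the χ² value reported by the calculator.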

Contingency Tables

When measuring the relationship between two categorical variables, one of the most important tools for analyzing the results is a two-way classification table, also known as a contingency table.
A Test of Independence
The contingency table can be used to see whether the variables are independent. You compare the observed frequencies with the frequencies that would be expected from such a sample if the variables were independent. A Chi-square (χ²) test statistic can be computed from the observed and expected frequencies. The TI-84 function χ²-Test is used for the test of independence and is located on the STAT page in the TESTS list. The χ²-Test function works differently from the other tests on this menu. It requires you to enter the observed frequencies into a matrix, and it then stores the computed expected frequencies automatically in another matrix.

Example: School Referendum
A random sample of 300 adults was selected and asked if they were in favor of the new school referendum. The table presents the two-way classification of the adults' responses.

           In Favor   Against   No Opinion
Men           93         70         12
Women         87         32          6

Test at a 5% significance level whether or not Gender and Opinion are independent. The data must be stored differently than in previous statistical tests on the calculator: for a Chi-square test of independence, it goes in a matrix. Type: 2nd > x⁻¹ to get to MATRIX. Highlight EDIT and press the ENTER key.

After MATRIX [A], type in 2 × 3. The 2 represents the number of rows in the table and the 3 represents the number of columns. Enter the data values into the matrix as they appear in the table.

Select: STAT > TESTS > C: χ²-Test and press ENTER. The χ²-Test screen shows that the Observed values are in matrix [A] and that the Expected values will be put in matrix [B]. Highlight Calculate and press ENTER. The output for the χ²-Test shows the: Test statistic: χ² = 8.252773109; P-value: p = .0161410986; Degrees of freedom: df = 2

Since the p-value of .016 is less than the significance level of .05, we reject the null hypothesis that the row and column variables are independent. We have significant evidence that, for all adults, gender and opinion concerning the school referendum are related. The expected frequencies can be found in Matrix [B]. Type: 2nd > x⁻¹ to get to MATRIX. Select: 2: [B] and press the ENTER key. Press ENTER again.

In the above example, the Calculate option was chosen. If the Draw option is chosen, the calculator will draw the curve and state the χ² value and the p-value. Select: STAT > TESTS > C: χ²-Test and press ENTER. The original information should still be there. Highlight Draw and press ENTER.
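The expected counts that χ²-Test stores in matrix [B] come from (row total × column total) / grand total. The statistic sums (O − E)²/E over all cells, with df = (rows − 1)(columns − 1). The Python sketch below applies those formulas to the School Referendum data. The function name is hypothetical. With df = 2 the Chi-square right tail reduces exactly to e^(−x/2), so you can check the p-value without any special functions.

```python
import math


def independence_test(table):
    """Expected counts, chi-square statistic, and df for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Expected count for cell (i, j): (row i total)(column j total) / grand total
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return expected, stat, df


referendum = [[93, 70, 12],  # Men: In Favor, Against, No Opinion
              [87, 32, 6]]   # Women
expected, stat, df = independence_test(referendum)
print(expected)                          # [[105.0, 59.5, 10.5], [75.0, 42.5, 7.5]]
print(round(stat, 4), df)                # 8.2528 2
print(round(math.exp(-stat / 2), 4))     # 0.0161 (closed form valid only for df = 2)
```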

A Test of Homogeneity
A test of homogeneity determines whether two (or more) populations are homogeneous (similar) with regard to the distribution of a certain characteristic. On the TI-84 Plus, the procedure for this test is identical to the test of independence (see above).

Inferences About the Population Variance
In the same way that the population mean and population proportion are tested, so is the population variance. This is often in response to a desire to control the consistency of a value. If the population from which the sample is taken is normally distributed, then the statistic (n − 1)s²/σ² has a chi-square distribution with n − 1 degrees of freedom. The TI-84 Plus does not have a built-in function to generate confidence intervals about the population variance. The goodness-of-fit function, χ²GOF-Test, can be used for a test of hypothesis of the population variance if the data

Chapter

10

Comparing Two Treatments

Confidence Interval for μ1 − μ2

Two functions compute confidence intervals for the difference of two population means, μ1 − μ2:
2-SampZInt, for when both σ1 and σ2 are known and either the sample sizes are large or the populations from which the samples are drawn are normal.
2-SampTInt, for when σ1 and σ2 are not known.
Both are found on the STAT page under TESTS.
Known Population Standard Deviations
You may have information about the population standard deviations of the two populations, either from theory or a pilot study. In that case, use a Z-based confidence interval, 2-SampZInt, to estimate the difference μ1 − μ2 between two population means. As with ZInterval for one population mean, there are two different syntaxes for the 2-SampZInt command. If you know the statistical information for both populations (population standard deviations, sample means, and sample sizes), then the syntax is 2-SampZInt σ1, σ2, x̄1, n1, x̄2, n2, confidence level. If you have the sample data stored in two lists, then the syntax is 2-SampZInt σ1, σ2, List name1, List name2, Frequency list1, Frequency list2, confidence level. Example 1: The following information is obtained from two independent samples selected from two populations. Construct a 90% confidence interval for μ1 − μ2.
n1 = 200, x̄1 = 6.4, σ1 = 0.7
n2 = 190, x̄2 = 5.6, σ2 = 0.55

We have the population standard deviations, σ1 and σ2, so we will use the 2-SampZInt command. We do not have the data itself, so we will select Stats where it asks for input. We enter 0.7 for σ1, 0.55 for σ2, 6.4 for x̄1, 200 for n1, 5.6 for x̄2, 190 for n2, and 0.90 for C-Level.

Press STAT > TESTS > 9: 2-SampZInt and press ENTER. Select Stats by moving the cursor over Stats and pressing the ENTER key. Type in 0.7 for σ1. Type in 0.55 for σ2. Type in 6.4 for x̄1. Type in 200 for n1. Type in 5.6 for x̄2. Type in 190 for n2. Type in .90 for C-Level. Highlight Calculate and then press the ENTER key.

The 2-SampZInt output shows the 90% confidence interval for μ1 − μ2, as well as both sample means and both sample sizes. The confidence interval is between 0.69542 and 0.90458. With 90% confidence, we believe that the true difference between the population means is between 0.69542 and 0.90458. If you have the actual data rather than the statistics, then enter the data in two lists and choose Data rather than Stats.
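The interval that 2-SampZInt reports is (x̄1 − x̄2) ± z*·√(σ1²/n1 + σ2²/n2), where z* is the standard normal critical value for the chosen confidence level. The Python sketch below reproduces Example 1 with the standard library's statistics.NormalDist. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_samp_z_interval(x1, sigma1, n1, x2, sigma2, n2, level):
    """Z interval for mu1 - mu2 when the population standard deviations are known."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.645 for 90%
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)  # standard error of x1 - x2
    diff = x1 - x2
    return diff - z_star * se, diff + z_star * se


low, high = two_samp_z_interval(6.4, 0.7, 200, 5.6, 0.55, 190, 0.90)
print(round(low, 5), round(high, 5))  # 0.69542 0.90458
```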

Independent Samples With Unknown But Equal Population Standard Deviations
In the real world, one rarely knows the population standard deviations. A common approach to estimating the difference μ1 − μ2 is then a T-based confidence interval, 2-SampTInt, under the assumption that the population standard deviations are equal. In that case the pooled sample standard deviation of the two samples is used. As with the 2-SampZInt command, there are two different syntaxes for the 2-SampTInt command.

If you know the statistical information for both samples (sample standard deviations, sample means, and sample sizes), then the syntax is 2-SampTInt x̄1, Sx1, n1, x̄2, Sx2, n2, Confidence Level, Pooled. If you have the sample data stored in two lists, then the syntax is 2-SampTInt List name1, List name2, Frequency list1, Frequency list2, Confidence Level, Pooled.

Example 2: The following information is obtained from two independent samples selected from two populations with unknown but equal standard deviations. Construct a 95% confidence interval for μ1 − μ2.
n1 = 42, x̄1 = 78.4, s1 = 10.13
n2 = 39, x̄2 = 75.2, s2 = 9.55
We have the sample standard deviations, s1 and s2, so we will use the 2-SampTInt command. We do not have the data itself, so we will select Stats where it asks for input. We enter 78.4 for x̄1, 10.13 for Sx1, 42 for n1, 75.2 for x̄2, 9.55 for Sx2, 39 for n2, and 0.95 for C-Level. We have reason to believe that the population standard deviations are the same, so select Yes at the Pooled prompt.

Press STAT > TESTS > 0: 2-SampTInt and press ENTER. Select Stats by moving the cursor over Stats and pressing the ENTER key. Type in 78.4 for x̄1. Type in 10.13 for Sx1. Type in 42 for n1. Type in 75.2 for x̄2. Type in 9.55 for Sx2. Type in 39 for n2. Type in .95 for C-Level. Highlight Yes for Pooled. Highlight Calculate and then press the ENTER key.

The 2-SampTInt output shows the 95% confidence interval for μ1 − μ2, as well as both sample means and both sample standard deviations. It also shows the degrees of freedom, n1 + n2 − 2 = 79. The confidence interval is between -1.162 and 7.5622. With 95% confidence, we believe that the true difference between the population means is between -1.162 and 7.5622. If you have the actual data rather than the statistics, then enter the data in two lists and choose Data rather than Stats.
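Behind the pooled 2-SampTInt output is the pooled standard deviation sp = √[((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)] and the standard error sp·√(1/n1 + 1/n2). The Python standard library has no t distribution, so in this sketch you pass the critical value t* in by hand. The value 1.99045 used below is an assumed t-table value for 79 degrees of freedom, and the function names are hypothetical. The same standard error gives the t statistic that 2-SampTTest reports for this data.

```python
import math


def pooled_t(x1, s1, n1, x2, s2, n2):
    """Pooled standard deviation, standard error, df, and t statistic for H0: mu1 - mu2 = 0."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return sp, se, df, (x1 - x2) / se


def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_star):
    """Confidence interval for mu1 - mu2; t_star comes from a t table with n1 + n2 - 2 df."""
    _, se, _, _ = pooled_t(x1, s1, n1, x2, s2, n2)
    diff = x1 - x2
    return diff - t_star * se, diff + t_star * se


sp, se, df, t = pooled_t(78.4, 10.13, 42, 75.2, 9.55, 39)
print(round(sp, 5), df, round(t, 4))  # 9.85527 79 1.4601
# 1.99045: assumed upper 2.5% point of the t distribution with 79 df
print(pooled_t_interval(78.4, 10.13, 42, 75.2, 9.55, 39, 1.99045))
```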

Hypothesis Testing: μ1 − μ2

Known Population Standard Deviations
You may have information about the population standard deviations of the two populations, either from theory or a pilot study. In that case, use the Z-based 2-SampZTest to perform a test of hypothesis about μ1 − μ2. Again, there are two different syntaxes for the 2-SampZTest command. If you know the statistical information for both populations (population standard deviations, sample means, and sample sizes), then the syntax is 2-SampZTest σ1, σ2, x̄1, n1, x̄2, n2, μ1:, where μ1: is the alternative hypothesis (≠μ2, <μ2, or >μ2). If you have the sample data stored in two lists, then the syntax is 2-SampZTest σ1, σ2, List name1, List name2, Frequency list1, Frequency list2, μ1:.

Example 1 (from above): The following information is obtained from two independent samples selected from two populations. Test at the 5% significance level whether the two population means are different, μ1 ≠ μ2.
n1 = 200, x̄1 = 6.4, σ1 = 0.7
n2 = 190, x̄2 = 5.6, σ2 = 0.55
Press STAT > TESTS > 3: 2-SampZTest and press ENTER. Select Stats by moving the cursor over Stats and pressing the ENTER key. Type in 0.7 for σ1. Type in 0.55 for σ2. Type in 6.4 for x̄1. Type in 200 for n1. Type in 5.6 for x̄2. Type in 190 for n2. Highlight ≠μ2 for μ1:. Highlight Calculate and then press the ENTER key. The 2-SampZTest output shows the alternative hypothesis: μ1 ≠ μ2; test statistic: z = 12.58305739; p-value: 2.707293E-36; sample means: x̄1 = 6.4, x̄2 = 5.6; sample sizes: n1 = 200, n2 = 190.

The p-value is 2.707293E-36, which is less than 5%. We reject H0 and conclude that the population means are statistically significantly different. In the above example, the Calculate option was chosen. If the Draw option is chosen, the calculator will draw the curve and state the z value and the p-value. Press STAT > TESTS > 3: 2-SampZTest and press ENTER. The original information should still be there. Highlight Draw and press ENTER.
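The z statistic of 2-SampZTest uses the same standard error as the interval: z = (x̄1 − x̄2)/√(σ1²/n1 + σ2²/n2). The Python sketch below computes the two-sided p-value with math.erfc, because 2·P(Z > |z|) = erfc(|z|/√2) stays accurate far in the tail, where 1 − Φ(z) would round to 0. The function name is hypothetical. For a p-value this extreme, the trailing digits may differ somewhat from the calculator's display.

```python
import math


def two_samp_z_test(x1, sigma1, n1, x2, sigma2, n2):
    """z statistic and two-sided p-value for H0: mu1 = mu2 with known population sigmas."""
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (x1 - x2) / se
    # Two-sided p-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


z, p = two_samp_z_test(6.4, 0.7, 200, 5.6, 0.55, 190)
print(round(z, 4), p)
```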

If you have the actual data rather than the statistics, then enter the data in two lists and choose Data rather than Stats.

Independent Samples With Unknown But Equal Population Standard Deviations
In the real world, one rarely knows the population standard deviations. In that case, the t-based 2-SampTTest is used to perform a test of hypothesis about μ1 − μ2, under the assumption that the population standard deviations are equal. The test uses the pooled sample standard deviation of the two samples. Again, there are two different syntaxes for the 2-SampTTest command. If you know the statistical information for both samples (sample standard deviations, sample means, and sample sizes), then the syntax is 2-SampTTest x̄1, Sx1, n1, x̄2, Sx2, n2, μ1:, Pooled. If you have the sample data stored in two lists, then the syntax is 2-SampTTest List name1, List name2, Frequency list1, Frequency list2, μ1:, Pooled.

Example 2 (from above): The following information is obtained from two independent samples selected from two populations with unknown but equal standard deviations. Test at the 5% significance level whether μ1 > μ2.
n1 = 42, x̄1 = 78.4, s1 = 10.13
n2 = 39, x̄2 = 75.2, s2 = 9.55
Press STAT > TESTS > 4: 2-SampTTest and press ENTER. Select Stats by moving the cursor over Stats and pressing the ENTER key. Type in 78.4 for x̄1. Type in 10.13 for Sx1. Type in 42 for n1. Type in 75.2 for x̄2. Type in 9.55 for Sx2. Type in 39 for n2. Highlight >μ2 for μ1:. Highlight Yes for Pooled. Highlight Calculate and then press the ENTER key. The 2-SampTTest output shows the: alternative hypothesis: μ1 > μ2; test statistic: t = 1.460144062; p-value: 0.0741080013; degrees of freedom: 79; sample means: x̄1 = 78.4, x̄2 = 75.2; sample standard deviations: Sx1 = 10.13, Sx2 = 9.55; pooled sample standard deviation: Sxp = 9.85527418; sample sizes: n1 = 42, n2 = 39. The p-value is 0.0741080013, which is greater than 5%. We do not reject H0; there is not significant evidence that population mean 1 is greater than population mean 2. In the above example, the Calculate option was chosen. If the Draw option is chosen, the calculator will draw the curve and state the t value and the p-value. Press STAT > TESTS > 4: 2-SampTTest and press ENTER. The original information should still be there. Highlight Draw and press ENTER.

If you have the actual data rather than the statistics, then enter the data in two lists and choose Data rather than Stats.

Independent Samples With Unknown and Unequal Population Standard Deviations
When the population standard deviations are unequal, follow the same procedures as above for equal population standard deviations, with one exception: choose No for Pooled.
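Choosing No for Pooled drops the equal-variance assumption. The usual unpooled statistic is t = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2). Its degrees of freedom come from the Welch-Satterthwaite approximation and are generally not a whole number. The Python sketch below computes both for the Example 2 summary statistics. The function name is hypothetical. The text does not show the unpooled output, so it is an assumption that the calculator uses exactly this approximation.

```python
import math


def welch_t(x1, s1, n1, x2, s2, n2):
    """Unpooled t statistic and Welch-Satterthwaite degrees of freedom for H0: mu1 - mu2 = 0."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    t = (x1 - x2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df  # df is usually not a whole number


t, df = welch_t(78.4, 10.13, 42, 75.2, 9.55, 39)
print(round(t, 4), round(df, 2))
```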

Paired Samples

The confidence intervals and hypothesis tests described above all assume that the samples are taken independently. Two samples are dependent (paired) when each observation in one sample is matched with an observation in the other, for example when the same subjects are measured twice. The most common data collection design with dependent samples is called Pretest/Posttest. Data is collected from a sample before some type of treatment, and then data is collected again from that same sample after the treatment. We then work with the mean difference between the pre- and posttest scores, and the null hypothesis is that the average difference is zero. Working with the differences lets us use one-sample t procedures (the population standard deviations are rarely known). Use TInterval for a confidence interval and T-Test for a test of hypothesis. Example 3: Assuming that the population of paired differences is normally distributed, find the 99% confidence interval for μd.
n = 16, d̄ = 21.7, sd = 10.4

Press STAT > TESTS > 8: TInterval and press ENTER. We do not have the sample data, so we will select Stats.

Select Stats for Inpt: and press the ENTER key. Type in 21.7 for x̄ (this is d̄). Type in 10.4 for Sx (this is sd). Type in 16 for n. Type in .99 for C-Level. Highlight Calculate and press the ENTER key.

The TInterval output shows the 99% confidence interval along with the sample mean, sample standard deviation, and sample size. The 99% confidence interval is from 14.039 to 29.361. We can state with 99% confidence that the mean difference between the pre- and posttest is between 14.039 and 29.361.
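For paired data, the TInterval result is d̄ ± t*·sd/√n with n − 1 degrees of freedom. In the sketch below, t* = 2.9467 is an assumed t-table value for 15 degrees of freedom at 99% confidence, because the Python standard library has no t distribution. The function name is hypothetical.

```python
import math


def paired_t_interval(d_bar, s_d, n, t_star):
    """Confidence interval for the mean difference mu_d: d_bar +/- t* * s_d / sqrt(n), df = n - 1."""
    margin = t_star * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


# 2.9467: assumed upper 0.5% point of the t distribution with 15 df
low, high = paired_t_interval(21.7, 10.4, 16, 2.9467)
print(round(low, 3), round(high, 3))  # 14.039 29.361
```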

Large and Independent Samples
If you have two large and independent samples, then you would use a Z-based confidence interval, 2-PropZInt, to estimate the difference p1 − p2 of two population proportions. The function is located on the STAT page in the TESTS list. For population proportions, the samples count as large when n1p̂1, n1q̂1, n2p̂2, and n2q̂2 are all greater than 5. Example 4: The following information is obtained from two large and independent samples selected from two populations. Construct a 95% confidence interval for p1 − p2.
n1 = 200, p̂1 = 0.42
n2 = 220, p̂2 = 0.35
The 2-PropZInt command requests the number of successes x1 rather than p̂1. To find x1 and x2, multiply each sample size by its sample proportion: x1 = (200)(0.42) = 84 and x2 = (220)(0.35) = 77.

Press STAT > TESTS > B: 2-PropZInt and press ENTER. Type in 84 for x1. Type in 200 for n1. Type in 77 for x2. Type in 220 for n2. Type in .95 for C-Level. Highlight Calculate and then press the ENTER key. The 2-PropZInt output shows the 95% confidence interval for p1 − p2, as well as both sample proportions and both sample sizes.

The confidence interval is between -0.023 and 0.163. With 95% confidence, we believe that the true difference between the population proportions is between -0.023 and 0.163.
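2-PropZInt uses the unpooled standard error √(p̂1q̂1/n1 + p̂2q̂2/n2), with p̂1 = x1/n1 and p̂2 = x2/n2. The Python sketch below reproduces Example 4 and shows that the interval is centered at p̂1 − p̂2 = 0.07. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_prop_z_interval(x1, n1, x2, n2, level):
    """Z interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


low, high = two_prop_z_interval(84, 200, 77, 220, 0.95)
print(round(low, 4), round(high, 4))  # -0.023 0.163
```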

Hypothesis Testing: p1 − p2

If you have two large and independent samples, then you would use the Z-based 2-PropZTest to perform a test of hypothesis about p1 − p2. The function is located on the STAT page in the TESTS list. For population proportions, the samples count as large when n1p̂1, n1q̂1, n2p̂2, and n2q̂2 are all greater than 5. Example 4 (from above): The following information is obtained from two large and independent samples selected from two populations. Test at the 1% significance level whether the two population proportions are different, p1 ≠ p2.
n1 = 200, p̂1 = 0.42
n2 = 220, p̂2 = 0.35
The 2-PropZTest command requests the number of successes x1 rather than p̂1. To find x1 and x2, multiply each sample size by its sample proportion: x1 = (200)(0.42) = 84 and x2 = (220)(0.35) = 77.

Press STAT > TESTS > 6: 2-PropZTest and press ENTER. Type in 84 for x1. Type in 200 for n1. Type in 77 for x2. Type in 220 for n2. Highlight ≠p2 for p1:. Highlight Calculate and then press the ENTER key. The 2-PropZTest output shows the alternative hypothesis: p1 ≠ p2; test statistic: z = 1.47; p-value: 0.14; sample proportions: p̂1 = 0.42, p̂2 = 0.35; pooled sample proportion: p̂ = 0.383; sample sizes: n1 = 200, n2 = 220. The p-value is 0.14, which is greater than 1%. We do not reject H0 and conclude that the population proportions are not statistically significantly different.
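For the test, 2-PropZTest uses the pooled proportion p̂ = (x1 + x2)/(n1 + n2) in the standard error √(p̂q̂(1/n1 + 1/n2)). This Python sketch reproduces the pooled proportion, z, and the two-sided p-value from Example 4. The function name is hypothetical.

```python
import math


def two_prop_z_test(x1, n1, x2, n2):
    """Pooled proportion, z statistic, and two-sided p-value for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)
    return p_pool, z, p_value


p_pool, z, p_value = two_prop_z_test(84, 200, 77, 220)
print(round(p_pool, 3), round(z, 2), round(p_value, 2))  # 0.383 1.47 0.14
```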

In the above example, the Calculate option was chosen. If the Draw option is chosen, the calculator will draw the curve and state the z value and the p-value. Press STAT > TESTS > 6: 2-PropZTest and press ENTER. The original information should still be there. Highlight Draw and press ENTER.


Chapter

11

Regression Analysis I Simple Linear Regression

Simple Linear Regression Models

A simple linear regression model is an equation describing how to use one variable, x, to predict another variable, y, based on the relationship in the sample data. Predictions made from the sample data may differ from the actual values in the population, so the symbol y' is used for the predicted value of y. The simplest possible model is a linear one: y' = a + bx. Its graph is a line, where b is the slope and a is the y-intercept.
Creating a Linear Regression Model
The TI-84 Plus calculator has two built-in functions, LinReg(ax+b) and LinReg(a+bx), to compute a simple linear regression model. Both are located on the STAT page in the CALC list. They are two forms of the same function: one writes the equation as ax+b and the other as a+bx, so the roles of a and b are swapped between them. We will use the latter form, a+bx, but either is fine. We will create a linear regression model for the English and Math scores from the previous example, which are stored in lists L1 and L2. Press STAT > CALC > 8: LinReg(a+bx). Type: L1 > , > L2 > , then press VARS > Y-VARS > 1 > 1 to get Y1. Press ENTER.

The LinReg output shows: general model: y = a + bx; y-intercept: a = 53.26159596; slope: b = .3135854034; coefficient of determination: r² = .3883069253; correlation coefficient: r = .6231427808

The equation of the linear regression model for the English and Math scores is y' = 53.2616 + 0.3136x.
Graphing the Linear Regression Line
The LinReg command we entered requested that the equation of the linear regression model (least squares line) be stored in Y1.

Execute the STAT PLOT command again and the least squares line will appear on the scatter diagram.
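The LinReg(a+bx) coefficients are the least squares estimates b = Sxy/Sxx and a = ȳ − b·x̄, where Sxx, Syy, and Sxy are sums of squared or cross-multiplied deviations from the means. The correlation is r = Sxy/√(Sxx·Syy). The Python sketch below computes them from two lists. The function name is hypothetical.

```python
import math


def least_squares(x, y):
    """Intercept a, slope b, and correlation r for the line y' = a + b x (LinReg(a+bx) form)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # the least squares line passes through (x_bar, y_bar)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r
```

Passing the English scores as x and the Math scores as y (the contents of L1 and L2) should reproduce a, b, and r from the LinReg output, and r² is r * r.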

Confidence Interval for B
One goal of determining the regression line is to estimate the true value of the slope B of the population regression line. The slope, b, of the regression line for the sample is a point estimate of B. A different sample would give a different b value, so b is a random variable. The standardized slope (b − B)/s_b, where s_b is the standard error of b, has a t distribution with n − 2 degrees of freedom. The LinRegTInt function can be used to construct the confidence interval for B, based on b. The LinRegTInt command is located on the STAT page in the TESTS list and requires that the data be stored in two lists. Example: Using the data in the English and Math scores example above, construct a 95% confidence interval for B. Press STAT > TESTS > G: LinRegTInt and press ENTER.

Type L1 in for the Xlist: Type L2 for the Ylist: Let the Freq: remain at 1. Type .95 for C-Level: Highlight Calculate and press ENTER.

The LinRegTInt output shows: general model: y = a + bx; confidence interval: (-.0382, .66535); slope: b = .3135854034; degrees of freedom: df = 7; standard error of the regression (standard deviation of the residuals): s = 4.729745193; y-intercept: a = 53.26159596; coefficient of determination: r² = .3883069253; correlation coefficient: r = .6231427808

The 95% confidence interval for the slope of the English and Math scores example is (-.0382, .66535).
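LinRegTInt builds its interval as b ± t*·SE(b). Here SE(b) = s/√Sxx, and s = √(SSE/(n − 2)) is the s value in the output. The sketch below computes these pieces. The function name is hypothetical. You must supply t* from a t table with n − 2 degrees of freedom, since the Python standard library has no t distribution.

```python
import math


def slope_inference(x, y, t_star):
    """Slope b, residual standard deviation s, SE of b, CI for B, and t = b / SE(b).

    t_star must come from a t table with n - 2 degrees of freedom.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # sum of squared residuals
    s = math.sqrt(sse / (n - 2))
    se_b = s / math.sqrt(sxx)
    return {"b": b, "s": s, "se_b": se_b, "df": n - 2,
            "ci": (b - t_star * se_b, b + t_star * se_b), "t": b / se_b}
```

Call this with the English and Math scores and t* = 2.3646, an assumed t-table value for 7 degrees of freedom. It should give an interval close to (-.0382, .66535) and an s close to 4.7297.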

Hypothesis Tests
LinRegTTest can be used to test whether or not the variable x can meaningfully predict the variable y. This is equivalent to testing whether B, the population slope (estimated by b), is really 0. LinRegTTest is located on the STAT page in the TESTS list. The test requires the names of the lists containing the data values and the alternative hypothesis. Example: Test at the 1% significance level whether the slope of the regression line for the above example on English and Math scores is positive.

Press the STAT > TESTS > F:LinRegTTest and press ENTER. Type in L1 for the Xlist: Type in L2 for the Ylist: Let the Freq: remain at 1. Move the cursor over >0 and press ENTER. Set the RegEQ: to Y1. Highlight Calculate and press ENTER. The value of t is 2.108 and the p-value is .0365, which is greater than our significance level of 1%. We fail to reject the null hypothesis and conclude that the slope is not significantly greater than zero. The regression equation was stored in Y1. This equation (linear regression model) can be used to find y' for a given value of x, such as x = 75. Type Y1(75) and press ENTER. To find the predicted Math score for someone with an English score of 60, type Y1(60) to get a predicted Math score of 72.08.
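A quick arithmetic check ties the LinRegTTest result to the LinRegTInt output above. The half-width of the 95% interval is t*·SE(b), so dividing it by t* recovers SE(b), and b/SE(b) should reproduce t = 2.108. The sketch also evaluates y' = a + bx at x = 60, which is what Y1(60) computes. The t* value of 2.3646 for 7 degrees of freedom is an assumed t-table entry.

```python
# Values copied from the LinReg and LinRegTInt output above
a = 53.26159596
b = 0.3135854034
ci_low, ci_high = -0.0382, 0.66535
t_star = 2.3646  # assumed t-table value: upper 2.5% point with 7 degrees of freedom

se_b = (ci_high - ci_low) / 2 / t_star  # interval half-width divided by t*
t = b / se_b                            # test statistic for H0: B = 0
print(round(se_b, 4), round(t, 3))      # 0.1488 2.108

print(round(a + b * 60, 2))             # 72.08, the value Y1(60) returns
```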

We have already seen how to test for the equality of means between two different populations with 2-SampZTest and 2-SampTTest. With the added assumption of common population standard deviations, we can extend the tests to more than two populations with a technique known as Analysis of Variance (ANOVA for short). In a one-way ANOVA, only one factor or variable is being tested. The null hypothesis is that the means of three or more populations are equal; the alternative hypothesis is that at least two of the means differ. The TI-84 built-in ANOVA function is on the STAT page in the TESTS list. First store the sample data in lists, one per population.

The syntax for the ANOVA function is: ANOVA(List1, List2, List3, ..., List20). It accepts a minimum of 2 lists and a maximum of 20. Use the names of the lists in which the data is stored. The result contains the F test statistic, degrees of freedom, sums of squares, mean squares, and, most importantly, the p-value for the test.

Example: Fourth Grade Arithmetic With Equal Sample Sizes
Fifteen fourth-grade students were exposed to one of three different methods of teaching arithmetic. They were randomly assigned to three groups of five. At the end of the semester, the same test was given to all 15 students. The following table gives the scores of the students in the three groups.

Method 1   Method 2   Method 3
   48         55         84
   73         85         68
   51         70         95
   65         69         74
   87         90         67

Assume that the three populations are normally distributed with equal standard deviations. Calculate the value of the test statistic F. At the 1% significance level, can we reject the null hypothesis that the mean arithmetic score of all fourth-grade students taught by each of these three methods is the same?

Press STAT > ENTER key to get to the Stat Editor. Enter the Method I data values into L1. Enter the Method II data values into L2. Enter the Method III data values into L3.

Press the STAT > TESTS > H: ANOVA and press ENTER. Type in: L1, L2, L3) and press ENTER.

The one-way ANOVA table would appear as follows:

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Squares   Value of the Test Statistic
Between (Factor)              2               432.1333         216.0667      F = 216.0667 / 197.7333 = 1.0927
Within (Error)               12              2372.8            197.7333
Total                        14              2804.9333

p = 0.3665

The df for Between is the number of populations minus 1: k − 1 = 3 − 1 = 2. The df for Within is the number of data items minus the number of populations: n − k = 15 − 3 = 12. The p-value is 0.3665. Since this is greater than the significance level of 1%, we fail to reject the null hypothesis. There is insufficient evidence from the data to show that the different teaching methods have significantly different average results.
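You can sketch the entries of the one-way ANOVA table directly from their definitions. SSB weights each group mean's squared distance from the grand mean by the group size, SSW sums squared deviations within groups, and F = MSB/MSW. The Python sketch below reproduces the Fourth Grade Arithmetic table. The function name is hypothetical. The closed-form p-value at the end applies only when df for Between is 2, as here; other cases need the full F distribution.

```python
def one_way_anova(*groups):
    """SSB, SSW, degrees of freedom, mean squares, and F for a one-way ANOVA."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between: group size times squared distance of the group mean from the grand mean
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within: squared deviation of each value from its own group mean
    ssw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_b, df_w = k - 1, n - k
    msb, msw = ssb / df_b, ssw / df_w
    return {"SSB": ssb, "SSW": ssw, "df": (df_b, df_w),
            "MSB": msb, "MSW": msw, "F": msb / msw}


method1 = [48, 73, 51, 65, 87]
method2 = [55, 85, 70, 69, 90]
method3 = [84, 68, 95, 74, 67]
result = one_way_anova(method1, method2, method3)
print(round(result["SSB"], 4), round(result["SSW"], 4), round(result["F"], 4))
# 432.1333 2372.8 1.0927

# Only for df_between = 2: P(F > f) = (1 + 2f / df_within) ** (-df_within / 2)
df_w = result["df"][1]
print(round((1 + 2 * result["F"] / df_w) ** (-df_w / 2), 4))  # 0.3665
```

You can pass the Bank Tellers data the same way, as four lists.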

Example: Bank Tellers With Unequal Sample Sizes
From time to time, unknown to its employees, the research department at Post Bank observes various employees for their work productivity. Recently this department wanted to check whether the four tellers at a branch of this bank serve, on average, the same number of customers per hour. The research manager observed each of the four tellers for a certain number of hours. The following table gives the number of customers served by the four tellers during each of the observed hours.

Teller A   Teller B   Teller C   Teller D
   19         14         11         24
   21         16         14         19
   26         14         21         21
   24         13         13         26
   18         17         16         20
              13         18

At a 5% level of significance, test the null hypothesis that the mean number of customers served per hour by each of the four tellers is the same. Assume that all the conditions required for the one-way ANOVA procedure hold. Press STAT > ENTER to get to the Stat Editor. Enter the Teller A data values into L1, the Teller B data values into L2, the Teller C data values into L3, and the Teller D data values into L4.

Press the STAT > TESTS > H: ANOVA and press ENTER. Type in: L1, L2, L3, L4) and press ENTER.

The one-way ANOVA table would appear as follows:

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Squares   Value of the Test Statistic
Between (Factor)              3               255.6182         85.2061       F = 85.2061 / 8.7889 = 9.6947
Within (Error)               18               158.2             8.7889
Total                        21               413.8182

p = 0.0005

The df for Between: k − 1 = 4 − 1 = 3. The df for Within: n − k = 22 − 4 = 18. The p-value is 0.0005. Since this is less than the significance level of 5%, we reject the null hypothesis. There is significant evidence that the mean numbers of customers served per hour by the four tellers are not all the same.

Chapter

12

Regression Analysis II Multiple Linear Regression and Other Topics

The TI-84 Plus calculator does not have multiple regression functions built directly into it. The calculator can aid in solving multiple regression problems by using the formulas found in chapter 12.

Chapter

13

Analysis of Categorical Data

A Goodness-of-Fit Test

A goodness-of-fit test is used to make a test of hypothesis about experiments with more than two possible outcomes (or categories). These are called multinomial experiments. The frequencies of each possible outcome obtained from the experiment are called the observed frequencies. A goodness-of-fit test measures the difference between the observed frequencies and the expected frequencies (npi) with the statistic χ² = Σ(O − E)²/E. When the null hypothesis is true, this statistic approximately follows a chi-square distribution. The sample size should be large enough that the expected frequency for each category is at least 5. The TI-84 Plus command for the goodness-of-fit test is χ²GOF-Test. It tests whether sample data come from a population that conforms to a specified distribution (expected frequencies). The command requires that both the observed and expected frequencies be put in lists. The degrees of freedom, df, is the number of outcomes minus 1. The command is located on the STAT page in the TESTS list. Example 1:

The following table lists the frequency distribution of 90 rolls of a die. Test at the 5% significance level the null hypothesis that the die is fair. Enter the observed frequencies into List L1 and the expected frequencies into List L2. For a fair die, each expected frequency is 90 × 1/6 = 15.

There are 6 possible outcomes, so the degrees of freedom, df, is 6 − 1 = 5. Select: STAT > TESTS > D: χ²GOF-Test and press ENTER. Type in 2nd > 1 for Observed. Type in 2nd > 2 for Expected. Type in 5 for df: Highlight Calculate and press ENTER. The output for the χ²GOF-Test shows the:

Chi-square value: χ² = 3.7333
p-value = 0.5884
Degrees of freedom: df = 5
CNTRB = {.066666667, 2.4, 0, .6, .066666667}
Note: CNTRB lists the contribution of each category to the overall value of χ². These are the values that would appear in the (O − E)²/E column of a hand-computed table. Notice that a roll of 2 had the largest contribution.

Since the p-value of 0.5884 is greater than 5%, we fail to reject the null hypothesis. There is not enough evidence to conclude that the die is unfair.

Contingency Tables

When measuring the relationship between two categorical variables, one of the most important tools for analyzing the results is a two-way classification table, also known as a contingency table.
A Test of Independence
The contingency table can be used to see whether the variables are independent. You compare the observed frequencies with the frequencies that would be expected from such a sample if the variables were independent. A Chi-square (χ²) test statistic can be computed from the observed and expected frequencies. The TI-84 function χ²-Test is used for the test of independence and is located on the STAT page in the TESTS list. The χ²-Test function works differently from the other tests on this menu. It requires you to enter the observed frequencies into a matrix, and it then stores the computed expected frequencies automatically in another matrix.

Example: School Referendum
A random sample of 300 adults was selected and asked if they were in favor of the new school referendum. The table presents the two-way classification of the adults' responses.

           In Favor   Against   No Opinion
Men           93         70         12
Women         87         32          6

Test at a 5% significance level whether or not Gender and Opinion are independent. The data must be stored differently than in previous statistical tests on the calculator: for a Chi-square test of independence, it goes in a matrix. Type: 2nd > x⁻¹ to get to MATRIX. Highlight EDIT and press the ENTER key.

After MATRIX [A], type in 2 × 3. The 2 represents the number of rows in the table and the 3 represents the number of columns. Enter the data values into the matrix as they appear in the table.

Select: STAT > TESTS > C: χ²-Test and press ENTER. The χ²-Test screen shows that the Observed values are in matrix [A] and that the Expected values will be put in matrix [B]. Highlight Calculate and press ENTER. The output for the χ²-Test shows the: Test statistic: χ² = 8.252773109; P-value: p = .0161410986; Degrees of freedom: df = 2

Since the p-value of .016 is less than the significance level of .05, we reject the null hypothesis that the row and column variables are independent. We have significant evidence that gender and opinion on the school referendum are related in the population of adults. The expected frequencies are stored in matrix [B].

To view them, press 2nd > x⁻¹ to open the MATRIX menu, select 2: [B], and press ENTER. Press ENTER again to display the matrix.

In the example above, the Calculate option was chosen. If you choose the Draw option instead, the calculator draws the χ² curve and displays the χ² value and the p-value. Select STAT > TESTS > C: χ²-Test and press ENTER. The previous entries should still be there. Highlight Draw and press ENTER.
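If you want to check the calculator's numbers by another method, here is a minimal Python sketch of the same test of independence for the School Referendum example. It assumes the table is entered as in matrix [A]: rows Men and Women, columns In Favor, Against, and No Opinion. The function name chi_square_independence is hypothetical. To stay within the standard library, the p-value uses the closed-form chi-square upper tail, which exists only for even degrees of freedom (df = 2 here). For other tables, a library routine such as scipy.stats.chi2_contingency would be the usual choice.

```python
import math


def chi_square_independence(observed):
    """Chi-square test of independence for an r x c table of counts.

    Returns (statistic, df, expected, p_value). The p-value uses the exact
    closed form of the chi-square upper tail for even df only (an assumption
    of this sketch); odd df raises ValueError.
    """
    rows, cols = len(observed), len(observed[0])
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(observed[i][j] for i in range(rows)) for j in range(cols)]
    n = sum(row_totals)

    # Expected count under independence: row total * column total / grand total
    expected = [[row_totals[i] * col_totals[j] / n for j in range(cols)]
                for i in range(rows)]

    # Sum of (observed - expected)^2 / expected over every cell
    stat = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(rows) for j in range(cols))
    df = (rows - 1) * (cols - 1)

    if df % 2:
        raise ValueError("this sketch's closed-form p-value needs an even df")
    # For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = stat / 2
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i)
                                    for i in range(df // 2))
    return stat, df, expected, p_value


# Rows: Men, Women; columns: In Favor, Against, No Opinion (as in matrix [A])
observed = [[93, 70, 12],
            [87, 32, 6]]
stat, df, expected, p = chi_square_independence(observed)
print(f"chi2 = {stat:.9f}, df = {df}, p = {p:.10f}")
for row in expected:
    print(row)  # expected counts; compare with matrix [B]
```

The expected counts follow directly from the row totals (175 and 125) and the column totals (180, 102, and 18). For example, the Men / In Favor cell is 175 × 180 / 300 = 105.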

A Test of Homogeneity

A test of homogeneity determines whether two or more populations are homogeneous (similar) with respect to the distribution of a certain characteristic. The procedure for this test on the TI-84 Plus is identical to the procedure for the Test of Independence (see above).

Chapter 14

Analysis of Variance (ANOVA)

One-Way Analysis of Variance

We have already seen how to test for the equality of means of two populations with 2-SampZTest and 2-SampTTest. With the added assumption that the populations share a common standard deviation, the test can be extended to more than two populations using a technique known as Analysis of Variance (ANOVA). In a one-way ANOVA, only one factor or variable is being tested. The null hypothesis is that the means of three or more populations are all equal. The alternative hypothesis is that at least two of the means differ. The built-in TI-84 ANOVA function is located on the STAT page in the TESTS list. First store the sample data in lists, one list per population.

The syntax for the ANOVA function is ANOVA(List1, List2, List3, ..., List20). The function takes at least 2 and at most 20 lists. Use the names of the lists in which the data are stored. The output contains the F test statistic, the degrees of freedom, the sums of squares, the mean squares, and, most importantly, the p-value for the test.

Example: Fourth Grade Arithmetic With Equal Sample Sizes

Fifteen fourth-grade students were taught arithmetic by one of three different methods. They were randomly assigned to three groups of five. At the end of the semester, the same test was given to all 15 students. The following table gives the scores of the students in the three groups.

Method I   Method II   Method III
48         55          84
73         85          68
51         70          95
65         69          74
87         90          67

Assume that the three populations are normally distributed with equal standard deviations. Calculate the value of the test statistic F. At the 1% significance level, can we reject the null hypothesis that the mean arithmetic score of all fourth-grade students taught by each of these three methods is the same?

Press STAT > ENTER to open the Stat Editor. Enter the Method I data values into L1, the Method II data values into L2, and the Method III data values into L3.

Select STAT > TESTS > H: ANOVA( and press ENTER. Type L1, L2, L3) and press ENTER.

The one-way ANOVA table would appear as follows:

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Squares   Value of the Test Statistic
Between (Factor)      2                    432.1333         216.0667       F = 216.0667 / 197.7333 = 1.0927
Within (Error)        12                   2372.8           197.7333
Total                 14                   2804.9333

p = 0.3665

The df for Between is the number of populations minus 1: k - 1 = 3 - 1 = 2. The df for Within is the number of data items minus the number of populations: n - k = 15 - 3 = 12.

The p-value is 0.366. Since this is greater than the significance level of 1%, we fail to reject the null hypothesis. There is insufficient evidence from the data to show that the different teaching methods have significantly different average results.
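The quantities in this ANOVA table can be reproduced with a short Python sketch. The function one_way_anova below is a hypothetical helper written for illustration. It computes the Between and Within sums of squares, the degrees of freedom, the mean squares, and F. It does not compute the p-value, which requires the F distribution. The three lists are read from the Method I, II, and III columns of the table above.

```python
def one_way_anova(*groups):
    """One-way ANOVA summary (hypothetical helper): df, SS, MS and F.

    Each argument is a list of observations for one population; the groups
    may have different sizes because every group uses its own length.
    """
    k = len(groups)                       # number of populations
    n = sum(len(g) for g in groups)       # total number of data items
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between (Factor): group size times squared distance of group mean from grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within (Error): squared deviations of each value from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df": (df_between, df_within, n - 1),
        "ss": (ss_between, ss_within, ss_between + ss_within),
        "ms": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }


method_1 = [48, 73, 51, 65, 87]
method_2 = [55, 85, 70, 69, 90]
method_3 = [84, 68, 95, 74, 67]
result = one_way_anova(method_1, method_2, method_3)
print(result["df"], [round(s, 4) for s in result["ss"]], round(result["F"], 4))
```

For this data, the helper gives df = (2, 12, 14), SS Between ≈ 432.1333, SS Within = 2372.8, and F ≈ 1.0927, which agrees with the calculator. If SciPy is available, scipy.stats.f.sf(F, 2, 12) returns the upper-tail p-value, and scipy.stats.f_oneway runs the whole test in one call.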

Example: Bank Tellers With Unequal Sample Sizes

From time to time, without the employees' knowledge, the research department at Post Bank observes various employees to measure their work productivity. Recently this department wanted to check whether the four tellers at a branch of this bank serve, on average, the same number of customers per hour. The research manager observed each of the four tellers for a certain number of hours. The following table gives the number of customers served by each of the four tellers during each observed hour.

Teller A   Teller B   Teller C   Teller D
19         14         11         24
21         16         14         19
26         14         21         21
24         13         13         26
18         17         16         20
           13         18

At a 5% level of significance, test the null hypothesis that the mean number of customers served per hour is the same for all four tellers. Assume that all the assumptions required to apply the one-way ANOVA procedure hold.

Press STAT > ENTER to open the Stat Editor. Enter the Teller A data values into L1, the Teller B data values into L2, the Teller C data values into L3, and the Teller D data values into L4.

Select STAT > TESTS > H: ANOVA( and press ENTER. Type L1, L2, L3, L4) and press ENTER.

The one-way ANOVA table would appear as follows:

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Squares   Value of the Test Statistic
Between (Factor)      3                    255.6182         85.2061        F = 85.2061 / 8.7889 = 9.6947
Within (Error)        18                   158.2            8.7889
Total                 21                   413.8182

p = 0.0005

The df for Between is k - 1 = 4 - 1 = 3. The df for Within is n - k = 22 - 4 = 18. The p-value is 0.0005. Since this is less than the significance level of 5%, we reject the null hypothesis. There is significant evidence that the mean number of customers served per hour is not the same for all four tellers.
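The Between sum of squares can also be checked by hand when the group sizes differ. The computational formula below divides each squared group total by that group's own sample size. In the formula, T_i is the total for teller i, n_i is the number of hours that teller was observed, G is the grand total, and n = 22. These symbols are introduced only for this check. The check assumes that the 22 values split by teller as A: 19, 21, 26, 24, 18 (T = 108); B: 14, 16, 14, 13, 17, 13 (T = 87); C: 11, 14, 21, 13, 16, 18 (T = 93); and D: 24, 19, 21, 26, 20 (T = 110). With this split, the results match the calculator output.

```latex
\begin{aligned}
SS_{\text{Between}} &= \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{G^2}{n}
  = \frac{108^2}{5} + \frac{87^2}{6} + \frac{93^2}{6} + \frac{110^2}{5} - \frac{398^2}{22} \\
  &= 7455.8 - 7200.1818 = 255.6182, \\
SS_{\text{Total}} &= SS_{\text{Between}} + SS_{\text{Within}} = 255.6182 + 158.2 = 413.8182, \\
MS_{\text{Between}} &= \frac{255.6182}{3} = 85.2061, \qquad
MS_{\text{Within}} = \frac{158.2}{18} = 8.7889, \\
F &= \frac{MS_{\text{Between}}}{MS_{\text{Within}}} = \frac{85.2061}{8.7889} = 9.6947.
\end{aligned}
```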

Chapter 15

Nonparametric Inference

The TI-84 Plus calculator does not have built-in nonparametric functions. However, it can compute the numbers needed for nonparametric statistical methods. The following examples show how to use the TI-84 Plus for the sign test, the paired-sample sign test, and the Wilcoxon rank sum test for two independent samples.

Sign Test

The Sign Test is a nonparametric test that can be used even when nothing is known about the shape of the continuous population distribution. It can test claims about the population median because any sample measurement has a 50% chance of being above the median and a 50% chance of being below it. The sample data can therefore be treated as a binomial experiment, where a success is a measurement above the hypothesized median.

Example: Median Price of Cars (Small Sample)

A new car salesman states that the median price of cars in a small town is $37,000. A sample of 10 cars selected by a statistician produced the following prices in dollars.

Car   Price ($)
1     47,500
2     23,600
3     37,000
4     68,200
5     29,450
6     42,000
7     56,400
8     88,210
9     98,425
10    15,300

Using a 5% significance level, can we conclude that the median price differs from $37,000? The hypotheses are:

H0: the median price is $37,000
H1: the median price differs from $37,000

One of the data items equals the hypothesized median of $37,000, so it is ignored, which leaves n = 9. The sample prices are replaced by signs:

Plus (+) if the price is above the hypothesized median of $37,000
Minus (-) if the price is below $37,000

Car   Sign
1     +
2     -
3     ignored
4     +
5     -
6     +
7     +
8     +
9     +
10    -

We have six +s out of nine signs. Since this is a two-tailed test, the p-value is twice the probability of getting six or more successes in nine trials of a binomial experiment with p = 0.5. The binomcdf(n, p, x) function calculates the cumulative probability P(X ≤ x), so it can be used to find this probability. The binomcdf( function is located in the DISTR menu (2nd > VARS). Here n = 9 and p = 0.5. Because P(X ≥ 6) = 1 - P(X ≤ 5), we use x = 5:

p-value = 2P(X ≥ 6) = 2[1 - P(X ≤ 5)] = 2[1 - binomcdf(9, 0.5, 5)] = 0.5078125

Since the p-value of 0.5078 is greater than the significance level of 5%, we fail to reject the null hypothesis. There is not enough evidence to conclude that the median price differs from $37,000.
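The counting and binomial tail behind this sign test can be written as a minimal Python sketch, which may help when checking a calculator result. The function sign_test and its alternative parameter are hypothetical names for illustration. The tail sum of binomial coefficients plays the role of 1 - binomcdf(n, 0.5, k - 1), and values equal to the hypothesized median are dropped, as in the example.

```python
from math import comb


def sign_test(data, median0, alternative="two-sided"):
    """Sign test for H0: population median == median0 (hypothetical helper).

    Values equal to median0 are ignored. Returns (number of +, n, p_value).
    alternative is "two-sided", "greater" (median > median0) or "less".
    """
    plus = sum(1 for x in data if x > median0)
    minus = sum(1 for x in data if x < median0)
    n = plus + minus  # ties with median0 are dropped

    def upper_tail(k):
        # P(X >= k) for X ~ Binomial(n, 0.5), i.e. 1 - binomcdf(n, 0.5, k - 1)
        return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

    if alternative == "greater":
        p = upper_tail(plus)
    elif alternative == "less":
        p = upper_tail(minus)
    else:
        # Two-tailed: double the tail of the larger sign count, capped at 1
        p = min(1.0, 2 * upper_tail(max(plus, minus)))
    return plus, n, p


prices = [47500, 23600, 37000, 68200, 29450, 42000, 56400, 88210, 98425, 15300]
plus, n, p = sign_test(prices, 37000)
print(plus, n, p)
```

For the car prices, this sketch counts 6 plus signs among n = 9 and doubles P(X ≥ 6) = 130/512, which gives the same 0.5078125 as the calculator.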

Paired-Sample Sign Test

Given paired data, the Sign Test can be used to test whether the two paired populations have the same median. If they do, then for each pair there is a 50% chance that the first measurement is larger than the second. Each pair can therefore be replaced by a sign indicating whether the first or the second measurement is larger.

Example: Blood Pressure (Small Sample)

A researcher wants to find the effect of a special diet on systolic blood pressure in adults. She selected a sample of 12 adults and put them on this dietary plan for three months. The following table gives the systolic blood pressure of each adult before and after completing the plan.

Adult   Before   After
A       210      196
B       185      192
C       215      204
D       198      193
E       187      181
F       225      233
G       234      208
H       217      211
I       212      190
J       191      186
K       226      218
L       238      236

Using a 2.5% significance level, can we conclude that the dietary plan reduces the median systolic blood pressure of adults? Our hypotheses are:

H0: the diet does not reduce the median systolic blood pressure of adults
H1: the diet does reduce the median systolic blood pressure of adults

We begin by assigning each adult a plus or minus sign indicating whether the Before or the After pressure is larger:

Plus (+) if Before is larger
Minus (-) if After is larger

Adult   Before   After   Sign
A       210      196     +
B       185      192     -
C       215      204     +
D       198      193     +
E       187      181     +
F       225      233     -
G       234      208     +
H       217      211     +
I       212      190     +
J       191      186     +
K       226      218     +
L       238      236     +

There are ten +s out of twelve signs. Since this is a right-tailed test, the p-value is the probability of getting ten or more successes in twelve trials of a binomial experiment with p = 0.5. Here n = 12 and p = 0.5. Because P(X ≥ 10) = 1 - P(X ≤ 9), we use x = 9:

p-value = P(X ≥ 10) = 1 - P(X ≤ 9) = 1 - binomcdf(12, 0.5, 9) = 0.0193

Since the p-value of 0.0193 is less than the 2.5% significance level, we reject the null hypothesis. The data lead us to conclude that the diet lowers the median systolic blood pressure of adults.
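The paired version differs from the one-sample sign test only in where the signs come from. Each sign is the sign of a Before minus After difference, and the test here is right-tailed. Here is a minimal Python sketch under those assumptions. The name paired_sign_test_greater is hypothetical, and the data are the Before and After readings from the blood pressure table.

```python
from math import comb


def paired_sign_test_greater(before, after):
    """Right-tailed paired sign test (hypothetical helper).

    H1: 'before' values tend to exceed 'after' values. Pairs with a zero
    difference are dropped. Returns (number of +, n, p_value).
    """
    diffs = [b - a for b, a in zip(before, after)]
    plus = sum(1 for d in diffs if d > 0)   # Before larger -> '+'
    minus = sum(1 for d in diffs if d < 0)  # After larger  -> '-'
    n = plus + minus
    # P(X >= plus) for X ~ Binomial(n, 0.5), i.e. 1 - binomcdf(n, 0.5, plus - 1)
    p = sum(comb(n, i) for i in range(plus, n + 1)) / 2 ** n
    return plus, n, p


before = [210, 185, 215, 198, 187, 225, 234, 217, 212, 191, 226, 238]
after = [196, 192, 204, 193, 181, 233, 208, 211, 190, 186, 218, 236]
plus, n, p = paired_sign_test_greater(before, after)
print(plus, n, round(p, 4))
```

The sketch counts 10 plus signs among 12 pairs and returns (66 + 12 + 1) / 4096 = 79/4096 ≈ 0.0193, matching 1 - binomcdf(12, 0.5, 9).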
