
Running head: JFRIESEN ASSIGNMENT #1

APSY 605: Assignment #1
Jo Friesen
University of Calgary

Introduction

This data set contains information on 504 participants across twelve variables. Before the specific data are discussed, some general information is provided. Throughout the presentation, graphs are used to represent the data. Graphs allow a quick look at the nature of the data, give information about the shape and spread of each distribution, and can highlight potential outliers. Together with the other descriptive statistics presented, they summarize the information contained in the data set in a succinct, meaningful way. This information is important in determining the usefulness and appropriateness of the data for further analysis.

Measures of central tendency are an important component of descriptive statistics. They provide a way to represent the typical score in a group of data, which also allows for analysis of how individual scores vary from what is typical. There are three main measures of central tendency: the mode, the median and the mean. The mode is the score that appears most frequently in the data set and is the only one of the three that can be used with nominal variables. For example, you would use the mode if students were asked to give their favourite subject at school. The mean is the most commonly used measure of central tendency and is the arithmetic average of the scores. It is used with interval or ratio variables, and is useful if we want to compare variables or generalize from our data. For example, if two typical groups of students (e.g., Grade 12 students in two different high schools) took the same test and we wanted to compare their results, the mean would be the best measure of central tendency to use. The final measure, the median, is a better measure to use if there are extreme scores and/or the distribution is highly skewed, as it is far less affected by extreme scores.
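The three measures described above can be illustrated with a minimal sketch using Python's standard `statistics` module. All values below are hypothetical and invented for illustration; none come from the study's data set.

```python
import statistics

# Hypothetical data, invented for illustration (not from the study).
favourite_subjects = ["math", "art", "math", "science", "math", "art"]
test_scores = [62, 65, 68, 70, 71, 74, 75]
# Two extreme scores added, e.g. students from a gifted class.
scores_with_outliers = test_scores + [100, 100]

# Nominal data: only the mode is meaningful.
subject_mode = statistics.mode(favourite_subjects)

# Interval/ratio data: compare how the mean and median react to outliers.
mean_typical = statistics.mean(test_scores)
median_typical = statistics.median(test_scores)
mean_outliers = statistics.mean(scores_with_outliers)
median_outliers = statistics.median(scores_with_outliers)

print("mode:", subject_mode)
print("mean shift:", mean_outliers - mean_typical)
print("median shift:", median_outliers - median_typical)
```

In this sketch the two extreme scores move the mean by several points but move the median by only one, which is the behaviour that makes the median preferable for skewed data.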
The median is the score at which 50% of the scores fall above and 50% fall below. To return to the example of test scores, if it were discovered that there were some extreme outliers, perhaps due to a special gifted class at one of the schools, it would be more appropriate to use the median as the measure of central tendency. The median would also be appropriate if the test were far too difficult, producing a positively skewed distribution: a high concentration of low scores with a longer tail of higher scores.

Variability is another important component of descriptive statistics. It describes the amount of spread in the data. This is important because it tells you whether the data are relatively homogeneous or heterogeneous, and it helps show how individual scores compare to the average score. There are a number of measures of variability, including the range, the interquartile range, the variance and the standard deviation. The range is the distance between the lowest and highest scores in the data set, the quartiles divide the ordered scores into four equal groups (with the interquartile range spanning the middle 50%), and the standard deviation is a measure of how far scores typically fall from the mean. There are two situations in which the variability of the sample is zero. First, in a situation such as a case study, where there is only one piece of data for the variable (n = 1). Second, if all cases have the same score on the variable in question (e.g., the entire class gets 10 out of 10 on a quiz).

A final component of descriptive statistics is the shape of the distribution. Understanding the shape of each distribution is important so that we can understand how the data behave (symmetrical or asymmetrical, flat or peaked, following a normal curve or not) and which inferential statistical tests are appropriate to use. Two important aspects of shape are skew and kurtosis. Skew is related to the symmetry of a distribution, with a negative skew representing data where the majority of the data points are concentrated above the mean (to the right), with a tail that is longer on the left.
Data that are positively skewed have a longer tail towards the right, or upper end of the continuum, and the majority of the data fall to the left of the mean. Kurtosis describes the peak of the data and how it relates to what would be seen with a normal distribution. A peak that follows the normal distribution is called mesokurtic, one that is higher and thinner than the normal distribution is called leptokurtic, and one that is lower and flatter is called platykurtic. If a statistical test operates on the assumption of a normal distribution, an analysis of skew and kurtosis can help to determine whether this assumption holds and whether the test can be used.

Descriptive Statistics by Variable

Region

Region represents the area of residence of each participant, coded into 28 categories. The mode is 14 Greater Vancouver Region, which accounts for 194 of the 504 respondents, with a second peak at 03 Capital Regional District, with 56 respondents. This means that 49.6% of participants came from just two regions, leaving a relatively flat distribution with two peaks (bi-modal). Because region is a nominal variable, terms such as platykurtic describe the frequency chart only loosely. Frequencies per region range from 1 to 194, with four regions represented by only one participant.
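The share of participants from the two largest regions can be checked with a short calculation. This is only a worked check of the arithmetic, using the counts reported in the text; the variable names are illustrative.

```python
# Counts as reported in the text above.
total_participants = 504
greater_vancouver = 194
capital_regional_district = 56

# Combined count and percentage share of the two most frequent regions.
top_two = greater_vancouver + capital_regional_district
share_percent = round(100 * top_two / total_participants, 1)

print(top_two, share_percent)  # 250 49.6
```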

Community Size

Community size is divided into two categories, large and small, although no definition of large or small is specifically outlined. Participants are evenly split, with 252 reporting a large community size and 252 reporting a small community size. The two category frequencies are therefore identical, giving a flat, symmetrical frequency distribution. The variable itself, coded 1 and 2, has a variance of .250 (see Appendix A).
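The difference between the spread of the variable and the spread of its two category counts can be sketched as follows. The 1/2 coding is taken from the minimum and maximum in Appendix A; which label carries which code is an assumption and does not affect the result.

```python
import statistics

# 252 participants in each category, coded 1 and 2
# (the label-to-code mapping is assumed; it does not change the variance).
community_size = [1] * 252 + [2] * 252

# Sample variance and standard deviation of the coded variable (n - 1 denominator).
sample_variance = statistics.variance(community_size)
sample_sd = statistics.stdev(community_size)

# Spread of the two category frequencies themselves: identical counts, so zero.
frequency_variance = statistics.pvariance([252, 252])

print(round(sample_variance, 3), round(sample_sd, 3), frequency_variance)
```

An even split is the point at which a two-category variable has its largest variance, so equal frequencies indicate maximal, not zero, variability in the variable itself.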

Gender

Participants are almost evenly split on gender, with 253 males (the mode) and 251 females in the study. Once again, the frequency distribution is flat, with almost no difference between the two category frequencies.

Year of Birth

Participants reported their year of birth, which ranged across 70 years, from 1917 to 1987. The distribution is fairly symmetrical but platykurtic, with more scores under the tails than would be expected in a normal curve. The sample is not evenly distributed across year of birth: there are a number of peaks throughout the distribution, and there is a noticeably low number of participants born in 1953 (5) compared to the adjacent years. The median is 1951, meaning about half of the participants were born before 1951 and about half after. The quartiles fall at 1942 (1st quartile) and 1962 (3rd quartile), which tells us that about 50% of participants were born between 1942 and 1962.

Month of Birth

Participants reported their month of birth (5 refused to respond to the question). The distribution across months is relatively flat and even, with frequencies ranging from 33 (June) to 48 (March and September). March and September are both modes for this distribution.


Calculated Age

Calculated age is defined as age in years at the time of the study. The median of this group is 54.5, meaning that half of the participants are older than 54.5 years and half are younger. The mean is 53.89 years and the standard deviation is 15.04 years, which places roughly two-thirds of respondents (about 68% under a normal curve) between the ages of 38.86 and 68.93. The full range of participants is [19, 88], or 69 years. This is a mesokurtic, symmetrical distribution that falls close to the line of the normal distribution.
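The interval of one standard deviation around the mean can be sketched as below, assuming the ages follow a normal distribution (an assumption, supported only by the shape described above). Because the sketch starts from the rounded mean and standard deviation, its bounds can differ from the reported ones in the last digit.

```python
from statistics import NormalDist

# Mean and standard deviation as reported in the text above.
mean_age = 53.89
sd_age = 15.04

# One standard deviation either side of the mean.
lower = mean_age - sd_age
upper = mean_age + sd_age

# Share of a normal distribution expected to fall within that interval.
age_model = NormalDist(mu=mean_age, sigma=sd_age)
share_within_one_sd = age_model.cdf(upper) - age_model.cdf(lower)

print(round(lower, 2), round(upper, 2), round(share_within_one_sd, 3))
```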

S3: Visits 24 months

Participants were asked the following question: "In the past 24 months, how many times have you seen a family doctor or nurse practitioner?" Their responses were sorted into six categories: Have not seen, 1-4 times, 5-10 times, 11 or more times, Don't know, and Refuse to answer. All participants answered with a defined number of visits (0 participants chose Have not seen, Don't know or Refuse to answer), so all had seen a doctor or nurse practitioner in the previous 24 months. The mode for this variable is 1-4 times, with 241 participants, or 47.8%, choosing this answer. The frequency declined as the number of visits increased, from 241 (1-4 times) to 115 (11 or more times), leaving a distribution with a downward slope.

S4: Care facility

Participants were asked: "Do you live in a care facility, such as a nursing home or extended care facility?" All 504 participants indicated that they did not, which means all participants live in the community, leaving a single-category distribution with no variance. Of note, this suggests that the data may not be truly representative of all individuals in the selected regions, as those who live in care facilities may differ systematically from those who do not, so caution must be used when generalizing from this sample. (No graph or further description needed.)

S5: Health Care employment

Participants were asked: "Are you currently employed as a health care professional? This includes nurses, doctors, paramedics, or other health care professionals that interact with patients." The mode was No, with 471 participants indicating that they were not currently employed as health care professionals and the remaining 33 participants responding in the affirmative.

Visit1: Doctor past 12 months

Participants were asked, "How many visits to your doctor have you made in the past 12 months?" Answers ranged from 0 to 50, a range of 50. (Nine participants responded with either "Not sure" or "I don't know." The data of these 9 participants were not included, as the codes used for these responses (94, 97) would have distorted the results.) The mean response was 5.06, with a standard deviation of 6.665. The distribution is leptokurtic (high peak) and positively skewed, meaning it is asymmetrical, with a higher concentration of scores on the left, towards the lower end of the continuum, and a longer tail to the right. A box plot shows that there are a number of outliers on the extreme high end of the data. Because of the skew and the outliers, the median, which is 3, is a more appropriate measure of central tendency. Quartiles for this data fell at 1 (1st quartile) and 6 (3rd quartile), giving an interquartile range of 5. This means about 50% of participants visited their doctor between 1 and 6 times, leaving only about 25% who visited their doctor more than 6 times.
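The effect of excluding the coded non-responses before computing the statistics can be sketched as follows. The response values are hypothetical and invented for illustration; only the codes 94 and 97 come from the text. The sketch uses the default "exclusive" method of `statistics.quantiles`, which may not match the percentile method of the statistics package used for Appendix A.

```python
import statistics

# Hypothetical responses, invented for illustration (not the study's data).
# 94 and 97 stand in for the non-response codes mentioned in the text.
MISSING_CODES = {94, 97}
responses = [0, 1, 1, 2, 3, 3, 4, 6, 8, 50, 94, 97]

# Drop coded non-responses before any statistics are computed.
valid = [r for r in responses if r not in MISSING_CODES]

mean_all = statistics.mean(responses)  # distorted by the codes
mean_valid = statistics.mean(valid)
median_valid = statistics.median(valid)

# Quartiles and interquartile range of the valid responses.
q1, q2, q3 = statistics.quantiles(valid, n=4)
iqr = q3 - q1

print(mean_all, mean_valid, median_valid, q1, q3, iqr)
```

Even after the codes are removed, the single large value keeps the mean well above the median, which is why the median and interquartile range are the more informative summary for a distribution like this one.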

Visit2: Walk-in past 12 months

Participants were asked, "How many visits to a walk-in clinic have you made in the past 12 months?" Answers again ranged from 0 to 50, a range of 50, with 4 participants indicating they were "Not sure" or "Didn't know." (The data for these four participants were removed prior to analysis.) The mean is 1.27, with a standard deviation of 3.513. This distribution is leptokurtic and positively skewed, with the majority of scores on the extreme lower end of the continuum and a longer tail to the right. A box plot shows there are a number of extreme outliers in the data set. Again, because of the skew and the outliers, the median, which is 0, is a better measure of central tendency. Quartiles fall at 0 (1st quartile) and 1 (3rd quartile), meaning 75% of the participants visited a walk-in clinic at most once in the past 12 months.


Visit3: Emergency Room past 12 months

Participants were asked, "How many visits to an emergency room have you made in the past 12 months?" Responses ranged from 0 to 6, a range of 6. Five participants answered "Not sure" or "Don't know," or refused to answer. The data for these 5 participants were removed prior to analysis. For the remaining 499 participants, the mean was .42 and the standard deviation was .991. The distribution is leptokurtic and positively skewed, with the majority of scores on the extreme left (low) side of the continuum and a few under the tail on the right. While there are no extreme outliers, the median is an appropriate measure of central tendency because of the positive skew of the data. For this variable, the median is 0, as are the first and third quartiles and therefore the interquartile range. This means that at least 75% of participants did not visit an emergency room in the last 12 months.
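Skew and kurtosis of the kind described for the visit variables can be computed from the central moments of the data. The sketch below uses the simple population-moment formulas, which are an assumption here; statistics packages usually apply a small-sample correction, so their values will differ slightly. The visit counts are hypothetical and invented for illustration.

```python
def shape_statistics(data):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis


# Hypothetical visit counts: mostly zeros with a long right tail.
visits = [0, 0, 0, 0, 0, 0, 0, 1, 1, 6]
visits_skew, visits_kurtosis = shape_statistics(visits)

# A small symmetric, flat sample for comparison.
symmetric = [1, 2, 3, 4, 5]
symmetric_skew, symmetric_kurtosis = shape_statistics(symmetric)

print(visits_skew, visits_kurtosis)
print(symmetric_skew, symmetric_kurtosis)
```

A positive skewness corresponds to the longer right tail, a positive excess kurtosis to a leptokurtic peak, and a negative excess kurtosis to a platykurtic shape.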


Summary

While much of the data presented here is demographic in nature, the heart of this information lies in the number of visits made to health care providers. Based on the data, we can see that participants are much more likely to visit their doctor than a walk-in clinic or emergency room. We can also break that information down by some of the demographic features, such as gender, region or whether participants are employed as health care providers. For instance, if we compare males and females on the median number of visits to a doctor in the last 12 months, we see that women tend to visit their doctor more than men.
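A group comparison of this kind can be sketched as a median computed per category. The records below are hypothetical and invented for illustration; they are not the study's data and do not reproduce its results.

```python
import statistics
from collections import defaultdict

# Hypothetical (gender, doctor visits in past 12 months) records,
# invented for illustration only.
records = [("F", 4), ("F", 6), ("F", 2), ("M", 1), ("M", 3), ("M", 2)]

# Collect the visit counts for each gender.
visits_by_gender = defaultdict(list)
for gender, visits in records:
    visits_by_gender[gender].append(visits)

# Median number of visits within each group.
median_by_gender = {
    gender: statistics.median(counts)
    for gender, counts in visits_by_gender.items()
}

print(median_by_gender)
```

The same grouping step would apply to the other comparisons mentioned in this summary, with region or health care employment as the grouping variable.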

We could also look at emergency room visits by region, where we would find that in some regions, the average number of emergency room visits is much higher than in other regions.

Finally, if we consider the difference between health care employees and non-health care employees, we would notice that health care employees are less likely to use a walk-in clinic than those not employed in the profession.

Our descriptive statistics give us a good introduction to the nature of the data we have collected, and open the door to further analysis, in which specific research questions can be investigated.

Appendix A: Statistical Charts


Statistics

                  Region  Community Size  Gender  S1. Birth Year  S1A. Birth Month      Age
N Valid              504             504     504             504               504      504
N Missing              0               0       0               0                 0        0
Mean               12.48            1.50    1.50         1951.91              7.35    53.78
Median             14.00            1.50    1.00         1951.00              7.00    54.00
Mode                  14              1a       1            1948                3a       57
Std. Deviation     5.970            .500    .500          15.034             9.710   15.084
Variance          35.646            .250    .250         226.019            94.279  227.524
Range                 27               1       1              70                97       69
Minimum                1               1       1            1917                 1       19
Maximum               28               2       2            1987                98       88
Percentile 25       9.00            1.00    1.00         1942.00              3.00    44.00
Percentile 50      14.00            1.50    1.00         1951.00              7.00    54.00
Percentile 75      14.00            2.00    2.00         1962.00              9.00    64.00

a. Multiple modes exist. The smallest value is shown.

Statistics

S3. Visits Past 24 Months
S4. Care Facility
S5. Health Care Employment

                        S3        S4        S5
N Valid                504       504       504
N Missing                0         0         0
Mean                  2.75      2.00      1.93
Median                3.00      2.00      2.00
Mode                     2         2         2
Std. Deviation        .803      .000      .248
Variance              .645      .000      .061
Range                    2         0         1
Minimum                  2         2         1
Maximum                  4         2         2
Percentile 25         2.00      2.00      2.00
Percentile 50         3.00      2.00      2.00
Percentile 75         3.00      2.00      2.00


Statistics

VISIT1. How many visits to your doctor have you made in the past 12 months?
VISIT2. How many visits to a walk-in clinic have you made in the past 12 months?
VISIT3. How many visits to an emergency room have you made in the past 12 months?

                    VISIT1    VISIT2    VISIT3
N Valid                495       500       499
N Missing                0         0         0
Mean                  5.06      1.27       .42
Median                3.00       .00       .00
Mode                     1         0         0
Std. Deviation       6.665     3.513      .991
Variance            44.422    12.343      .983
Range                   50        50         6
Minimum                  0         0         0
Maximum                 50        50         6
Percentile 25         1.00       .00       .00
Percentile 50         3.00       .00       .00
Percentile 75         6.00      1.00       .00
