
A Replication of Mazar et al.’s Cheating Studies Performed on High School Students
Diego Olaya
Peak to Peak Charter School

This experiment attempted to replicate, in a population of high school students, the results of an experiment performed by Mazar et al. in a 2008 paper. Participants were asked to complete a mathematical task and self-report their scores in exchange for a candy reward that depended on the score they reported. Analysis of the data showed no detectable cheating in either of the two experimental conditions that allowed it, a result that does not align with Mazar et al. Possible explanations for this discrepancy include errors introduced by conflicts with the school’s schedule, as well as the possibility that candy is not a sufficient motivator to inspire cheating among high school students.

Introduction and Background

Why people cheat is an interesting and important question to investigate, as dishonesty has serious practical implications outside of a laboratory setting. Because of this importance, several studies have sought to identify conditions under which people increase or decrease their cheating. These experiments have focused on factors such as the presence of an honor code, the likelihood of being caught, and perceptions of peers’ behavior, among others.

A 1993 study by Goldstone and Chin focused on dishonesty when reporting copies made on a department copier. The experiment investigated the frequency and degree of dishonest behavior by users of a university copier who had to self-report the number of copies they made and pay for each one. The experimenters observed patrons for two weeks and noted the size of each copy job, who the job was charged to, and the discrepancy between the actual and reported number of copies. They found that for small jobs, patrons were more likely not to report the job at all, but under-reported fewer copies. For large jobs, it was very unlikely that a patron would fail to report the job, but under-reporting of copies was more frequent and more severe (Goldstone and Chin, 1993). These results make sense under the framework of self-concept maintenance proposed by Mazar et al. in a 2008 paper describing participants’ behavior in a series of experiments. Each experiment manipulated participants’ opportunity to cheat during a task, with the goal of measuring the magnitude of cheating as well as differences in cheating behavior related to the manipulation in each test. The results support their theory that people cheat only to a level that allows them to maintain their perception of themselves as good people (Mazar, Amir, and Ariely, 2008).

Other studies have focused on how the influence of peers or expectations of punishment might affect levels of dishonesty. One 2008 paper examined the attitudes of college students towards cheating, specifically their perceptions of what constitutes cheating, how often their peers cheat, and how likely they are to be caught. The study also asked participants their opinions on the appropriate punishment for each action. It found that students view different types of cheating very differently, with cheating on a paper considered far more serious than cheating on homework or on tests. More interestingly, some students said that certain behaviors should not be punished at all. The findings show that students’ perceptions of what should be considered cheating often differ from those of their instructors (Megehee and Spake, 2008). The influence of peers’ behavior on cheating was also explored by Gino et al. in a paper reporting two experiments on how other people’s behavior affects participants’ cheating. In the first experiment, the experimenters had a confederate cheat flagrantly by finishing the given task in an impossible amount of time with no consequence. If participants identified the confederate as an in-group member, their cheating increased; the opposite effect was seen if the confederate was identified as an out-group member. In the second experiment, the confederate raised the possibility of cheating but did not cheat himself. In this scenario, cheating went down. The results imply that individuals’ cheating is influenced by the behavior of others (Gino, Ayal, and Ariely, 2009).

Other studies have examined the effects of an honor code on student attitudes towards cheating, as in a case study published by Roig and Marks on students’ attitudes towards cheating before and after the implementation of an honor code. The experimenters assessed participants using an Attitudes Towards Cheating questionnaire, asking them to report their own attitudes and what they believed the attitudes of their professors to be. The study found no statistically significant difference in the attitudes
of participants in the semester before the honor code was implemented compared with the semester immediately afterwards. The authors suggest that they may not have allowed enough time for the school to promote the honor code, or that the lack of a difference was due to students already being aware that an honor code was coming (Roig and Marks, 2006). These results appear to contradict those of Mazar et al., who reported that an honor code reminder decreased cheating among students. The discrepancy could be due to the reasons listed above, or to the introduction of an honor code producing a change in behavior that participants do not consciously recognize.

Theoretical Propositions and Hypothesis

The primary question investigated is how an honor code reminder affects students’ likelihood of being dishonest when reporting scores for a reward. This study is a replication of one of six experiments detailed in Mazar et al.’s 2008 paper, which were designed to test the influence of various situational factors on participants’ dishonesty. The honor code is intended to call to mind participants’ moral values and draw their attention to their own moral standards, with the aim of reducing categorization malleability, that is, participants’ ability to reframe dishonest actions in a more benign or honest light (Mazar et al., 2008). Under the framework of self-concept maintenance proposed in that paper, the honor code reminder should therefore reduce cheating compared with a condition in which there is no honor code reminder but participants can still cheat.

Method

Population

The population studied was a group of high school students from Peak to Peak Charter School, a small college-preparatory K-12 school located in Boulder County, CO. Students were randomly selected as participants and randomly assigned to test conditions. They were pulled from their Access classes to participate in the study. Forty students from all grades of the high school were selected for each experimental condition.

Procedure

Participants entered the room and were told to sit down at an empty desk, which had a test sheet and an answer sheet face down on it. Participants were told not to turn the sheets of paper over. Once seated, all participants were told to visit a Google Form containing an informed consent outlining the task they were about to complete and informing them that they were allowed to leave the experiment at any point. Those who did not sign were dismissed from the experiment. Each test sheet was identical and contained twenty matrices of twelve numbers each. Participants were told to find the two numbers in each matrix that added up to ten. They were given five minutes to do so, which was not enough time to finish. Before beginning the task, participants were also told they would receive a reward of one piece of candy for each correctly solved matrix. This experimental setup mirrors Mazar et al.’s setup. The matrix task was chosen because the answers are unambiguously right or wrong, so that participants’ scores are not affected by potential hindsight bias (Mazar et al., 2008).

In the control condition, participants wrote how many questions they got correct on the answer sheet and brought both the test sheet and the answer sheet to the experimenter, who was sitting at a table at the front of the room. The experimenter checked their tests and confirmed the number correct on their answer sheet. He then gave the participants a reward commensurate with the number answered correctly. There was no opportunity to cheat in this condition.

In the shredder condition, participants completed the matrix task as in the control condition. The only difference was that at the end of the matrix task, after the participants had transferred the number correct onto the answer sheet, they shredded their test sheets and handed the answer sheet to the experimenter, who gave them a reward based on the number they reported correct. The experimenter did not question or react to the reported score in any way except to give the participant their reward.

The shredder + honor code condition was meant to test the effect of a moral reminder on the students’ level of cheating. In this case, the moral reminder is an honor code statement. The matrix task was the same as in the other conditions except that the following statement was printed at the top of the test sheet: "I understand that this activity falls under the Peak to Peak Honor Code, and I agree to abide by its principles while completing this task." The students were instructed to print and sign their name beneath the statement before beginning the matrix task. Once they finished the task, they shredded the test and presented the experimenter with the answer sheet, as in the shredder condition.

At the conclusion of the experiment and before participants were dismissed, they were told to complete a second Google Form containing a debriefing, which they were asked to sign. After this form was complete, participants were told they could leave.

Results

Notes on Method and Experiment Implementation

There were several complications in executing the experiment that are worth mentioning before presenting the data, as they affected the sample size of each condition and may have affected the ultimate results.

First of all, while 40 students from all levels of the school were selected at random to participate in experimentation,
other activities during the time of the experiment, such as SAT preparation and grade-level specific activities, meant that not everyone in the original sample showed up to participate. In some cases, none of the students selected from specific rooms came, dramatically reducing sample size.

Another complication that affected sample size was timing: delays in the testing schedule were caused by participants not arriving on time, often trickling in over a period of five minutes after the scheduled start of the experiment. This meant that some participants heard portions of the script repeated multiple times. The staggered entry of participants, with the resulting delays and repetition of instructions, created a somewhat hectic atmosphere that could certainly have affected the results. Another consequence of the staggered entry and associated delays was a mixing of the subjects selected for the different experimental conditions, which affected the size of each. Table 1 shows the sample sizes for each experimental condition.

Condition      Sample Size
Control        18
Shredder       22
Honor Code     12

Table 1: Experimental Condition Sample Sizes

Lastly, the honor code condition had to be run at a time separate from the other two conditions, so its sample was drawn from volunteers in the available study halls and from teaching assistants during the time of experimentation. This sample, while it did include members from various grade levels, was not randomly selected.

Difficulties with Informed Consent and Debriefing

While the majority of subjects complied with experimental procedure and completed the informed consent and debriefing online as instructed, a few participants either forgot or neglected to complete this paperwork at the links provided when instructed to do so. Because the informed consent and debriefing were done online, there was no easy way to verify whether everyone had completed them until after the trial was over. Normally, the course of action would be to discard the data of anyone who failed to sign the paperwork, but because the data were anonymous, this was not possible.

Adding to the difficulty of ensuring that everyone submitted the informed consent and the debriefing was the tendency of participants to arrive late, forcing the experimenter to repeat the instruction to fill out the informed consent. If late participants felt pressure to catch up to the rest of the group quickly, they may have neglected to fill out the requisite documentation. One of the experimental conditions ended at the beginning of the students’ lunch period, which meant that some participants left immediately after receiving their reward and before receiving instructions to complete the debriefing.

Efforts were made to contact participants afterwards to request that they fill out the documentation. Participants missing either the informed consent or the debriefing were identified by comparing the lists of names on the responses to each form and finding individuals who had not submitted both. Even after being contacted, however, some participants did not complete the forms.

Data and Analysis

The data collected from the three conditions showed no statistical evidence of cheating in any condition. The data were analyzed in R (R Core Team, 2017) using RStudio. Table 2 summarizes the scores in each experimental condition.

                  Control   Shredder   Honor Code
Mean              5.22      4.55       5.50
Std. Deviation    2.69      3.22       2.68
Std. Error        0.634     0.686      0.774

Table 2: Summary of Data

A comparison of 95% confidence intervals using these data showed no statistically significant difference between the conditions in the number of questions answered correctly. This lack of a difference implies that there was no cheating during the experiment, or at least that any cheating was too small to detect. Figure 1 illustrates the lack of a statistically significant difference between the groups; the error bars span ± twice the standard error. Unpaired t-tests between the control condition and each of the other experimental conditions confirm the lack of a significant difference suggested by Figure 1.

Figure 1: Mean Scores by Test Condition

Discussion

Mazar et al.’s paper reports two main findings. First, there was cheating in both the shredder condition and, to a lesser extent, the honor code condition. Second, the level of cheating in the honor code condition was significantly lower than in the shredder condition. Neither result was observed in this experiment; there was no evidence of cheating in any condition.

There are a few key differences in the experimental setup between Mazar et al. and this study. The first is that, due to budget constraints, the reward for a correct answer was candy rather than money. While candy may have motivated the students to do well, it is quite possible that it was not a sufficient motivator to get the students to consider cheating on the task. A second difference is that many of the students knew the experimenter by name, if not personally. This connection could change participants’ behavior by making cheating feel more like a betrayal than if the experimenter were a complete stranger. Finally, the experiment was conducted with school authority figures present. This could not be avoided, as the space used was where many of these figures had their offices. However, their presence could well have dissuaded the students from cheating.

A possible concern affecting the reliability of the procedure was raised in "Notes on Method and Experiment Implementation" in the discussion of the somewhat hectic nature of the trials. Because of these complications, it is very possible that the experimental procedure varied between trials in uncontrolled ways, leading to different experiences in different conditions. This problem is compounded by the fact that only one session of each experimental condition was run due to time constraints.

It is unclear whether the results of this experiment can be generalized to all high school students or to students in general, and thus whether they have external validity. Because many of the participants knew the experimenter, their behavior could have been altered in ways that are not generalizable to a broader population. Another concern is that the small sample sizes may have left the study with too little statistical power to detect modest levels of cheating. This issue, at least, has a straightforward solution: future replication studies should allow more time for experimentation and use larger samples in each condition.

As mentioned in the introduction, studies on why individuals cheat have great practical potential, and making sure that results from previous papers can be replicated is an important part of the scientific process. While the results of this paper do not agree with those published by Mazar et al., it is reasonable to assume that the execution problems described in this paper make this experiment a poor gauge of the validity of Mazar et al.’s results. Further work should refine the procedure for experimental sessions and allow a longer experimental period to help minimize errors introduced by time pressure. Lastly, it would be advisable to secure sufficient funds to reward participants with money, as in Mazar et al., to make the two experiments as similar as possible.

References

Gino, F., Ayal, S., & Ariely, D. (2009). Contagion and differentiation in unethical behavior: The effect of one bad apple on the barrel. Psychological Science, 20, 393–398.
Goldstone, R. L., & Chin, C. (1993). Dishonesty in self-report of copies made: Moral relativity and the copy machine. Basic and Applied Social Psychology, 14, 19–32.
Mazar, N., Amir, O., & Ariely, D. (2008, December). The dishonesty of honest people: A theory of self-concept maintenance. Journal of Marketing Research, 45, 633–644.
Megehee, C. M., & Spake, D. F. (2008). The impact of perceived peer behavior, probable detection, and punishment severity on student cheating behavior. Marketing Education Review, 18, 5–19.
R Core Team. (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria. Retrieved from https://www.R-project.org/
Roig, M., & Marks, A. (2006). Attitudes toward cheating before and after the implementation of a modified honor code: A case study. Ethics and Behavior, 16, 163–171.
