ABSTRACT
Method
The investigators used the following methods to answer the question: What are the reliability and internal consistency of data from the C-SEI when it is used by a nationwide sample of baccalaureate nurse educators to evaluate video-archived simulated clinical experiences?
Instrument
Video-recordings are frequently used in clinical and educational research to provide standardized subjects for observation-based evaluations (Baer, Smith, Rowe, & Masterton, 2003; Cusick, Vasquez, Knowles, & Wallen, 2005; McConvey & Bennett, 2005; Portney & Watkins, 2008). Indeed, one of the primary barriers to analyzing the reliability of data from simulation evaluation instruments (or any instrument designed for making observation-based evaluations) is the lack of standardized subjects to evaluate. To overcome this barrier, the investigators produced and video-archived three simulated patient care scenarios depicting nursing students performing patient care
below, at, and above the level of expectation for senior baccalaureate nursing students. These scenarios were uploaded onto the online classroom used in the study.
Another barrier to rigorous psychometric analysis of simulation evaluation instruments is the difficulty of accessing an adequate number of qualified participants. The investigators established
a virtual classroom and used webinar communication to facilitate the participation of nurse educators from around the United
States. The virtual classroom served as a portal for information
related to the study, including reference and training materials.
Prior to participating in the study, individuals attended webinar
orientation meetings where the investigators familiarized them
with the C-SEI. Participants then accessed a training video embedded in the online classroom where they were taught how to
score a sample scenario using the C-SEI. One of the unique features of the C-SEI is the standardized training that is required
prior to using the instrument. This training was adapted for the
scenario that was used in the study and included expected minimal behaviors for the simulation.
Data Collection Procedures
Results
Twenty-nine (76.31%) of the original 38 participants completed the entire 6-week data collection procedure. Using Walter, Eliasziw, and Donner's (1998) equation for determining sample size for reliability studies, this sample size provided adequate power (80%, α = 0.05) when the minimal acceptable level of reliability was 0.40. However, the width of the confidence interval for the intrarater reliability result demonstrated that additional scenarios or a larger number of participants would have been better suited for determining test-retest reliability.
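To make this calculation concrete, the following sketch implements one commonly cited form of the Walter, Eliasziw, and Donner (1998) sample-size formula. It is a minimal sketch, not the investigators' code: the hypothesized alternative reliability (rho1) and the number of ratings per subject (k) in the example call are illustrative assumptions, not values reported in this brief.

import math
from scipy.stats import norm

def walter_sample_size(rho0, rho1, k, alpha=0.05, power=0.80):
    # Approximate number of subjects for a reliability study, per one
    # common form of Walter, Eliasziw, & Donner (1998): test H0: ICC = rho0
    # against H1: ICC = rho1 with k ratings per subject (one-sided alpha,
    # as in the original derivation).
    z_alpha = norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)
    theta0 = rho0 / (1 - rho0)
    theta1 = rho1 / (1 - rho1)
    c0 = (1 + k * theta0) / (1 + k * theta1)
    n = 1 + (2 * (z_alpha + z_beta) ** 2 * k) / ((k - 1) * math.log(c0) ** 2)
    return math.ceil(n)

# Illustrative call: minimal acceptable reliability of 0.40 (as in the study)
# against a hypothesized true reliability of 0.75, with k = 2 viewings.
print(walter_sample_size(rho0=0.40, rho1=0.75, k=2))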
Data from the evaluation instruments were entered into SPSS version 17.0 software for analysis. In addition to the raw scores assigned by each of the participants, scores were converted into percentages to categorize them as passing or not passing. Scores of 75% or greater were designated as passing, and scores less than 75% were designated as not passing. Analysis of variance (ANOVA) comparisons revealed that scores assigned by the participants who completed the study were not significantly different from the scores assigned by participants who did not complete the study. The following sections
present the descriptive statistics and the reliability and internal
consistency findings using data from only the participants who
completed the study (n = 29).
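To make the scoring and comparison procedures concrete, the sketch below shows the percentage conversion, the 75% pass/not-pass cutoff, and a one-way ANOVA of the kind reported. It is illustrative only: the maximum possible raw score and all score values here are hypothetical placeholders, not data from the study.

import numpy as np
from scipy.stats import f_oneway

MAX_SCORE = 22       # hypothetical maximum raw C-SEI score (not stated in this excerpt)
PASS_CUTOFF = 0.75   # scores of 75% or greater were designated as passing

def categorize(raw_scores, max_score=MAX_SCORE):
    # Convert raw scores to percentages and apply the pass/not-pass cutoff.
    pct = np.asarray(raw_scores, dtype=float) / max_score
    return np.where(pct >= PASS_CUTOFF, "passing", "not passing")

# Hypothetical raw scores; the same f_oneway call applies to the study's other
# ANOVA comparisons (e.g., completers vs. non-completers, first vs. second viewing).
completers = [14, 16, 13, 15, 17]
non_completers = [15, 14, 16]
f_stat, p_value = f_oneway(completers, non_completers)
print(categorize(completers), round(f_stat, 3), round(p_value, 3))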
Descriptive Statistics
The descriptive statistics, including the means and standard deviations of the scores assigned to each of the scenarios during the first viewing (the first half of the data collection procedures, weeks one through three) and the second viewing (the second half, weeks four through six), provided validity evidence supporting the intended levels of the scenarios. The mean scores and standard deviations for the below-the-level-of-expectation scenario were 4.24 (3.00) and 4.62 (3.56) for the first and second viewings, respectively. Likewise, the mean scores and standard deviations for the at-the-level-of-expectation scenario were 14.43 (3.79) and 14.39 (4.66) for the first and second viewings, respectively. The mean scores and standard deviations for the above-the-level-of-expectation scenario were 19.14 (1.48) and 19.28 (1.51) for the first and second viewings, respectively. ANOVA comparisons revealed significant differences among the three scenario levels.
When the scores were categorized as passing or not passing, all of the raters assigned the students in the below-the-level-of-expectation scenario a score of not passing and the students in the above-the-level-of-expectation scenario a score of passing during both the first and second viewings. However, when the scores for the at-the-level-of-expectation scenario were categorized in this way, 16 (55%) of the 29 raters assigned a score of passing during the first viewing, whereas 18 (62%) of the 29 raters assigned a score of passing during the second viewing. ANOVA revealed that the differences between raw scores assigned during the first and second viewings were not statistically significant.
Interrater Reliability
Several characteristics of the design present further limitations of the study and analyses. The intraclass correlation (2,1) has, as one of its assumptions, a random sample of raters; however, the sample used for this study was a convenience sample of nursing faculty who met specific inclusion criteria. Finally, the widths of the 95% confidence intervals for the reliability results were influenced by the number of raters and
the number of scenarios that were used. Had the investigators
used more scenarios or a larger number of participants, the sizes
of these confidence intervals could have been reduced.
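For readers unfamiliar with the statistic, a minimal sketch of ICC(2,1) as defined by Shrout and Fleiss (1979), the two-way random-effects, single-rater, absolute-agreement form named above, follows. The ratings matrix in the example is hypothetical, not the study's data.

import numpy as np

def icc_2_1(ratings):
    # ICC(2,1) per Shrout & Fleiss (1979): two-way random effects,
    # single rater, absolute agreement. `ratings` is an n x k array
    # (n subjects each rated by the same k raters).
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    rows = ratings.mean(axis=1)   # per-subject means
    cols = ratings.mean(axis=0)   # per-rater means
    msr = k * ((rows - grand) ** 2).sum() / (n - 1)   # between-subjects mean square
    msc = n * ((cols - grand) ** 2).sum() / (k - 1)   # between-raters mean square
    sse = ((ratings - rows[:, None] - cols[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical example: 4 subjects scored by 3 raters. As noted above, wider
# confidence intervals are expected when n (subjects) or k (raters) is small.
print(round(icc_2_1([[9, 2, 5], [6, 1, 3], [8, 4, 6], [7, 1, 2]]), 3))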
Conclusion
Ongoing, rigorous psychometric assessment of the C-SEI in different settings with larger and more diverse samples will allow for further refinement of the instrument. The development and use of additional scenarios would also improve the quality of the intrarater reliability data and allow further delineation of mid-range performance. Ultimately, comparisons of student performance and learning between groups exposed to various teaching strategies, including human patient simulation (HPS), need ongoing examination. The testing of these types of instruments allows for refinement and improved accuracy in evaluating student performance and more effective feedback to students.
References
American Association of Colleges of Nursing. (1998). The essentials of
baccalaureate education for professional nursing practice. Washington,
DC: Author.
Baer, G.D., Smith, M.T., Rowe, P.J., & Masterton, L. (2003). Establishing the reliability of mobility milestones as an outcome measure for
stroke. Archives of Physical Medicine and Rehabilitation, 84, 977-981.
doi:10.1016/S0003-9993(03)00050-9
Cusick, A., Vasquez, M., Knowles, L., & Wallen, M. (2005). Effect of rater
training on reliability of Melbourne Assessment of Unilateral Upper
Limb Function scores. Developmental Medicine and Child Neurology,
47, 39-45.
Diekelmann, N.L., & Ironside, P.M. (2002). Developing a science of nursing education: Innovation with research. Journal of Nursing Education,
41, 379-380.
Herm, S., Scott, K., & Copley, D. (2007). Simsational revelations.
Clinical Simulation in Nursing Education, 3, e25-e30. doi:10.1016/
j.ecns.2009.05.036
Kardong-Edgren, S., Adamson, K., & Fitzgerald, C. (2010). A review of
currently published evaluation instruments for human patient simulation. Clinical Simulation in Nursing, 6(1), e25-e35. doi:10.1016/
j.ecns.2009.08.004
McConvey, J., & Bennett, S.E. (2005). Reliability of the Dynamic Gait Index in individuals with multiple sclerosis. Archives of Physical Medicine and Rehabilitation, 86, 130-133. doi:10.1016/j.apmr.2003.11.033
National League for Nursing. (2005). Nursing education research. Retrieved from http://www.nln.org/research/nln_laerdal/
Oermann, M.H. (2009). Evidence-based programs and teaching/evaluation
methods: Needed to achieve excellence in nursing education. In M. Adams & T. Valiga (Eds.), Achieving excellence in nursing education (pp.
63-76). New York, NY: National League for Nursing.
Oermann, M.H., Yarbough, S.S., Saewert, K.J., Ard, N., & Charasika, M.E.
(2009). Clinical evaluation and grading practices in schools of nursing:
National survey findings, part II. Nursing Education Perspectives, 30,
352-357.
Portney, L.G., & Watkins, M.P. (2008). Foundations of clinical research:
Applications to practice (3rd ed.). Upper Saddle River, NJ: Prentice-Hall.
Shrout, P.E., & Fleiss, J.L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.
Todd, M., Manz, J.A., Hawkins, K.S., Parsons, M.E., & Hercinger, M.
(2008). The development of a quantitative evaluation tool for simulations in nursing education. International Journal of Nursing Education
Scholarship, 5, Article 41.
Walter, S.D., Eliasziw, M., & Donner, A. (1998). Sample size and optimal designs for reliability studies. Statistics in Medicine, 17, 101-110.
doi:10.1002/(SICI)1097-0258(19980115)17:1<101::AID-SIM727>
3.0.CO;2-E
Yule, S., Rowley, D., Flin, R., Maran, N., Youngson, G., Duncan, J., et al.
(2009). Experience matters: Comparing novice and expert ratings of
non-technical skills using the NOTSS system. Surgical Education, 79,
154-160.