
Cervical Spine Injury Severity Score: Assessment of Reliability

Paul A. Anderson, MD; Timothy A. Moore, MD; Kirkland W. Davis, MD; Robert W. Molinari, MD; Daniel K. Resnick, MD; Alexander R. Vaccaro, MD; Christopher M. Bono, MD; John R. Dimar II, MD; Bizhan Aarabi, MD, FRCSC; Glen Leverson, PhD

Background: Systems for classifying cervical spine injury most commonly use mechanistic or morphologic terms and do not quantify the degree of stability. Along with neurologic function, stability is a major determinant of treatment and prognosis. The goal of our study was to investigate the reliability of a method of quantifying the stability of subaxial (C3-C7) cervical spine injuries.

Methods: A quantitative system was developed in which an analog score of 0 to 5 points is assigned, on the basis of fracture displacement and severity of ligamentous injury, to each of four spinal columns (anterior, posterior, right pillar, and left pillar). The total possible score thus ranges from 0 to 20 points. Fifteen examiners assigned scores after reviewing the plain radiographs and computed tomography images of thirty-four consecutive patients with cervical spine injuries. The scores were then evaluated for interobserver and intraobserver reliability with use of intraclass correlation coefficients.

Cervical Spine Injury Severity Score
The CSISS was developed by the senior author (P.A.A.) and is based on consideration of both ligamentous and osseous injury and the distribution of the injury across various portions of the spinal column. In this system, the cervical spine is divided into four columns: anterior, posterior, right pillar (right lateral column), and left pillar (left lateral column) (Fig. 1-A). The anterior column includes the anterior and posterior longitudinal ligaments, vertebral body, intervertebral disc, uncinate processes, and transverse process. The posterior column includes both laminae, the spinous process, the posterior ligamentous complex, and the ligamentum flavum.
The lateral columns or pillars include the pedicle, lateral masses, superior and inferior articular processes, transverse processes, and facet capsules. Each column is scored on an analog scale ranging from 0 to 5 points (Fig. 1-B), with higher values given for more severe injuries as judged on the basis of bone and ligamentous disruption. Ligamentous disruption is presumed on the basis of separation of normal osseous landmarks. If more than one cervical spine level is injured, only the most severely affected level is graded. A score of 0 points signifies no injury, a score of 1 indicates a nondisplaced fracture, and a score of 5 points is given for the worst possible injury to that particular column, such as a complete fracture-dislocation of a facet (pillar) or a complete disruption of the posterior ligamentous complex as evidenced by wide separation of the spinous processes. Fractional values can be used, and the examiner is free to either upgrade or downgrade the score as he or she deems appropriate, such as for patients with severe degenerative ankylosis, diffuse idiopathic spinal hyperostosis, or ankylosing spondylitis that substantially alters the rigidity of the spine. In the present study, no specific instructions were given with regard to the amount of modification (upgrading or downgrading) that the examiners should employ.

The overall injury severity score is the sum of the analog scores for the four columns, with the total ranging from 0 to 20 points.
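The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (four analog column scores of 0 to 5, fractions allowed, summed to a total of 0 to 20). The class name `ColumnScores`, its field names, and the example values are hypothetical and do not come from the study.

```python
from dataclasses import dataclass

MAX_COLUMN_SCORE = 5.0  # upper bound of each column's analog scale


@dataclass(frozen=True)
class ColumnScores:
    """Analog scores (0 to 5, fractions allowed) for the four columns."""
    anterior: float
    posterior: float
    right_pillar: float
    left_pillar: float

    def __post_init__(self):
        # Reject values outside the 0-to-5 analog scale.
        for name, value in vars(self).items():
            if not 0.0 <= value <= MAX_COLUMN_SCORE:
                raise ValueError(f"{name} score {value} is outside 0 to 5")

    def total(self) -> float:
        """Overall CSISS: the sum of the four column scores (0 to 20)."""
        return (self.anterior + self.posterior
                + self.right_pillar + self.left_pillar)


# Hypothetical example values, chosen only to show fractional scoring.
example = ColumnScores(anterior=3.0, posterior=1.5,
                       right_pillar=0.0, left_pillar=2.0)
print(example.total())
```

Only the most severely affected level is scored when several levels are injured, so one `ColumnScores` instance per patient is enough under this reading.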

Patient Cohort
Thirty-four consecutive patients with subaxial cervical spine injuries were included in our study. The treatment of the injury was not affected by the results of the study. Institutional review board approval was obtained. The patients' informed consent was not required for the use of the deidentified images.

Image Acquisition, Storage, and Viewing
Lateral digital radiographs and computed tomography scans of the cervical spine were made for all patients at the time of presentation. The computed tomography scans were all acquired with multidetector scanners in helical mode, with 1.25-mm images and 0.6-mm overlap. These source images were then reformatted into two-dimensional true axial, sagittal, and coronal images, with a reformatted slice thickness of either 2 or 3 mm. The de-identified lateral radiographs and two-dimensional computed tomography reformatted images were then stored in compact-disc read-only format. The compact discs could each accommodate images of five patients. The images of the thirty-four patients were stored in random order, with the images of five patients stored twice to measure intraobserver variability. The viewing software was eFilm Lite (Merge eFilm, Milwaukee, Wisconsin), which allows images to be viewed on a standard computer in much the same fashion as images are viewed at a standard radiographic picture archiving and communication system (PACS) workstation.

The injury images were evaluated by fifteen different reviewers, who included orthopaedic and neurosurgical residents and fellows, attending surgeons experienced in the treatment of spinal trauma, and a musculoskeletal radiologist. The reviewers were given instructions on the use of the viewing software and general instructions for utilizing the stability scoring system. Each reviewer then scored each case in random order without knowledge of any clinical or treatment data.

Statistical Methods
Intraobserver and interobserver reliabilities were determined with the intraclass correlation coefficient, which ranges from 0 (no agreement) to 1 (perfect agreement). The intraclass correlation coefficient is a ratio of the variability among subjects to the overall variability observed in the data. It is commonly used to assess reliability of ordinal data that have many classes or to assess continuous data. The interobserver intraclass correlation coefficient was calculated by comparing results of each case between each examiner and averaging. The intraobserver intraclass correlation coefficient was calculated by comparing results between repeat measurements for the five randomly repeated cases. An intraclass correlation coefficient score of 0 to 0.4 indicates poor reliability, a score of >0.4 to 0.75 indicates fair or moderate reliability, and a score of >0.75 indicates excellent reliability⁷.

Prior to institution of the study, a power analysis performed with a minimum of ten examiners and thirty-four patients predicted that we would have >85% power to show excellent reliability if it was present. Increasing power was noted with an increasing number of examiners⁸. A p value of 0.05 was considered to be significant. All analyses were performed with use of SAS software (version 9; SAS Institute, Cary, North Carolina).
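To make the "ratio of the variability among subjects to the overall variability" concrete, here is a minimal Python sketch of one common form of the intraclass correlation coefficient. The text does not say which ICC form was computed in SAS, so the one-way random-effects, single-rater form used here is an assumption. The function names `icc_one_way` and `reliability_band` and the toy ratings are hypothetical.

```python
from statistics import mean


def icc_one_way(ratings):
    """One-way random-effects ICC(1,1) for a cases x raters table.

    Assumption: this form is chosen for illustration only; the study
    does not state which ICC variant was used.
    """
    n = len(ratings)       # number of cases (subjects)
    k = len(ratings[0])    # number of ratings per case
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 cases, each with the same >= 2 ratings")
    grand_mean = mean(x for row in ratings for x in row)
    case_means = [mean(row) for row in ratings]
    # Between-case mean square: variability among subjects.
    ms_between = k * sum((m - grand_mean) ** 2 for m in case_means) / (n - 1)
    # Within-case mean square: disagreement among ratings of the same case.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, case_means) for x in row
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all ratings are identical; ICC is undefined")
    return (ms_between - ms_within) / denominator


def reliability_band(icc):
    """Interpretation bands quoted in the Statistical Methods text."""
    if icc > 0.75:
        return "excellent"
    if icc > 0.4:
        return "fair or moderate"
    return "poor"


# Toy data: three cases, each scored by two raters who differ by 1 point.
toy = [[1, 2], [3, 4], [5, 6]]
value = icc_one_way(toy)
print(round(value, 3), reliability_band(value))
```

Because the between-case spread in the toy data is large relative to the 1-point disagreement, the coefficient is high even though the raters never agree exactly, which is the same heterogeneity effect the authors caution about in the Discussion.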

Figs. 3-A through 3-D Case 28, a patient with a flexion-axial loading injury. The overall CSISS assigned by the senior author was 10 points, and the patient was treated surgically. Fig. 3-A Midsagittal computed tomography scan showing traumatic retrolisthesis of the body into the canal and no separation of the spinous processes. The anterior column was given a score of 4 points. Fig. 3-B Axial computed tomography scan demonstrating bilateral laminar fractures. The posterior column was given a score of 1 point.

Results: The mean intraobserver and interobserver intraclass correlation coefficients for the fifteen reviewers were 0.977 and 0.883, respectively. Association between the scores and clinical data was also excellent, as all patients who had a score of ≥7 points had surgery. Similarly, eleven of the fourteen patients with a score of ≥7 points had a neurologic deficit compared with only three of the twenty with a score of <7 points.

Case Distribution
The CSISS scores were widely distributed from 0 to 20 points. The distribution was not normal, as there was a concentration of cases toward the less severe ratings and a spike of cases at the most severe rating (20 points). Figure 2 is a frequency plot based on divisions of 0.5 units, such as 7.0 to 7.4 or 9.5 to 9.9. In general, the low values represented nondisplaced or minimally displaced fractures, whereas the high scores were for bilateral facet dislocations with >5 mm of displacement and injury to each of the four columns.

Intraobserver Error
The intraobserver error was assessed by comparing each examiner's scores for two separate evaluations of five randomly selected patients. The mean intraclass correlation coefficient for the fifteen examiners was 0.977 (range, 0.948 to 1.000), indicating excellent reliability.

Interobserver Error
The mean interobserver intraclass correlation coefficient for the CSISS was 0.883, indicating excellent reliability (Table I). The mean intraclass correlation coefficients for the anterior column, posterior column, and combined pillars were 0.818, 0.759, and 0.831, respectively. These high intraclass correlation coefficients also indicate excellent reliability.

TABLE I Interobserver Intraclass Correlation Coefficients for Thirty-four Cases Assessed by Fifteen Examiners

Score Component                    Intraclass Correlation Coefficient
Total (four columns)               0.883
Anterior column                    0.818
Posterior column                   0.759
Combined right and left pillars    0.831
Right pillar                       0.789
Left pillar                        0.739

Effect of Experience of Examiner
We noted no differences based on the experience of the examiners, with an intraclass correlation coefficient of 0.871 for residents and fellows compared with 0.894 for the attending surgeons and the radiologist.

Effect of Fracture Type
The mean differences in the CSISS scores among the examiners are given for six different fracture patterns in Table II. These differences ranged from 1.64 for isolated fractures to 3.06 for fractures in a spine with ankylosing spondylitis.

TABLE II Mean Differences Among Fifteen Examiners' CSISS for Six Fracture Patterns

Fracture Pattern                          No.   Mean Difference in Score (points)   Stand. Dev. (points)
Fracture separation of lateral mass       7     2.14                                0.73
Flexion axial loading/burst fracture      6     2.18                                0.66
Bilateral facet dislocation               5     1.70                                1.31
Isolated fractures                        8     1.64                                0.74
Unilateral facet fracture-dislocation     4     2.03                                0.56
Fracture in ankylosed spine               2     3.06                                0.02

Association with Treatment and Neurologic Injury
The CSISS scores were compared with neurologic function and with treatment. All fourteen patients with a mean CSISS score of ≥7 points were treated surgically (see Appendix), and only four of the twenty patients with a score of <7 points had surgery. Three of those four patients presented with a neurologic deficit. One of them had a traumatic disc herniation that had produced a severe central cord syndrome. This disc herniation was two levels cephalad to a subtle disc distraction injury, and surgical intervention was undertaken to decompress the spinal cord at the level cephalad to the disc distraction injury. The other two patients with a neurologic deficit each had a unilateral superior articular facet fracture with fracture displacement into the neural foramen causing radicular symptoms.

The presence of a neurologic deficit had a strong influence on the decision to treat a patient surgically. All fourteen patients with a neurologic deficit were treated surgically. Of the patients who were neurologically intact, all who had a score of ≥7 points were treated with surgery and only one of seventeen who had a score of <7 points was treated surgically. Although the CSISS accurately predicted the need for surgery, all treatment decisions were made independently at the time of injury irrespective of the calculation of that score.
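The counts reported in this section can be tabulated as a simple two-by-two comparison. The Python sketch below uses only the patient counts given in the text. The sensitivity and specificity figures it prints are derived arithmetic, not values reported by the authors, and treating a score of 7 or more as a "positive test" is an assumption made here for illustration.

```python
# Counts reported in the text for the 34 patients, split at a CSISS of 7.
high = {"n": 14, "surgery": 14, "deficit": 11}  # score of 7 or more
low = {"n": 20, "surgery": 4, "deficit": 3}     # score below 7


def rate(group, outcome):
    """Proportion of a score group that had the given outcome."""
    return group[outcome] / group["n"]


def sensitivity_specificity(outcome):
    """Treat 'score of 7 or more' as a positive test for the outcome."""
    true_pos = high[outcome]
    false_pos = high["n"] - high[outcome]
    false_neg = low[outcome]
    true_neg = low["n"] - low[outcome]
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


for outcome in ("surgery", "deficit"):
    sens, spec = sensitivity_specificity(outcome)
    print(f"{outcome}: {rate(high, outcome):.0%} of high scores, "
          f"{rate(low, outcome):.0%} of low scores, "
          f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With only thirty-four patients and a threshold identified after the fact, these proportions describe this cohort and should not be read as validated test characteristics.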

Fig. 2 Distribution of mild to severe injuries. This histogram shows the number of total observations by all examiners by 0.5-point increments. There is an excellent distribution across all stability patterns.

Worst-Case Analysis
The cases that led to the highest variability among the examiners' scores (the seven most severe cases) were analyzed separately to determine potential weaknesses of the scoring system (Table III). On retrospective review of the computed tomography scans of two flexion-axial loading injuries, it was determined that fractures or subluxation of the facets that were clearly demonstrated by those scans had been missed or underscored (Figs. 3-A and 3-B, 3-C and 3-D). The other five cases that were associated with variability in scoring illustrate limitations of computed tomography imaging and the analog scoring system. These cases included major ligamentous and disc disruptions that were often underestimated by the reviewers. For example, one injury was in an ankylosed spine, and the other was an injury through the disc space (Figs. 4-A and 4-B, 4-C and 4-D). In the latter case, distraction of the disc space, anterior subluxation, and ankylosis from spondylosis were seen on the sagittal computed tomography reformation (Fig. 4-A). Despite the variability, all reviewers assigned a score of at least 8 points to this case, and the intraclass correlation coefficients were again high for interobserver reliability.
TABLE III Analysis of Cases Leading to Greatest Variation in CSISS Among Examiners

Case   Diagnosis                      Mean CSISS (points)   Factors Contributing to Variation
30     Flexion axial loading          17.6                  Low scores for facet subluxation of >5 mm
28     Flexion axial loading          13.5                  Low scores for facet subluxation of >5 mm; variability in grading of posterior ligamentous complex
25     Bilateral facet dislocation    11.9                  Variation in grading of anterior column with >5 mm displacement
29     Fracture in ankylosed spine    16.1                  Lower scores based on displacement and not taking known instability into account
27     Unilateral facet dislocation   12.7                  Differences in rating of posterior injury
26     Fracture in ankylosed spine    12.4                  Lower scores based on displacement and not taking known instability into account
14     Disc distraction               4.3                   Subtle finding of disc space distraction

Discussion: Determining the severity or stability of a cervical spine injury has proved difficult despite the use of high-resolution imaging. Stability has been considered to be a binary variable (stable or unstable) when in fact it is likely to be a continuum of injury patterns that can appear similar. For instance, a burst fracture of C7 with 5 mm of retropulsion of bone into the spinal canal may be associated with no other injury or with a concurrent complete posterior osteoligamentous injury⁹. We therefore developed our classification system to allow measurement of all components of the spinal column and not just the most obvious injury region.

The concept of functional stability was probably first described by Nicoll, who defined it in a population of Welsh miners on the basis of their ability to return to work³. Subsequently, in 1970, Holdsworth described the force vectors of an injury and the importance of an intact posterior osteoligamentous complex in maintaining the stability of the injured motion segment⁴. However, we believe that the criteria for determining the stability of the cervical spine have not been clearly delineated.

Here we have proposed a system to assess stability of the cervical spine on the basis of the concept that increasing amounts of skeletal displacement or osseous separation correlate with the amounts of ligamentous disruption and instability. The CSISS is determined by evaluating four distinct anatomic columns, with the injury to each column scored independently. By adding the scores for the four columns, the amount of overall stability is determined on the basis of the assumption that combined column injuries will lead to greater instability. This model is consistent with biomechanical studies demonstrating increasing neutral zone motion with increasing severity of injury⁹.
Also, it is not dissimilar to the basic principles of the qualitative schemes described by Allen et al.² or the clinical stability checklist described by White and Panjabi⁶. Our four-column model of the spine is a modification of the model described by Louis¹. Louis, who applied his system to C3 through L5, proposed that the spine is supported by three pillars, with the anterior pillar consisting of the vertebral body and disc and the two posterior pillars formed by the articular processes, reinforced horizontally by the lamina and pedicles. We utilized four columns in our system by adding the posterior osteoligamentous complex, which has been shown to be of paramount importance to stability⁴,¹⁰.

The CSISS demonstrated excellent intraobserver and interobserver reliability. We believe that this is due to the division of the spine into four columns and the critical assessment of each. Scoring was simplified by the use of an analog system. Greater variation in the scores was introduced primarily by the examiner's lack of appreciation of subtle signs of substantial ligamentous disruption, such as facet joint diastasis, disc distraction, an ankylosed spine, or interspinous widening. Because major ligamentous or disc derangements were underestimated in some of our patients, the variability in the scores may have been reduced by the addition of magnetic resonance images. The CSISS performed well for all fracture types and for relatively minor to grossly unstable injuries. The fracture type associated with the greatest variation in scores was a fracture in an ankylosed spine.

The kappa statistic is the most commonly used measure of agreement in an analysis of categorical data. When there are a large number of classes (choices), there is less of a chance that two raters will choose the same category, which results in lower kappa values. A weighted kappa has been developed that adjusts for the extent of the disagreement by assigning weights based on the distance between the disagreement cells. When the categories are equally spaced along one dimension, the weighted kappa is equivalent to the intraclass correlation coefficient¹¹. The intraclass correlation coefficient is thus useful for measuring observer reliability in the presence of ordinal data for which there are many categories or when the data are continuous, as they were in this study. One must keep in mind that heterogeneous populations, for which there are wide ranges of scores, often have elevated intraclass correlation coefficients as a result of large between-subject variances. However, the intraclass correlation coefficient is still mainly influenced by the amount of variability among raters compared with the overall variability in the data and is therefore a good representation of observer reliability.

As a preliminary step to understanding the clinical utility of the CSISS, we examined the association between the scores and whether surgery had been performed and the association between the scores and the presence of a neurologic deficit. A clinically useful tool for assessing stability should be correlated with these two factors.
We found that all of the patients with a CSISS score of ≥7 points were treated surgically. Similarly, eleven of the fourteen patients with a score of ≥7 points had a neurologic deficit compared with only three of the twenty with a score of <7 points.
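The weighted kappa mentioned in the discussion of agreement statistics can be sketched as follows. The text does not specify a weighting scheme, so the quadratic weights used here are an assumption, and the function name `quadratic_weighted_kappa` and the toy ratings are hypothetical. The sketch shows only how larger disagreements are penalized more heavily than near misses.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's kappa with quadratic disagreement weights.

    Ratings are integer category indices 0 .. n_categories - 1.
    Assumption: quadratic weights are used purely for illustration.
    """
    if n_categories < 2:
        raise ValueError("need at least two categories")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty case list")
    n = len(rater_a)
    # Cross-tabulate the two raters' category choices.
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            # Weight grows with the squared distance between categories.
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return 1.0 - weighted_observed / weighted_expected


# Toy data: two raters, five cases, categories 0 to 4, one near miss.
print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 4, 4], 5))
```

To apply something like this to CSISS totals, the 0-to-20 scale would first have to be binned into equally spaced categories, which is one reason a coefficient that accepts continuous scores is the more natural choice for these data.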

Conclusions: The Cervical Spine Injury Severity Score had excellent intraobserver and interobserver reliability. We believe that quantifying stability on the basis of fracture morphology will allow surgeons to better characterize these injuries and ultimately lead to the development of treatment algorithms that can be tested in clinical trials.

Reference
Paul A. Anderson, MD; Timothy A. Moore, MD; Kirkland W. Davis, MD; et al. 2007. Cervical Spine Injury Severity Score: Assessment of Reliability. The Journal of Bone and Joint Surgery. 1 May 2007.
