
Systematic review

Reliability and validity of non-radiographic methods of thoracic kyphosis measurement: A systematic review
Eva Barrett a,*, Karen McCreesh a, Jeremy Lewis b,c
a Department of Clinical Therapies, Faculty of Education and Health Sciences, University of Limerick, Limerick, Ireland
b Musculoskeletal Services, Health at the Stowe, Central London Community Healthcare, NHS Trust, 260 Harrow Road, London W2 5ES, UK
c Department of Allied Health Professions and Midwifery, School of Health and Social Work, Wright Building, College Lane Campus, University of Hertfordshire, Hatfield, AL10 9AB, Hertfordshire, UK
Article info
Article history:
Received 7 January 2013
Received in revised form 29 April 2013
Accepted 16 September 2013
Keywords:
Reliability
Validity
Thoracic kyphosis
Measurement
Abstract
Background: A wide array of instruments is available for non-invasive thoracic kyphosis measurement. Guidelines for selecting outcome measures for use in clinical and research practice recommend that properties such as validity and reliability be considered. This systematic review reports on the reliability and validity of non-invasive methods for measuring thoracic kyphosis.
Methods: A systematic search of 11 electronic databases located studies assessing the reliability and/or validity of non-invasive thoracic kyphosis measurement techniques. Two independent reviewers used a critical appraisal tool to assess the quality of retrieved studies. Data were extracted by the primary reviewer. The results were synthesized qualitatively using a level of evidence approach.
Results: 27 studies satisfied the eligibility criteria and were included in the review. Reliability alone, validity alone, and both reliability and validity were investigated by sixteen, two and nine studies, respectively. 17/27 studies were deemed to be of high quality. In total, 15 methods of thoracic kyphosis measurement were evaluated in the retrieved studies. All investigated methods showed high (ICC ≥ .7) to very high (ICC ≥ .9) levels of reliability. The validity of the methods ranged from low to very high.
Conclusion: The strongest levels of evidence for reliability exist in support of the Debrunner kyphometer, Spinal Mouse and Flexicurve index, and for validity in support of the arcometer and Flexicurve index. Further reliability and validity studies are required to strengthen the level of evidence for the remaining methods of measurement.
© 2013 Elsevier Ltd. All rights reserved.
1. Introduction
Thoracic kyphosis is the sagittal plane curvature between the T1 and T12 vertebral bodies (Perriman et al., 2010). Normal kyphosis ranges from 20° to 50° when assessed radiographically (Willner, 1981). Excessive thoracic kyphosis, defined as a kyphosis >50° (Willner, 1981; Teixeira and Carvalho, 2007), has previously been linked with a range of negative consequences. The postural effects of excessive kyphosis include musculoskeletal complaints such as shoulder pain (Gray and Grimsby, 2004) and cervical pain (Horter, 1978; Callet, 1991; Ayub, 1991), which can affect any age group (Gray and Grimsby, 2004). In osteoporotic samples, excessive thoracic kyphosis can lead to physiological adaptations such as impaired respiratory function (Murray et al., 1993; Di Bari et al., 2004) and can have functional consequences such as decreased mobility (Lydick et al., 1997), injurious falls (Kado et al., 2007) and loss of independence (Lydick et al., 1997). The measurement of thoracic kyphosis is therefore an essential aspect of musculoskeletal assessment, helping clinicians to screen adequately for excessive kyphosis, determine baseline data, monitor progress and guide the appropriate implementation of treatment strategies (Chaise et al., 2011).
The current gold standard for the quantification of thoracic kyphosis is the lateral radiograph, a method which provides a Cobb angle (Harrison et al., 2001; Briggs et al., 2007). While this method is routinely used for the diagnosis and monitoring of conditions such as idiopathic scoliosis and hyperkyphosis (Saad et al., 2012), it has significant limitations. Radiographic methods are generally inconvenient in a clinical setting, involve high costs and expose the patient to high doses of potentially harmful radiation (Korovessis et al., 2001; Kellis et al., 2008). Furthermore, the validity of the Cobb angle has been criticized, particularly in osteoporotic individuals, as it predominantly reflects the endplate tilt of the vertebrae at the selected limits of the curve and fails to represent the full contour of the thoracic spine (Goh et al., 1999; Harrison et al., 2001; Briggs et al., 2007).
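To make the criticism above concrete, the sketch below computes a Cobb-style angle from the sagittal tilt of each endplate in a curve. It is a simplified illustration rather than a clinical tool: it assumes tilts are given in degrees from a common horizontal reference with a consistent sign convention, ordered from the upper to the lower limit vertebra, and the function names and input values are hypothetical. Because only the two limit endplates enter the calculation, two spines with different intermediate contours return the same angle.

```python
"""Illustrative Cobb-style kyphosis angle (hypothetical helpers, not a clinical tool)."""

EXCESSIVE_KYPHOSIS_DEG = 50.0  # threshold for excessive kyphosis cited in the Introduction


def cobb_angle(endplate_tilts_deg: list[float]) -> float:
    """Angle between the first and last endplate lines in the list.

    Assumes tilts are measured in the sagittal plane from the same horizontal
    reference, ordered from the upper to the lower limit vertebra. Only the two
    limit endplates are used; intermediate values are ignored by design.
    """
    if len(endplate_tilts_deg) < 2:
        raise ValueError("need at least the upper and lower limit endplates")
    return abs(endplate_tilts_deg[0] - endplate_tilts_deg[-1])


def is_excessive(angle_deg: float) -> bool:
    """True if the angle exceeds the >50 degree threshold."""
    return angle_deg > EXCESSIVE_KYPHOSIS_DEG


# Two hypothetical spines: same limit endplates, different intermediate contour.
smooth = [25.0, 15.0, 5.0, -5.0, -20.0]
wedged = [25.0, 24.0, 23.0, 2.0, -20.0]  # abrupt local change mid-curve

print(cobb_angle(smooth), cobb_angle(wedged))  # 45.0 45.0
print(is_excessive(cobb_angle(smooth)))        # False
```

The identical output for both inputs is the point of the example: any regional change between the limit vertebrae is invisible to this calculation.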
* Corresponding author. Tel.: +353 (0)61 234232; fax: +353 (0)61 234251.
E-mail addresses: evabarrett@live.ie, Eva.Barrett@ul.ie (E. Barrett).
Contents lists available at ScienceDirect
Manual Therapy
journal homepage: www.elsevier.com/math
1356-689X/$ – see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.math.2013.09.003
Manual Therapy 19 (2014) 10–17
Alternatively, several non-invasive, skin-surface methods have been adopted for clinical use, including the Debrunner kyphometer (Öhlén et al., 1989), the Flexicurve (Milne and Williamson, 1983) and the Spinal Mouse (Mannion et al., 2004), as well as technology-based methods including rasterstereography (Melvin et al., 2010) and 3D ultrasound (Folsch et al., 2012). Guidelines for selecting measurement tools for use in clinical and research practice recommend that validity and reliability are amongst the essential properties to be considered (Lohr, 2002; Terwee et al., 2007). Validity is an evaluation of whether an instrument measures the construct or variable that it is intended to measure (Carmines and Zeller, 1979; van de Ven-Stevens et al., 2009). For a non-invasive tool to be considered accurate enough to measure thoracic kyphosis in practice and research, it must display adequate criterion validity when compared with the gold standard, i.e. the radiographic Cobb angle. Reliability is defined as the extent to which a measurement is consistent and free from error, either when repeated by the same rater (intra-rater reliability) or when performed by different raters (inter-rater reliability) (Portney and Watkins, 2000). In practice, to state that a patient's clinical status has changed since the last measurement, the measured change must be larger than the error associated with the measurement (Wright and Feinstein, 1992). Therefore, reporting the standard error of measurement (SEM) is an important element of reliability studies, as it aids the clinical interpretability of results (van de Ven-Stevens et al., 2009).
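The comparison between a measured change and the measurement error can be made explicit with a short calculation. The sketch below uses two widely used formulas that are assumptions here, not methods taken from the reviewed studies: SEM = SD × √(1 − ICC), and the 95% minimal detectable change, MDC95 = 1.96 × √2 × SEM. The function names and the example values (SD = 10°, ICC = .91) are hypothetical.

```python
import math


def sem_from_icc(sd_deg: float, icc: float) -> float:
    """Standard error of measurement, SEM = SD * sqrt(1 - ICC) (one common estimate)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC expected in [0, 1]")
    return sd_deg * math.sqrt(1.0 - icc)


def mdc95(sem_deg: float) -> float:
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM.

    The sqrt(2) term reflects that a change score combines the error of two measurements.
    """
    return 1.96 * math.sqrt(2.0) * sem_deg


def exceeds_error(change_deg: float, threshold_deg: float) -> bool:
    """True if an observed change is larger than the measurement-error threshold."""
    return abs(change_deg) > threshold_deg


# Hypothetical example: between-subject SD of 10 degrees and an ICC of .91.
sem = sem_from_icc(10.0, 0.91)   # about 3.0 degrees
mdc = mdc95(sem)                 # about 8.3 degrees
print(f"SEM = {sem:.1f} deg, MDC95 = {mdc:.1f} deg")
print(exceeds_error(5.0, mdc), exceeds_error(9.0, mdc))  # False True
```

Under these assumptions, a 5° change in kyphosis would not be distinguishable from measurement error, whereas a 9° change would.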
Since numerous studies on the psychometric properties of these
instruments have been published, an evaluation of the literature is
required. Therefore, the purpose of this systematic review is to
report on the reliability and validity of methods of non-invasive
thoracic kyphosis measurement.
2. Methods
2.1. Search strategy
A systematic search was performed on 1st October 2012 by the primary investigator. The following databases were searched: MEDLINE, AMED, CINAHL, PubMed, Biomedical Reference Collection: Expanded, SPORTDiscus, ScienceDirect, Cochrane Library and Web of Science (1960–Oct 2012). The search used terms from 3 subject areas: thoracic kyphosis (thoracic kyphosis, spinal curvature, thoracic curvature, kyphosis), psychometric properties (reliability, validity, sensitivity, responsiveness, properties) and physical tests (instrument, tool, test, measure*, inclinometer, flexicurve, kyphometer, radiograph, Cobb). The Boolean operators "OR" and "AND" were used to combine the search terms within and between each of the 3 subject areas, respectively. A word from each area was required to be in the title or abstract of the study. An additional search of the Google Scholar search engine was also performed. These searches were supplemented by hand-searching the reference lists of the final articles found in the above searches.
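The combination rule above (OR within a subject area, AND between areas) can be sketched as a small query builder. This is an illustration only: exact syntax differs between databases, and the optional field tag is a hypothetical parameter (for example, PubMed uses "[tiab]" to restrict a term to the title and abstract). The term lists are copied from the search strategy.

```python
"""Sketch of the three-block Boolean search described in Section 2.1."""

# Term groups as listed in the search strategy; "measure*" keeps its truncation wildcard.
KYPHOSIS = ["thoracic kyphosis", "spinal curvature", "thoracic curvature", "kyphosis"]
PROPERTIES = ["reliability", "validity", "sensitivity", "responsiveness", "properties"]
TESTS = ["instrument", "tool", "test", "measure*", "inclinometer",
         "flexicurve", "kyphometer", "radiograph", "Cobb"]


def format_term(term: str, field: str = "") -> str:
    """Quote multi-word phrases and append an optional database field tag."""
    quoted = f'"{term}"' if " " in term else term
    return quoted + field


def build_query(groups: list[list[str]], field: str = "") -> str:
    """OR the terms within each group, then AND the groups together."""
    blocks = ["(" + " OR ".join(format_term(t, field) for t in g) + ")" for g in groups]
    return " AND ".join(blocks)


print(build_query([KYPHOSIS, PROPERTIES, TESTS]))
```

Keeping the term lists as data makes it straightforward to regenerate the same logical query for each database with its own field syntax.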
2.1.1. Eligibility criteria
A meeting between the two reviewers was convened to decide
on selection criteria.
2.1.2. Inclusion criteria
- Articles available in full text
- Articles available in English
- A neutral thoracic kyphosis angle was recorded
- Measurement of validity and/or reliability was the primary aim of the study

Studies on human participants were included for review. No restrictions were made with regard to populations.
2.1.3. Exclusion criteria
- Full text in English could not be located
- Thoracic kyphosis angle reported in thoracic flexion or extension only
- Radiographic measurement techniques only
Initially, article titles and abstracts were screened by the pri-
mary reviewer. Any title and abstract which was not clearly
investigating a psychometric property of a thoracic kyphosis
measurement method was discarded as being not relevant. In cases
of uncertainty about eligibility of a study title/abstract, the full text
was explored. When the original search was narrowed down to
relevant articles only, a second reviewer independently applied the
selection criteria to the chosen articles to ensure all articles were
suitable for review. There were no disagreements between re-
viewers regarding the eligibility of chosen articles.
2.2. Quality assessment
The critical appraisal tool used was a relatively new checklist (Brink and Louw, 2011), designed to appraise studies of reliability and validity combined, or of reliability or validity alone. The checklist comprises 13 items and does not report a quality score. It was developed from two existing tools, the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) and the Quality Appraisal of Diagnostic Reliability Studies (QAREL). As some of the included studies assessed both the reliability and the validity of an instrument, this checklist was more convenient than using QUADAS and QAREL separately. The criteria are provided as a footnote to Table 2. Studies were considered to be of high quality if they scored ≥60%, as done previously (van der Wurff et al., 2000; May et al., 2006, 2010; Adhia et al., 2012).
Quality assessment of each paper was performed independently by two reviewers. In a pilot stage, each reviewer independently rated two non-included articles using the checklist, in order to identify any differences in the interpretation of items. This process yielded a kappa score of .92, which was regarded as acceptable to continue. Disagreements were resolved by discussion and all items were clarified.
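For readers less familiar with the agreement statistic, the sketch below computes Cohen's kappa for two reviewers rating the same checklist items. The review does not state which kappa variant was used; unweighted Cohen's kappa is assumed here, and the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of each rater's marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical checklist ratings from two reviewers for four items.
reviewer_1 = ["O", "O", "X", "X"]
reviewer_2 = ["O", "X", "X", "X"]
print(cohens_kappa(reviewer_1, reviewer_2))  # 0.5
```

Here the reviewers agree on 75% of items, but half of that agreement would be expected by chance alone, so kappa is .5 rather than .75.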
2.3. Data analysis
Meta-analysis was not attempted due to the heterogeneity of tests, participants and analyses. Also, a subgroup analysis could not be performed due to the limited number of studies evaluating the same thoracic kyphosis measurement technique. Hence a descriptive analysis was conducted and data were synthesized using a level of evidence approach (van Tulder et al., 2003), displayed in Table 1.
Table 1
Levels of evidence approach (van Tulder et al., 2003).
Level of evidence | Criteria
Strong | Consistent findings from ≥3 high quality studies
Moderate | Consistent findings from at least 1 high quality study and one or more low quality studies
Limited | Consistent findings in ≥1 low quality studies, or only 1 study available
Conflicting | Inconsistent evidence in multiple studies, irrespective of study quality
No evidence | No studies found

The Intraclass Correlation Coefficient (ICC) and Pearson's Correlation Coefficient were interpreted as follows: .00–.29 as very low correlation, .30–.49 as low correlation, .50–.69 as moderate correlation, .70–.89 as high correlation and .90–1.00 as very high correlation (Munro and Visintainer, 2005).
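The review does not specify which ICC form the included studies used. As one hedged illustration, the sketch computes ICC(2,1) (two-way random effects, absolute agreement, single measurement, following the Shrout and Fleiss mean-square formulation) from a subjects × raters table, then maps the result onto the Munro and Visintainer bands above. The angle data are hypothetical, and using the absolute value of a coefficient before banding is an assumption.

```python
def icc_2_1(scores: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores[i][j] is the value recorded for subject i by rater (or session) j.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into subjects, raters and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def munro_label(coefficient: float) -> str:
    """Map |coefficient| onto the Munro and Visintainer (2005) bands used in this review."""
    r = abs(coefficient)
    if r < 0.30:
        return "very low"
    if r < 0.50:
        return "low"
    if r < 0.70:
        return "moderate"
    if r < 0.90:
        return "high"
    return "very high"


# Hypothetical kyphosis angles (degrees): four subjects measured by two raters.
angles = [[40, 42], [30, 30], [50, 52], [20, 20]]
icc = icc_2_1(angles)
print(round(icc, 3), munro_label(icc))  # 0.994 very high
```

Because this ICC form penalises systematic differences between raters (the ms_cols term), a rater who consistently reads higher lowers the coefficient even when the ranking of subjects is preserved.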
3. Results
3.1. Selection of studies
Fig. 1 presents a flow diagram, based on the PRISMA guidelines (Liberati et al., 2009), which details the movement of articles through the review process. Twenty-seven articles were included for review under the outlined selection criteria. Of these studies, 2 investigated validity only, 16 investigated reliability only and 9 investigated both reliability and validity. Of the 16 included reliability studies, 1 investigated inter-rater reliability, 7 investigated intra-rater reliability and 8 investigated both intra- and inter-rater reliability.
3.2. Methodological quality
Eighteen out of twenty-eight studies were deemed to be of high quality (score ≥60%). The full scoring process is displayed in Table 2. The two reviewers initially disagreed on 12 items across all studies (kappa score .94). These disagreements were resolved by discussion. A third reviewer was available to moderate disagreements but was not required. Both of the included validity studies were of high quality (Leroux et al., 2000; Gravina et al., 2012). Five out of nine combined reliability and validity studies were of high quality (Teixeira and Carvalho, 2007; Ripani et al., 2008; Chaise et al., 2011; Greendale et al., 2011; de Oliveira et al., 2012). Eleven out of seventeen reliability studies were of high quality (Öhlén et al., 1989; Lundon et al., 1998; Purser et al., 1999; Hinman, 2004; Mannion et al., 2004; Kellis et al., 2008; Lewis and Valentine, 2010; Melvin et al., 2010; Sheeran et al., 2010; Czaprowski et al., 2012; Folsch et al., 2012; van Blommestein et al., 2012).
The main areas of weakness found were inadequate description of the raters, insufficient between-rater and within-rater blinding, lack of variation in testing order and inappropriate or insufficiently described statistical analyses.
3.3. Study characteristics
A total of 15 methods for thoracic kyphosis measurement were found within the reviewed articles. The Flexicurve index and the Debrunner kyphometer were the most commonly studied, in terms of both reliability and validity. All methods are listed below:
- Arcometer (D'Osualdo et al., 1997; Chaise et al., 2011)
- Flexicurve index (Yanagawa et al., 2000; Hinman, 2004; Teixeira and Carvalho, 2007; Greendale et al., 2011)
- Flexicurve angle (Greendale et al., 2011; de Oliveira et al., 2012)
- Debrunner's kyphometer (Öhlén et al., 1989; Purser et al., 1999; Korovessis et al., 2001; Greendale et al., 2011)
- Spinal Mouse (Mannion et al., 2004; Kellis et al., 2008; Ripani et al., 2008)
- Manual inclinometer (Lewis and Valentine, 2010; van Blommestein et al., 2012)
- Digital inclinometer (Czaprowski et al., 2012)
- 3D ultrasound (Folsch et al., 2012)
- Rasterstereography (Goh et al., 1999; Melvin et al., 2010)
- Stereovideography (Leroux et al., 2000)
- Goniometer (Gravina et al., 2012)
- Electrogoniometer (Perriman et al., 2010)
- Spinal wheel (Sheeran et al., 2010)
- Pantograph (Willner, 1981)
- Photogrammetry (Dunk et al., 2004, 2005; Saad et al., 2012)
Table 2
Study | 1 2 3 4 5 6 7 8 9 10 11 12 13 | High quality?
Chaise et al., 2011 | O X O X X X O O O O O O O | Yes
Czaprowski et al., 2012 | O O n/a O O X n/a O n/a O n/a O X | Yes
de Oliveira et al., 2012 | O X O X X X O O O O O O X | Yes
D'Osualdo et al., 1997 | O X O X X X O X O X X O X | No
Dunk et al., 2004 | O X n/a n/a X X n/a X n/a O n/a O X | No
Dunk et al., 2005 | O X n/a n/a X X n/a X n/a O n/a O O | No
Folsch et al., 2012 | O X n/a n/a X X n/a O n/a O n/a O O | Yes
Goh et al., 1999 | X X n/a n/a X X n/a O n/a X n/a O X | No
Gravina et al., 2012 | O X O n/a n/a n/a X n/a O O X O O | Yes
Greendale et al., 2011 | O O O O X X O O O O O O X | Yes
Hinman, 2004 | O O n/a X n/a X n/a O n/a O n/a O O | Yes
Kellis et al., 2008 | O O n/a O O X n/a O n/a O n/a O O | Yes
Korovessis et al., 2001 | O X O O X X X X O O O O X | No
Leroux et al., 2000 | O X O n/a n/a n/a X n/a O O O O O | Yes
Lewis and Valentine, 2010 | O X n/a n/a O O n/a X n/a O n/a O O | Yes
Lundon et al., 1998 | O O n/a X O O n/a O n/a O n/a O X | Yes
Mannion et al., 2004 | O O n/a X X O n/a O n/a O n/a O X | Yes
Melvin et al., 2010 | O O n/a X X X n/a O n/a O n/a O X | No
Öhlén et al., 1989 | O X n/a O O O n/a X n/a O n/a O X | Yes
Perriman et al., 2010 | O X O n/a X X O O X O O O X | No
Purser et al., 1999 | O X n/a n/a X X n/a O n/a O n/a O O | Yes
Ripani et al., 2008 | O O O X X X O O O O O O X | Yes
Saad et al., 2012 | O O n/a O X X n/a X n/a O n/a X X | No
Sheeran et al., 2010 | O O n/a X X O n/a O n/a O n/a O O | Yes
Teixeira and Carvalho, 2007 | O O O X X X O O O O O O X | Yes
Van Blommestein et al., 2012 | O X n/a n/a O O n/a O n/a O n/a O O | Yes
Willner, 1981 | X X O X X X X X O O O O X | No
Yanagawa et al., 2000 | O X n/a O O X n/a X n/a O n/a O X | No
Criteria: 1. Adequate description of study population; 2. Adequate description of raters; 3. Adequate explanation of reference standard; 4. Between-rater blinding; 5. Within-rater blinding; 6. Variation of testing order; 7. Time period between index test and reference standard; 8. Time period between repeated measures; 9. Independency of reference standard from index test; 10. Adequate description of index test procedure; 11. Adequate description of reference standard procedure; 12. Explanation of any withdrawals; 13. Appropriate statistical methods.
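The ≥60% high-quality rule can be sketched as a score over the Table 2 rows. The review does not say how "n/a" items are handled; this sketch assumes they are dropped from the denominator, so the score is the share of applicable criteria rated "O". Applied to each row of Table 2, this assumed rule agrees with the "High quality?" column. The helper names are hypothetical.

```python
HIGH_QUALITY_THRESHOLD = 0.60  # "high quality" if >= 60% of applicable criteria are met


def quality_score(row: str) -> float:
    """Share of applicable criteria rated 'O' in a Table 2 row.

    Assumption: items marked 'n/a' are excluded from the denominator.
    """
    items = row.split()
    if len(items) != 13:
        raise ValueError("expected 13 checklist items")
    applicable = [item for item in items if item != "n/a"]
    return sum(item == "O" for item in applicable) / len(applicable)


def is_high_quality(row: str) -> bool:
    return quality_score(row) >= HIGH_QUALITY_THRESHOLD


# Rows copied from Table 2 (criteria 1 to 13).
rows = {
    "Chaise et al., 2011": "O X O X X X O O O O O O O",
    "Hinman, 2004": "O O n/a X n/a X n/a O n/a O n/a O O",
    "Korovessis et al., 2001": "O X O O X X X X O O O O X",
    "Perriman et al., 2010": "O X O n/a X X O O X O O O X",
}
for study, row in rows.items():
    label = "Yes" if is_high_quality(row) else "No"
    print(f"{study}: {quality_score(row):.0%} -> {label}")
```

Perriman et al. (2010) illustrates why the denominator matters: 7 of 12 applicable items (58%) falls just below the threshold.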
3.4. Types of participants
A healthy sample of participants was used in 21/28 studies. Only 7 studies included subjects with any degree of pathology: 4 involved postmenopausal, osteoporotic women (Lundon et al., 1998; Purser et al., 1999; Yanagawa et al., 2000; Greendale et al., 2011) and 3 involved subjects with scoliosis (Willner, 1981; Leroux et al., 2000; Saad et al., 2012). Subject BMI was unreported in 15 studies, while 5 studies reported an average BMI ≥25 (Mannion et al., 2004; Sheeran et al., 2010; Greendale et al., 2011; Chaise et al., 2011; de Oliveira et al., 2012) and 3 studies reported an average BMI <25 (Ripani et al., 2008; Melvin et al., 2010; Saad et al., 2012). The majority of studies used subjects with a mean age between 20 and 65 years. Six studies used subjects with a mean age between 10 and 19 years (D'Osualdo et al., 1997; Willner, 1981; Leroux et al., 2000; Korovessis et al., 2001; Kellis et al., 2008; Gravina et al., 2012) and 1 study compared pre- and post-menopausal women (Hinman, 2004). Only 5 studies used a population with a mean age ≥65 years (Lundon et al., 1998; Purser et al., 1999; Yanagawa et al., 2000; Teixeira and Carvalho, 2007; Greendale et al., 2011).
3.5. Reliability and validity
All reliability studies showed high to very high levels of reli-
ability. The validity of the methods ranged from low to very high.
However, only 11 out of 27 studies assessed validity. This is shown
in more detail in Table 3.
3.6. Level of evidence
Table 4 details the accumulated level of evidence found for all
methods. For the majority, there is a limited or inconsistent level of
evidence for the reliability and validity of methods. Strong and
moderate levels of evidence have been found for a small selection
of methods.
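The Table 1 rules can be sketched as a small function that turns a set of studies into a level of evidence. This is a simplification under stated assumptions: "consistent" is reduced to every study reporting the same finding category, and because Table 1 does not cover two high quality studies with no low quality study, the sketch treats that case as Moderate (as Table 4 appears to do for the manual inclinometer). Using the arcometer reliability rows of Table 3 (Chaise et al., 2011, high quality; D'Osualdo et al., 1997, not high quality; both very high), it returns Moderate, as in Table 4.

```python
def level_of_evidence(studies: list[tuple[bool, str]]) -> str:
    """Assign a level of evidence from (is_high_quality, finding) pairs.

    Simplified sketch of the Table 1 rules: 'consistent' means every study
    reports the same finding category (e.g. "very high").
    """
    if not studies:
        return "No evidence"
    findings = {finding for _, finding in studies}
    if len(findings) > 1:
        return "Conflicting"
    n_high = sum(1 for high, _ in studies if high)
    if n_high >= 3:
        return "Strong"
    if n_high >= 1 and len(studies) >= 2:
        # Assumption: two high quality studies and no low quality study count as Moderate.
        return "Moderate"
    # At least one low quality study only, or a single study.
    return "Limited"


# Arcometer reliability (Table 3): Chaise et al., 2011 and D'Osualdo et al., 1997.
arcometer = [(True, "very high"), (False, "very high")]
print(level_of_evidence(arcometer))  # Moderate
```

Reducing "consistency" to identical category labels is a deliberate simplification; the review's own judgement of consistency may have been more nuanced.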
4. Discussion
4.1. Main findings
This review highlighted 15 methods for the non-invasive mea-
surement of thoracic kyphosis, ranging from simple, skin-surface
measures to computerized postural analysis systems. In general,
high to very high levels of reliability were found for all investigated
measurement techniques. The validity of these techniques was less
commonly studied and ranges from low to very high. On observa-
tion of the data, the more technological methods (e.g. raster-
stereography, 3D ultrasound, stereovideography) do not appear to
offer greater reliability or validity than the simpler methods. In fact,
the strongest level of evidence was in support of the high to very
high levels of reliability of the Flexicurve index, Debrunner kyph-
ometer and Spinal Mouse, which are simple, hand-held tools.
Fig. 1. PRISMA flow diagram.
4.2. Validity
Significant barriers to validity testing are the limited accessibility of spinal X-rays and the ethical issues surrounding their use (Greendale et al., 2011). This is likely to be a large contributing factor to the retrieval of only two studies which exclusively examined the validity of a non-invasive instrument for thoracic kyphosis measurement (D'Osualdo et al., 1997; Perriman et al., 2010). Other methods, such as the centroid method (Chen, 1999) and the posterior tangent method (Harrison et al., 2001), have been suggested as alternatives to the Cobb angle. However, these are still radiographically based.
There are several reasons why skin-surface devices may show lower validity. Firstly, skin-surface techniques follow the line of the spinous processes rather than that of the vertebral bodies, as is done radiographically (Mannion et al., 2004). Secondly, the varying distribution of adipose tissue overlying the spine may reduce the accuracy obtained (Mannion et al., 2004). This may have been influential, as only 1 retrieved validity study reported a BMI <25 (Ripani et al., 2008), whereas 2 studies reported a BMI >25 (Greendale et al., 2011; Chaise et al., 2011) and the remaining 8 validity studies had unreported BMI. As detailed by the reviewed studies, other sources likely to lower validity scores included incorrect landmark palpation (Leroux et al., 2000; Greendale et al., 2011), measurement error in calculating the Cobb angle (Ripani et al., 2008; Greendale et al., 2011) and the operation of the device (Korovessis et al., 2001; Ripani et al., 2008).
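Validity in Table 3 is reported as a correlation coefficient between each device and the radiographic reference. The sketch below computes Pearson's r for paired readings, assuming Pearson's coefficient is the one reported; the readings are hypothetical. It also shows one property worth keeping in mind when reading these values: r measures linear association, so a device that reads a constant 10° lower than the Cobb angle still correlates perfectly.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired readings (degrees): skin-surface device vs radiographic Cobb angle.
device = [30.0, 40.0, 50.0, 60.0]
cobb = [d + 10.0 for d in device]  # device reads 10 degrees lower for every subject

print(pearson_r(device, cobb))  # 1.0, despite the constant 10 degree offset
```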
Table 3
Reliability and validity data for all methods.
Reference | High quality? | Reliability (ICC/Cronbach alpha) | SEM | Validity (correlation coefficient)
Folsch et al., 2012 | Yes | .95 (intra) | 3.7° | N/A
Chaise et al., 2011 | Yes | .98 (inter), .99 (intra) | | .94
D'Osualdo et al., 1997 | No | .99 (intra and inter) | | .98
Korovessis et al., 2001 | No | .84 (inter), .92 (intra) | | .759
Öhlén et al., 1989 | Yes | .92, .93 (intra), .91, .94 (inter) | | N/A
Purser et al., 1999 | Yes | .95–.97 (intra) | | N/A
Lundon et al., 1998 | Yes | .88 (inter), .89–.99 (intra) | | N/A
Greendale et al., 2011 | Yes | .98 (inter and intra) | | .622
Czaprowski et al., 2012 | Yes | .83 (intra) | 3.8° (intra) | N/A
Perriman et al., 2010 | No | .9–.95 (intra) | | .538–.876
Greendale et al., 2011 | Yes | .96 (intra and inter) | | .656–.758
Lundon et al., 1998 | Yes | .89–.98 (intra), .87 (inter) | | N/A
de Oliveira et al., 2012 | Yes | .94 (inter), .82 (intra) | | .7
Greendale et al., 2011 | Yes | .96 (intra and inter) | | .686–.756
Yanagawa et al., 2000 | No | .93 (intra) | | N/A
Teixeira and Carvalho, 2007 | Yes | .87 (intra), .94 (inter) | | .528–.906
Hinman, 2004 | Yes | .93 and .94 (inter) | | N/A
Gravina et al., 2012 | Yes | N/A | | .897
Lewis and Valentine, 2010 | Yes | .93–.97 (intra) | 1°, 1.7° | N/A
Van Blommestein et al., 2012 | Yes | .92–.96 (intra) | 1.7, 2.3 |
Willner, 1981 | No | ICC not reported | | .94
Dunk et al., 2004 | No | .351–.691 (intra) | | N/A
Dunk et al., 2005 | Yes | .310–.727 (intra) | | N/A
Saad et al., 2012 | No | .93–.95 (intra), .97 (inter) | | N/A
Goh et al., 1999 | No | .95 (intra) | .6°–.8° | N/A
Melvin et al., 2010 | No | .921–.992 (intra), .979 (inter) | | N/A
Mannion et al., 2004 | Yes | .73–.88 (intra), .83–.87 (inter) | 4.2°, 2.8° (intra) | N/A
Kellis et al., 2008 | Yes | .81–.87 (intra), .88–.89 (inter) | 2.3°, 2.7° (intra), 1.4°, 2.1° (inter) | N/A
Ripani et al., 2008 | Yes | .828–.991 (intra), .666–.991 (inter) | | .385–.467
Sheeran et al., 2010 | Yes | .833–.98 (intra), .986 (inter) | 1.7°, 5.5° (intra), 2° (inter) | N/A
Leroux et al., 2000 | Yes | N/A | | .89

Table 4
Level of evidence.
Level of evidence | Method | Reliability | Validity
Strong | Debrunner kyphometer | Very high intra-rater reliability |
Strong | Flexicurve index | Very high inter-rater reliability |
Strong | Spinal Mouse | Very high intra- and inter-rater reliability |
Moderate | Arcometer | Very high intra- and inter-rater reliability | Very high validity
Moderate | Flexicurve index | | Moderate validity
Moderate | Manual inclinometer | Very high intra-rater reliability |
Limited | Goniometer | | High validity
Limited | Stereovideography | | High validity
Limited | Pantograph | | Very high validity
Limited | Digital inclinometer | High intra- and inter-rater reliability |
Limited | Spinal Wheel | Very high intra- and inter-rater reliability |
Limited | Rasterstereography | Very high intra- and inter-rater reliability |
Limited | Electrogoniometry | Very high intra-rater reliability | High validity
Limited | Photogrammetry | Very high inter-rater reliability |
Limited | Spinal Mouse | | Low validity
Conflicting | Photogrammetry | Low–very high intra-rater reliability |
Conflicting | Debrunner kyphometer | High–very high inter-rater reliability | Moderate–high validity
Conflicting | Flexicurve index | High–very high intra-rater reliability |
Conflicting | Flexicurve angle | High–very high inter- and intra-rater reliability | Moderate–high validity

The Debrunner kyphometer, arcometer, inclinometry, goniometry and electrogoniometry obtain a kyphosis value by placing the instrument on selected limits of the curve, a method similar to the calculation of the Cobb angle. Alternatively, the Flexicurve, Spinal Mouse, Spinal Wheel and pantograph provide a continuous representation of spinal curvature throughout the thoracic spine. There appears to be no obvious trend in the validity data towards higher or lower scores with either approach. However, over time, relying on selected limits of the curve may fail to reveal regional changes within the thoracic curvature. This may create a discrepancy in populations with osteoporosis or Scheuermann's disease, where single vertebral wedging is common (Briggs et al., 2007; Czaprowski et al., 2012; Gravina et al., 2012). Therefore, the latter techniques may be more sensitive and robust over time (Greendale et al., 2011), and thus may be more appropriate for ongoing assessment.
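The review does not give formulas for the Flexicurve index or angle. As an assumption for illustration, the sketch uses the commonly cited index (100 × W/L, with W the maximum depth of the traced curve and L the chord length between its limits) and derives an angle by treating the tracing as a circular arc: for an arc with chord L and depth W, tan(θ/4) = 2W/L, where θ is also the angle between the tangents at the two ends. The reviewed studies may define W, L or the angle differently, and the measurements below are hypothetical.

```python
import math


def kyphosis_index(width_cm: float, length_cm: float) -> float:
    """Flexicurve kyphosis index as 100 * W / L (assumed definition)."""
    return 100.0 * width_cm / length_cm


def arc_angle_deg(width_cm: float, length_cm: float) -> float:
    """Kyphosis angle assuming the traced curve is a circular arc.

    For an arc with chord L and depth (sagitta) W, tan(theta / 4) = 2W / L,
    so theta = 4 * atan(2W / L); theta equals the angle between the end tangents.
    """
    return math.degrees(4.0 * math.atan(2.0 * width_cm / length_cm))


# Hypothetical tracing: chord 30 cm, maximum depth 3 cm.
print(round(kyphosis_index(3.0, 30.0), 1))  # 10.0
print(round(arc_angle_deg(3.0, 30.0), 1))   # 45.2
```

Because W is taken from the whole traced contour rather than from two limit points, a measure of this kind uses information from along the curve, which is the property the paragraph above links to ongoing assessment.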
4.3. Reliability
The variable nature of thoracic kyphosis poses a potential challenge to reliability studies (D'Osualdo et al., 1997). Included studies have acknowledged postural variance from session to session as a result of sporting, vocational or routine activity (D'Osualdo et al., 1997; Lewis and Valentine, 2010), fatigue from repeated measures (Hinman, 2004; Kellis et al., 2008) and repositioning errors (Sheeran et al., 2010; van Blommestein et al., 2012). However, some studies attempted to control for this by re-testing within the same session (D'Osualdo et al., 1997; Goh et al., 1999; Leroux et al., 2000; Teixeira and Carvalho, 2007; Melvin et al., 2010; Lewis and Valentine, 2010; Chaise et al., 2011; Greendale et al., 2011; de Oliveira et al., 2012) or re-testing at the same time of day (Korovessis et al., 2001; Mannion et al., 2004; Kellis et al., 2008; Saad et al., 2012; Folsch et al., 2012; de Oliveira et al., 2012). Others used techniques such as maintaining the same light and temperature conditions (Saad et al., 2012) and restricting sporting activities between measurement days (Folsch et al., 2012). As the reliability data were largely positive, these controls appear to have been sufficient to stabilize the thoracic kyphosis between measurements.
Further potential challenges to taking reliable measures included the accurate palpation of spinal landmarks. The validity of palpation of spinal landmarks has been reported to be poor throughout the spine (O'Haire and Gibbons, 2000; French et al., 2000; Billis et al., 2003). Difficulty in palpating landmarks was frequently discussed by the studies under review (D'Osualdo et al., 1997; Lundon et al., 1998; Leroux et al., 2000; Dunk et al., 2004; Hinman, 2004; Mannion et al., 2004; Kellis et al., 2008; Lewis and Valentine, 2010; Sheeran et al., 2010; Chaise et al., 2011; Greendale et al., 2011; de Oliveira et al., 2012; van Blommestein et al., 2012). It has been noted that the accuracy of palpation can depend on the skill of the examiner (Billis et al., 2003; Haneline and Young, 2009). Testers in the reviewed studies included physiotherapists/physical therapists (Lundon et al., 1998; Purser et al., 1999; Teixeira and Carvalho, 2007; Melvin et al., 2010; Sheeran et al., 2010; Saad et al., 2012; Czaprowski et al., 2012), physicians (Korovessis et al., 2001; Ripani et al., 2008) and researchers (Kellis et al., 2008; Greendale et al., 2011). The level of experience with the instrument ranged from novice (Hinman, 2004; Greendale et al., 2011) to experienced (Mannion et al., 2004) but was largely undescribed. Therefore, it is unclear whether the level of experience of the tester contributed to the reliability obtained. Several studies did not remove markers of palpated landmarks between raters (Lundon et al., 1998; Mannion et al., 2004; Sheeran et al., 2010; Chaise et al., 2011; de Oliveira et al., 2012). The use of the same marked points is likely to have increased the reliability of measurements between raters.
Furthermore, variation in the amount of pressure applied with the instrument (D'Osualdo et al., 1997; Hinman, 2004; Mannion et al., 2004; Kellis et al., 2008; Ripani et al., 2008; Sheeran et al., 2010; de Oliveira et al., 2012), unstandardized instructions (Öhlén et al., 1989; Lundon et al., 1998; Hinman, 2004; Mannion et al., 2004) and variations in subject positioning (Dunk et al., 2004; Folsch et al., 2012; de Oliveira et al., 2012) were other commonly discussed challenges.
4.4. Methodological considerations
There were some methodological limitations in the reviewed studies. Firstly, the majority of studies investigated a healthy sample with a mean age between 20 and 65 years and unreported BMI. A healthy population in this age bracket is not necessarily representative of a clinical population (Whiting et al., 2003), so the results cannot be generalized to clinical populations. However, both studies which contributed to the strong level of evidence for the very high inter-rater reliability of the Flexicurve index used a postmenopausal, osteoporotic sample (Yanagawa et al., 2000; Greendale et al., 2011), which increases the clinical applicability of the Flexicurve index. BMI is an important sample characteristic because bony landmarks may be more difficult to palpate in obese people, leading to higher measurement errors (Langendefer et al., 2009; Greendale et al., 2011).
Secondly, the raters were only sometimes described, further limiting the generalizability of the results. Thirdly, some studies did not describe their statistical methods sufficiently and others used inappropriate analyses. The lack of measures of precision in some studies limits the clinical applicability of their results. Lastly, some studies did not perform (Dunk et al., 2004, 2005; Perriman et al., 2010; Sheeran et al., 2010; Chaise et al., 2011; Folsch et al., 2012; de Oliveira et al., 2012) or did not detail (Willner, 1981; Goh et al., 1999; Hinman, 2004; Mannion et al., 2004; Teixeira and Carvalho, 2007; Saad et al., 2012) controls to ensure between-rater and within-rater blinding. The lack of blinding in inter-rater and intra-rater reliability studies may have inflated the agreement between raters or between measures, respectively.
4.5. Limitations of review
The strengths of the present review are its systematic nature, its comprehensive search strategy based on PRISMA guidelines, its use of multiple reviewers and its inclusion of all populations. However, only articles in English were included. During the title/abstract screening, 9 articles were excluded due to their unavailability in English. As so few studies were found investigating each thoracic kyphosis measurement technique, these articles could have made a significant difference to the overall conclusions of the review. Secondly, the two reviewers assessing the methodological quality of the studies were not blinded to the results of the studies. While this may have created an opportunity for reviewer bias (Stochkendahl et al., 2006), the stringent criteria of the critical appraisal tool and the use of multiple reviewers reduced the likelihood of bias. Thirdly, the high level of heterogeneity amongst study populations, procedures and testers indicates that the external validity of this review is low.
4.6. Clinical and research implications
The 15 methods highlighted in this systematic review indicate that clinicians have a wide range of options for thoracic kyphosis measurement. For the present, clinicians must choose a method using their best judgment of the reliability and validity data presented in this review. The Flexicurve index, Debrunner kyphometer and Spinal Mouse have the strongest evidence base in terms of reliability, and the Flexicurve index and arcometer have the strongest level of evidence in terms of validity. Factors such as low cost, ease of use for entry-level clinicians and short measurement time have previously been used to argue for the use of the Flexicurve (Greendale et al., 2011). However, the absence of evidence does not mean an outcome measure is unsuitable, only that no data have yet been published to verify its validity and reliability. Clinicians must also be mindful of the populations in which these measures were tested and the expertise of the raters testing them.
This systematic review identified a strong need for further research into the psychometric properties of thoracic kyphosis measurement methods, especially methods with limited and inconsistent levels of evidence. Because responsiveness to change is an important measurement property, future research should also evaluate it. The early research appears promising, but the reliability and validity of these methods cannot be established with confidence until further studies are published. It is recommended that future research include representative samples of patients, incorporate adequate measures to ensure subject and examiner blinding, and use clinically relevant statistical analyses accompanied by estimates of precision.
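As an illustration of what an "estimate of precision" can add to a reliability coefficient, the sketch below applies two commonly used formulas: the standard error of measurement, SEM = SD × √(1 − ICC), and the minimal detectable change at 95% confidence, MDC95 = 1.96 × √2 × SEM. The function names and the input values (a between-subject SD of 10° and an ICC of 0.91) are hypothetical and are not taken from any of the reviewed studies; it is a minimal sketch, assuming the SD and ICC come from the same sample.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """Hypothetical helper: SEM = SD * sqrt(1 - ICC), in the units of SD (e.g. degrees)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    if sd < 0.0:
        raise ValueError("SD must be non-negative")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem: float, z: float = 1.96) -> float:
    """Hypothetical helper: MDC = z * sqrt(2) * SEM.

    The sqrt(2) factor reflects measurement error in both of two repeated
    measurements; z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(2.0) * sem


if __name__ == "__main__":
    # Hypothetical inputs: between-subject SD of 10 degrees, ICC of 0.91.
    sem = standard_error_of_measurement(sd=10.0, icc=0.91)
    mdc95 = minimal_detectable_change(sem)
    print(f"SEM = {sem:.1f} deg, MDC95 = {mdc95:.1f} deg")
```

With these hypothetical inputs the script prints `SEM = 3.0 deg, MDC95 = 8.3 deg`: even with a "very high" ICC, a change in kyphosis angle smaller than roughly 8° could not be distinguished from measurement error, which is the kind of clinically interpretable information an ICC alone does not convey.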
5. Conclusion
A wide range of thoracic kyphosis measurement techniques have been reviewed. However, there are few studies investigating each technique. Overall, reliability data for the investigated techniques are very positive but generally remain limited. The validity of the techniques was lower than their reliability, but information on validity is lacking for many measures. The strongest level of evidence for reliability exists for the Debrunner kyphometer, Spinal Mouse and Flexicurve index, and for validity, for the arcometer and Flexicurve index. The Flexicurve may be the most feasible option, as it is inexpensive, easy to use and has high levels of both reliability and validity. Future research should concentrate on methods with limited and inconsistent levels of evidence, as identified by this review.
References
Adhia DB, Bussey MD, Ribeiro DC, Tumilty S, Milosavljevic S. Validity and reliability of palpation-digitization for non-invasive kinematic measurement: a systematic review. Man Ther 2012:1–9.
Ayub E. Posture and the upper quarter. In: Physical therapy of the shoulder. Melbourne: Churchill Livingstone; 1991. p. 81–90.
Billis EV, Foster NE, Wright CC. Reproducibility and repeatability: errors of three groups of physiotherapists in locating spinal levels by palpation. Man Ther 2003;8(4):223–32.
Briggs A, Wrigley T, Tully E, Adams P, Greig A, Bennell K. Radiographic measures of thoracic kyphosis in osteoporosis: Cobb and vertebral centroid angles. Skeletal Radiol 2007;36(8):761–7.
Brink Y, Louw QA. Clinical instruments: reliability and validity critical appraisal. J Eval Clin Pract 2011:1–7.
Cailliet R. Shoulder pain. 3rd ed. Philadelphia: F.A. Davis Company; 1991.
Carmines E, Zeller R. Reliability and validity assessment. Beverly Hills: Sage Publications; 1979.
Chaise FO, Candotti CT, Torre ML, Furlanetto TS, Pelinson PP, Loss JF. Validation, repeatability and reproducibility of a non-invasive instrument for measuring thoracic and lumbar curvature of the spine in the sagittal plane. Revista Brasileira de Fisioterapia 2011;15(6):511–7.
Chen Y. Vertebral centroid measurement of lumbar lordosis compared with the Cobb technique. Spine 1999;24(17):1786–90.
Czaprowski D, Pawlowska P, Gebicka A, Sitarski D, Kotwicki T. Intra- and inter-observer repeatability of the assessment of anteroposterior curvatures of the spine using Saunders digital inclinometer. Ortop Traumatol Rehabil 2012;14(2):145–53.
de Oliveira TS, Candotti CT, La Torre M, Pelinson PPT, Furlanetto TS, Kutchak FM, et al. Validity and reproducibility of the measurements obtained using the flexicurve instrument to evaluate the angles of thoracic and lumbar curvatures of the spine in the sagittal plane. Rehabil Res Pract 2012:1–9.
Di Bari M, Chiarlone M, Matteuzzi D, Zacchei S, Pozzi C, Bellia V, et al. Thoracic kyphosis and ventilatory dysfunction in unselected older persons: an epidemiological study in Dicomano, Italy. J Am Geriatr Soc 2004;52(6):909–15.
D'Osualdo F, Schierano S, Iannis M. Validation of clinical measurement of kyphosis with a simple instrument, the arcometer. Spine 1997;22(4):408.
Dunk NM, Chung YY, Compton DS, Callaghan JP. The reliability of quantifying upright standing postures as a baseline diagnostic clinical tool. J Manipulative Physiol Ther 2004;27(2):91–6.
Dunk NM, Lalonde J, Callaghan JP. Implications for the use of postural analysis as a clinical diagnostic tool: reliability of quantifying upright standing spinal postures from photographic images. J Manipulative Physiol Ther 2005;28(6):386–92.
Fölsch C, Schlögel S, Lakemeier S, Wolf U, Timmesfeld N, Skwara A. Test-retest reliability of 3D ultrasound measurements of the thoracic spine. J Inj Funct Rehabil 2012;4(5):335–41.
French SD, Green S, Forbes A. Reliability of chiropractic methods commonly used to detect manipulable lesions in patients with chronic low-back pain. J Manipulative Physiol Ther 2000;23(4):231–8.
Goh S, Price RI, Leedman PJ, Singer KP. Rasterstereographic analysis of the thoracic sagittal curvature: a reliability study. J Musculoskelet Res 1999;3(2):137.
Gravina AR, Ferraro C, Frizziero A, Ferraro M, Masiero S. Goniometer evaluation of thoracic kyphosis and lumbar lordosis in subjects during growth age: a validity study. Stud Health Technol Inform 2012;176:247–51.
Gray J, Grimsby O. Interrelationship of the spine, rib cage, and shoulder. In: Physical therapy of the shoulder. Edinburgh: Churchill Livingstone; 2004. p. 133–85.
Greendale G, Nili N, Huang MH, Seeger L, Karlamangla A. The reliability and validity of three non-radiological measures of thoracic kyphosis and their relations to the standing radiological Cobb angle. Osteoporos Int 2011;22(6):1897–905.
Haneline MT, Young M. A review of intraexaminer and interexaminer reliability of static spinal palpation: a literature synthesis. J Manipulative Physiol Ther 2009;32(5):379–86.
Harrison DE, Cailliet R, Harrison DD, Janik TJ, Holland B. Reliability of centroid, Cobb, and Harrison posterior tangent methods: which to choose for analysis of thoracic kyphosis. Spine 2001;26(11):227–34.
Hinman MR. Interrater reliability of flexicurve postural measures among novice users. J Back Musculoskelet Rehabil 2004;17(1):33.
Horter TS. How to care for your neck. Phys Ther 1978;52(2):184–5.
Kado DM, Huang MH, Nguyen, Barrett-Connor E, Greendale GA. Hyperkyphotic posture and risk of injurious falls in older persons: the Rancho Bernardo Study. J Gerontol Biol Sci 2007;62(6):682–7.
Kellis E, Adamou G, Tzilios G, Emmanouilidou M. Reliability of spinal range of motion in healthy boys using a skin-surface device. J Manipulative Physiol Ther 2008;31(8):570–6.
Korovessis P, Petsinis G, Papazisis Z, Baikousis A. Prediction of thoracic kyphosis using the Debrunner kyphometer. J Spinal Disord 2001;14(1):67–72.
Langenderfer JE, Rullkoetter PJ, Mell AG, Laz PJ. A multi-subject evaluation of uncertainty in anatomical landmark location on shoulder kinematic description. Comput Methods Biomech Biomed Eng 2009;12(2):211–6.
Leroux MA, Zabjek K, Simard G, Badeaux J, Coillard C, Rivard CH. A non-invasive anthropometric technique for measuring kyphosis and lordosis: an application for idiopathic scoliosis. Spine 2000;25(13):1689–94.
Lewis JS, Valentine RE. Clinical measurement of the thoracic kyphosis. A study of the intra-rater reliability in subjects with and without shoulder pain. BMC Musculoskelet Disord 2010;11:39.
Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Med 2009;6:e1000100.
Lohr KN. Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res 2002;11(3):193.
Lundon KMA, Li A, Bibershtein S. Interrater and intrarater reliability in the measurement of kyphosis in postmenopausal women with osteoporosis. Spine 1998;23(18):1978–85.
Lydick E, Zimmerman SI, Yawn B, Love B, Kleerekoper M, Ross P, et al. Development and validation of a discriminative quality of life questionnaire for osteoporosis. J Bone Miner Res 1997;12:456–63.
Mannion AF, Knecht K, Balaban G, Dvorak J, Grob D. A new skin-surface device for measuring the curvature and global and segmental ranges of motion of the spine: reliability of measurements and comparison with data reviewed from the literature. Eur Spine J 2004;13(2):122–36.
May S, Chance-Larsen K, Littlewood C, Lomas D, Saad M. Reliability of physical examination tests used in the assessment of patients with shoulder problems: a systematic review. Physiotherapy 2010;96(3):179–90.
May S, Littlewood C, Bishop A. Reliability of procedures used in the physical examination of non-specific low back pain: a systematic review. Aust J Physiother 2006;52:91–102.
Melvin M, Sylvia M, Udo W, Helmut S, Paletta JR, Adrian S. Reproducibility of rasterstereography for kyphotic and lordotic angles, trunk length and trunk inclination: a reliability study. Spine 2010;35(14):1353–8.
Milne JS, Williamson J. A longitudinal study of kyphosis in older people. Age Ageing 1983;12:225–33.
Munro BH, Visintainer MA. Statistical methods for health care research. Philadelphia: Lippincott Williams and Wilkins; 2005. p. 239–58.
Murray PM, Weinstein SL, Spratt KF. The natural history and long-term follow-up of Scheuermann kyphosis. J Bone Joint Surg 1993;75(2):236–48.
O'Haire C, Gibbons P. Inter-examiner and intra-examiner agreement for assessing sacroiliac anatomical landmarks using palpation and observation: pilot study. Man Ther 2000;5(1):13–20.
Öhlén G, Spangfort E, Tingvall C. Measurement of spinal sagittal configuration and mobility with Debrunner's kyphometer. Spine 1989;14:580–3.
Perriman DM, Scarvell JM, Hughes AR, Ashman B, Lueck CJ, Smith PN. Validation of the flexible electrogoniometer for measuring thoracic kyphosis. Spine 2010;35(14):633–40.
Portney LG, Watkins MP. Foundations of clinical research: applications to practice. Upper Saddle River, New Jersey: Prentice Hall Health; 2000.
Purser JL, Pieper CF, Branch LG, Duncan PW, Gold DT, McConnell ES, et al. Reliability of physical performance tests in four different randomized clinical trials. Arch Phys Med Rehabil 1999;80:557–61.
Ripani M, Di Cesare A, Giombini A, Agnello L, Fagnani F, Pigozzi F. Spinal curvature: comparison of frontal measurements with the Spinal Mouse and radiographic assessment. J Sports Med Phys Fitness 2008;48(4):488–94.
Saad KR, Colombo AS, Ribeiro AP, Joao SM. Reliability of photogrammetry in the evaluation of the postural aspects of individuals with structural scoliosis. J Bodyw Mov Ther 2012;16(2):210–6.
Sheeran L, Sparkes V, Busse M, van Deursen R. Preliminary study: reliability of the spinal wheel. A novel device to measure spinal postures applied to sitting and standing. Eur Spine J 2010;19(6):995–1003.
Stochkendahl MJ, Christensen HW, Hartvigsen J, Vach W, Haas M, Hestbaek L, et al. Manual examination of the spine: a systematic critical literature review of reproducibility. J Manipulative Physiol Ther 2006;29(6):475–85.
Teixeira F, Carvalho G. Reliability and validity of thoracic kyphosis measurements using flexicurve method. Revista Brasileira de Fisioterapia 2007;11(3):199–204.
Terwee CB, Bot SDM, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007;60(1):34–42.
Van Blommestein AS, Lewis AS, Morrissey MC, MacRae S. Reliability of measuring thoracic kyphosis angle, lumbar lordosis angle and straight leg raise with an inclinometer. Open Spine J 2012;4:10–5.
van de Ven-Stevens LA, Munneke M, Terwee CB. Clinimetric properties of instruments to assess activities in patients with hand injury: a systematic review of the literature. Arch Phys Med Rehabil 2009;90:151–69.
van der Wurff P, Hagmeijer RHM, Meyne W. Clinical tests of the sacroiliac joint. A systematic methodological review. Part 1: Reliability. Man Ther 2000;5:30–6.
van Tulder M, Furlan A, Bombardier C, Bouter L. Updated method guidelines for systematic reviews in the Cochrane Collaboration Back Review Group. In: The Cochrane Library, Issue 4. Oxford: Update Software; 2003.
Whiting P, Rutjes AWS, Reitsma JB, Bossuyt PMM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003;3(25):25.
Willner S. Spinal pantograph: a non-invasive technique for describing kyphosis and lordosis in the thoraco-lumbar spine. Acta Orthop Scand 1981;52:525–9.
Wright JG, Feinstein AR. Improving the reliability of orthopaedic measurements. J Bone Joint Surg 1992;74:287–91.
Yanagawa TL, Maitland ME, Burgess K, Young L, Hanley D. Assessment of thoracic kyphosis using the flexicurve for individuals with osteoporosis. Hong Kong Physiother J 2000;18(2):53–7.