
YKHOA.NET Medical Training Program, Nguyễn Văn Tuấn


Logistic regression analysis

Nguyễn Văn Tuấn


Many medical studies (and empirical science in general) have as their main aim the analysis of the association between one (or more) risk factors and the risk of disease. For example, in a study of the association between smoking and lung cancer, the risk factor is smoking and the quantity being analyzed is the risk of lung cancer. In epidemiological terminology, the former are the risk factors and the latter is the outcome. In these studies the outcome is usually recorded as a binary variable: yes/no, diseased/not diseased, dead/alive, occurred/did not occur, and so on. Risk factors can be continuous variables (such as age, blood pressure or bone density), binary variables (such as sex), or ordinal variables (such as disease severity ranging from mild through moderate to severe).

The question such studies must answer is how to estimate the magnitude of association between a risk factor and the disease. Methods such as the linear regression model cannot be applied, because the dependent variable is not continuous but binary. The statistician David R. Cox developed a model for analyzing binary outcomes, the logistic regression model (introduced in his 1958 paper and popularized by his 1970 book The Analysis of Binary Data). I will explain how to apply this model through a few examples, from simple to more complex. I will not go into the mathematical details of logistic regression, but will focus on its practical aspects and on interpreting the results.

I. Simple logistic regression analysis for a case-control study

Example 1: The association between exposure to Agent Orange and prostate cancer. Giri and colleagues (2004) conducted a pilot study to assess the relationship between exposure to Agent Orange (AO) and prostate cancer risk among American veterans who had served in Vietnam. The investigators identified 47 cases of prostate cancer among these veterans. They then randomly selected 144 veterans who had also served in Vietnam and had been admitted to hospital for reasons unrelated to cancer; this is the control group. In each group, the investigators reviewed medical records and conducted direct interviews to find out who had been exposed to AO during the war. Of the 47 cancer cases, 11 had been exposed to AO, 29 had not, and 7 had an unknown history; in the control group, 17 had been exposed, 106 had not, and for 21 exposure could not be determined. The results are summarized in the table below:

Table 1. AO exposure and prostate cancer

                          Cancer (n=47)   Control (n=144)
Exposed to AO                   11               17
Not exposed to AO               29              106
Unknown                          7               21
Total                           47              144
Note: n is the number of patients. Data source: Giri VN, Cassidy AE, Beebe-Dimmer J, Ellis LR, Smith DC, Bock CH, Cooney KA. Association between Agent Orange and prostate cancer: a pilot case-control study. Urology. 2004 Apr;63(4):757-60; discussion 760-1. Correction in Urology. 2004 Jun;63(6):1213.

To illustrate logistic regression analysis and to keep things simple, I will merge the "Not exposed to AO" and "Unknown" groups into a single group. (This decision could itself be the subject of a separate analysis!) The table above then reduces to:

                                   Cancer   Control
Exposed to AO                         11        17
Not exposed to AO or unknown          36       127

From these data, 23.4% (11/47) of the prostate cancer group had been exposed to AO, compared with 11.8% (17/144) of the control group. Is there an association between AO exposure and prostate cancer? The word "association" can be broken down into two concrete questions:

- How large is the risk of prostate cancer in those who were exposed, relative to those who were never exposed?

- Is the difference in risk between the two groups statistically significant?

The logistic regression model can answer both questions. The key statistic for analyzing data from a case-control study such as this one is the odds ratio (OR). To estimate the OR, I need to explain a few steps:

English has a word for chance or likelihood, "odds", for which many other languages, Vietnamese included, have no convenient equivalent. Put simply, the odds of an event is the ratio of the number (or probability) of times it occurs to the number of times it does not: odds = $p/(1-p)$. The OR is therefore the ratio of two odds; in other words, the OR is a ratio of ratios! In the example above we have:

- the odds of cancer in the group exposed to AO: 11/17 = 0.647;

- the odds of cancer in the group not exposed to AO: 36/127 = 0.283;

- and the odds ratio of cancer, exposed versus unexposed: OR = 0.647 / 0.283 = 2.28.

In fact, the OR can be computed directly with a single formula (the cross-product ratio):

$$ \mathrm{OR} = \frac{11 \times 127}{17 \times 36} = 2.28 $$

In other words, the odds of prostate cancer among veterans who were exposed to AO are about 2.3 times the odds among veterans who were not exposed.
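The hand calculation above can be reproduced in a few lines of R. This is a minimal sketch, assuming the counts of Table 1 are typed in by hand; the object names (tab1, tab2, odds_exposed and so on) are illustrative choices, not part of the original analysis. It merges the "not exposed" and "unknown" rows as in the text, then computes the OR both as a ratio of two odds and as the cross-product.

```r
# Table 1 counts: rows = exposure status, columns = cancer / control
tab1 <- matrix(c(11,  17,
                 29, 106,
                  7,  21),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("AO", "noAO", "unknown"), c("cancer", "control")))

# merge "not exposed" and "unknown", as in the text
tab2 <- rbind(AO = tab1["AO", ], other = tab1["noAO", ] + tab1["unknown", ])

odds_exposed   <- tab2["AO", "cancer"] / tab2["AO", "control"]        # 11/17
odds_unexposed <- tab2["other", "cancer"] / tab2["other", "control"]  # 36/127
or_ratio <- odds_exposed / odds_unexposed                             # ratio of two odds
or_cross <- (tab2[1, 1] * tab2[2, 2]) / (tab2[1, 2] * tab2[2, 1])     # (11*127)/(17*36)

print(tab2)
print(round(c(odds_exposed, odds_unexposed, or_ratio, or_cross), 3))
```

Both routes give the same OR; the cross-product form is simply the ratio of odds written over a common denominator.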

However, because this study is based on a single sample, the estimate above may vary from one sample to another. Remember that the OR we computed is an estimate of a true OR that in practice we do not know. This true OR could be below 1 or above 1. If the true OR is below 1, people exposed to AO have a lower risk of cancer than those who were not exposed; a true OR above 1 means that exposed people have a higher risk than unexposed people; and a true OR equal to 1 means there is no association between AO exposure and prostate cancer.

So the second (and perhaps more important) question is whether the association reflected in the OR is statistically significant. Put differently, if the study were repeated many times, how much would the OR vary? Suppose the study were repeated (say) 100 times, and 95 of those studies gave an OR between 1.1 and 3.8 while 5 gave an OR below 1.1 or above 3.8. Because that range lies entirely above 1, we would have evidence to state that the association between AO exposure and prostate cancer is statistically significant.

In other words, we need to estimate the standard error (SE) of the OR and its 95% confidence interval. Because the OR is a ratio, its standard error cannot be estimated directly (or can be, but only with great difficulty); it has to be estimated indirectly. One indirect approach is Woolf's method, which proceeds step by step as follows:

First, transform the OR to the natural logarithm scale:

$$ \log\mathrm{OR} = \log(\mathrm{OR}) = \log(2.28) = 0.824 $$

Second, estimate the standard error of log OR (call it SE) with the formula:

$$ \mathrm{SE} = \sqrt{\frac{1}{11} + \frac{1}{17} + \frac{1}{36} + \frac{1}{127}} = 0.430 $$

Third, by the normal distribution, the 95% confidence interval of log OR is log OR ± 1.96 × SE; here that interval is:

$$ 0.824 - 1.96 \times 0.430 = -0.0188 $$
$$ 0.824 + 1.96 \times 0.430 = +1.6668 $$

Because these limits are on the log scale, the fourth step converts the 95% confidence interval back to the ratio scale:

$$ e^{-0.0188} = 0.98 \quad \text{to} \quad e^{1.6668} = 5.30 $$
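Woolf's four steps can be wrapped in a small R function. The function name woolf_ci and its argument names are hypothetical, introduced only for this sketch; it uses qnorm(0.975) (about 1.96) for the normal quantile. Carrying full precision, rather than rounding log OR to 0.824 and SE to 0.430, gives an upper limit of about 5.31 instead of 5.30.

```r
# Woolf (log-scale) confidence interval for the odds ratio of a 2x2 table.
# Argument names are illustrative: cases and controls, exposed and unexposed.
woolf_ci <- function(exp_case, exp_ctrl, unexp_case, unexp_ctrl, level = 0.95) {
  log_or <- log((exp_case * unexp_ctrl) / (exp_ctrl * unexp_case))         # step 1
  se <- sqrt(1 / exp_case + 1 / exp_ctrl + 1 / unexp_case + 1 / unexp_ctrl)  # step 2
  z <- qnorm(1 - (1 - level) / 2)        # about 1.96 for a 95% interval
  limits <- log_or + c(-1, 1) * z * se   # step 3: interval on the log scale
  c(OR = exp(log_or), SE = se,
    lower = exp(limits[1]), upper = exp(limits[2]))   # step 4: back-transform
}

res <- woolf_ci(exp_case = 11, exp_ctrl = 17, unexp_case = 36, unexp_ctrl = 127)
print(round(res, 3))
```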

The analysis shows that the point estimate of the OR is 2.28, but its 95% confidence interval runs from 0.98 to 5.30. In other words, the data are compatible with a true OR anywhere from slightly below 1 (0.98) to as high as 5.30.

We now have the answer to the second question. Because the 95% confidence interval includes values below 1 as well as above 1, we conclude that the association between AO exposure and prostate cancer risk is not statistically significant. I stress that this is only a statistical conclusion; I have not discussed the clinical meaning of these data, which is beyond the scope of this discussion.


II. The logistic regression model

The example above illustrates logistic regression analysis done "by hand". In fact, logistic regression can be expressed as a general model. Let $p$ be the probability of an event (in the example above, the event is prostate cancer); the odds are then defined as:

$$ \text{odds} = \frac{p}{1-p} $$


Let $x$ denote AO exposure status, with two values: 0 means never exposed and 1 means exposed to AO. The logistic regression model states that log(odds) depends on $x$ through a linear function with two parameters:

$$ \log(\text{odds}) = \alpha + \beta x $$

or, equivalently,

$$ \log\left(\frac{p}{1-p}\right) = \alpha + \beta x \qquad [1] $$

Here log(odds), or $\log\left(\frac{p}{1-p}\right)$, is also called logit($p$) (hence the name logistic), and $\alpha$ and $\beta$ are two parameters to be estimated from the data. The reason for transforming $p$ into logit($p$) is that $p$ lies between 0 and 1, whereas logit($p$) can take any value on the real line, which makes it suitable for modelling as a linear function of $x$.

Unlike linear regression, model [1] has no separate, normally distributed error term: the random variation comes from the outcome itself, which for a given $x$ is a binary (Bernoulli) variable equal to 1 with probability $p$. Solving [1] for the odds, the odds of cancer at any value of $x$ are:

$$ \text{odds} = \frac{p}{1-p} = e^{\alpha + \beta x} \qquad [2] $$
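The logit transformation and its inverse are built into base R as qlogis() and plogis() (the quantile and distribution functions of the standard logistic distribution). A small sketch, using a few arbitrary probabilities chosen only for illustration, shows that qlogis(p) equals log(p/(1 - p)) and that plogis() maps any value of the linear predictor back to a probability between 0 and 1.

```r
p <- c(0.01, 0.25, 0.5, 0.75, 0.99)   # arbitrary probabilities for illustration
odds <- p / (1 - p)                    # odds as defined above
logit_p <- log(odds)                   # log odds: the left-hand side of model [1]
print(cbind(p, odds = round(odds, 3), logit = round(logit_p, 3)))

# base R equivalents: qlogis() is the logit, plogis() is its inverse
print(all.equal(qlogis(p), logit_p))   # TRUE
print(all.equal(plogis(logit_p), p))   # TRUE
```

Note how p = 0.5 maps to a logit of 0, and probabilities close to 0 or 1 map to large negative or positive logits.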

The logistic regression model thus states that the odds of an event (prostate cancer) depend on $x$ (AO exposure). From equation [2], the unexposed group ($x = 0$) has odds of cancer (call it odds$_0$):

$$ \text{odds}_0 = e^{\alpha + \beta \times 0} = e^{\alpha} \qquad [3] $$

and the exposed group ($x = 1$) has odds of cancer (odds$_1$):

$$ \text{odds}_1 = e^{\alpha + \beta \times 1} = e^{\alpha + \beta} \qquad [4] $$

The ratio of these two odds is precisely the odds ratio. From [3] and [4], the OR is:

$$ \mathrm{OR} = \frac{\text{odds}_1}{\text{odds}_0} = \frac{e^{\alpha + \beta}}{e^{\alpha}} = e^{\beta} \qquad [5] $$
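Equations [3] to [5] can be checked numerically. In this sketch, $\alpha$ and $\beta$ are set to the values implied by the Example 1 table (the log odds of cancer in the unexposed group and the log of the OR); they are plugged-in numbers for illustration, not the output of a model fit.

```r
alpha <- log(36 / 127)                   # log odds of cancer when x = 0
beta  <- log((11 / 17) / (36 / 127))     # log odds ratio
odds0 <- exp(alpha + beta * 0)           # equation [3]
odds1 <- exp(alpha + beta * 1)           # equation [4]
or_from_odds <- odds1 / odds0            # ratio of the two odds, equation [5]
print(round(c(odds0 = odds0, odds1 = odds1,
              OR = or_from_odds, exp_beta = exp(beta)), 4))
```

The intercept alone fixes the odds in the reference group, and $e^{\beta}$ is the OR regardless of the value of $\alpha$.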

In practice we do not know the true values of the two parameters $\alpha$ and $\beta$, and must estimate them from the observed data. By statistical convention, estimates of these parameters are marked with a hat: $\hat\alpha$ and $\hat\beta$. In Example 1, the estimate of $\beta$ is $\hat\beta = 0.824$. The estimated OR, $\widehat{\mathrm{OR}} = e^{\hat\beta}$, therefore compares the odds of cancer in the AO-exposed group with the odds in the unexposed group. In Example 1, $\widehat{\mathrm{OR}} = e^{0.824} = 2.28$.


III. Estimating the parameters of a logistic regression model with R

As just shown, the method for estimating the OR and its 95% confidence interval is simple but lengthy. With several independent variables $x$, the calculations become more complex, and doing them by hand as above would take a great deal of time. Today, computers and statistical software give us a very effective tool for analysis. One statistical package has the simple name R, which I introduced in the book "Phân tích số liệu và tạo biểu đồ bằng R" (Data analysis and graphics with R; Science and Technology Publishing House, Ho Chi Minh City, 2007).

Here I show how to analyze the data above with R. Before the analysis, the data have to be entered in a form that R can read. For convenience, here is the table again:

                                   Cancer   Control
Exposed to AO                         11        17
Not exposed to AO or unknown          36       127

We have two variables, called ao and cancer for short; each takes two values, 0 (no) and 1 (yes). In the ao = 1 group (exposed) there are 28 subjects, 11 of whom have cancer; in the ao = 0 group (not exposed) there are 163 subjects, 36 of whom have cancer. We enter these data into R as follows:

ao <- c(1, 0)
ntotal <- c(28, 163)
cancer <- c(11, 36)
proportion <- cancer/ntotal


Notes:

- Line 1 defines the variable ao with the two values 1 and 0 (the symbol <- means assignment, equivalent to the = sign);

- Line 2 defines the variable ntotal, giving 28 subjects for ao = 1 and 163 subjects for ao = 0;

- Line 3 defines the variable cancer, giving 11 cancer cases for ao = 1 and 36 for ao = 0;

- Line 4 defines the variable proportion as cancer divided by ntotal, i.e., the proportion with cancer in each ao group.


Once the data are entered, we are ready to analyze them. R has the function glm for fitting logistic regression. How to write calls to this function is described in my book; here I explain it only briefly:

logistic <- glm(proportion ~ ao, family = binomial,
                weights = ntotal)

This command asks R to use glm to model proportion as a function of ao (the symbol ~ means "is modelled as a function of"), with a binomial distribution for the outcome because it has only two values. We also supply weights = ntotal: each value of proportion summarizes a whole group rather than a single patient, and the weights tell R how many subjects (ntotal) stand behind each proportion.
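As a side note, glm() also accepts a binomial outcome as a two-column matrix of (events, non-events), which avoids the separate weights argument. The sketch below re-enters the Example 1 data so that it runs on its own and fits the model both ways; the object names fit_w and fit_c are arbitrary. The two fits should give the same coefficients.

```r
ao         <- c(1, 0)        # 1 = exposed to AO, 0 = not exposed or unknown
ntotal     <- c(28, 163)     # subjects in each group
cancer     <- c(11, 36)      # cancer cases in each group
proportion <- cancer / ntotal

# form used in the text: proportions with group sizes as weights
fit_w <- glm(proportion ~ ao, family = binomial, weights = ntotal)
# equivalent form: two-column matrix of (cases, non-cases)
fit_c <- glm(cbind(cancer, ntotal - cancer) ~ ao, family = binomial)

print(rbind(weights = coef(fit_w), cbind = coef(fit_c)))
```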

The results are stored in an object named logistic (of course, we could give it any other name we like). We can now view the results with the summary command:

summary(logistic)


Call:
glm(formula = proportion ~ ao, family = "binomial", weights = ntotal)

Deviance Residuals:
[1] 0 0

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2607 0.1888 -6.677 2.44e-11 ***
ao 0.8254 0.4306 1.917 0.0552 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 3.5022e+00 on 1 degrees of freedom
Residual deviance: -2.3093e-14 on 0 degrees of freedom
AIC: 12.933

Number of Fisher Scoring iterations: 3


Table 2. Results of the logistic regression analysis in R.


Notes: The command summary(logistic) gives the results shown in Table 2 above.

(a) The Call section reports the model that was fitted;

(b) Deviance: the second part of the output reports the deviance residuals, which measure how far each observation lies from the value predicted by the model:

Deviance Residuals:
[1] 0 0

The deviance reflects the discrepancy between the model and the data (much like the residual mean square in linear regression). Here the model has two parameters for two groups, so it reproduces the two observed proportions exactly: both deviance residuals are 0 and carry no information.

(c) The next part gives the estimates of $\alpha$ (which R labels (Intercept)) and $\beta$ (labelled ao), together with the standard error of each estimate:

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2607 0.1888 -6.677 2.44e-11 ***
ao 0.8254 0.4306 1.917 0.0552 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From these results, $\hat\alpha = -1.2607$ and $\hat\beta = 0.8254$. The estimate $\hat\beta$ is positive, which indicates a positive association between cancer and ao: the risk of cancer increases as ao increases (from 0 to 1). However, the z test (the estimate divided by its standard error, 0.8254/0.4306 = 1.917) shows that the effect of ao falls just short of statistical significance, with p = 0.055.

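Recall that the OR is simply $e^{0.8254} = 2.28$ (equation [5]), the same value we obtained by hand in the previous section, and the standard error 0.4306 matches the one from Woolf's method (0.430). In other words, the odds of cancer in the ao = 1 group are 2.28 times those in the ao = 0 group.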

(d) The remaining parts give some statistics about the model, but they are not relevant to the question here, so I will not explain them.
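The OR and its 95% confidence interval can also be pulled straight out of the fitted model. A minimal sketch, assuming a fresh session (so the data are re-entered): confint.default() gives the Wald interval, estimate ± 1.96 × SE on the log scale, which for this 2x2 table is the same calculation as Woolf's method in Section I and so should give roughly 0.98 to 5.31.

```r
ao     <- c(1, 0)
ntotal <- c(28, 163)
cancer <- c(11, 36)
fit <- glm(cbind(cancer, ntotal - cancer) ~ ao, family = binomial)

or_hat  <- exp(coef(fit)[["ao"]])                   # e^beta-hat, equation [5]
se_hat  <- sqrt(diag(vcov(fit)))[["ao"]]            # standard error of beta-hat
wald_ci <- exp(confint.default(fit, parm = "ao"))   # exp(beta-hat +/- 1.96 SE)
print(round(c(OR = or_hat, SE = se_hat, lower = wald_ci[1], upper = wald_ci[2]), 3))
```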

As shown above, the results from R and from the hand calculation are identical. The advantage of the computer is time: once the data are entered, all of the calculations above take R less than one second! In addition, R provides standard errors that are very difficult to compute by hand in multivariable analyses (which I will cover in a later article).

IV. Logistic regression analysis with a continuous variable

In Example 1, both the dependent variable (cancer) and the independent variable (AO exposure) were binary, so the calculations were simple. In many studies, however, the independent variable (risk factor) is continuous, and understanding the association between the two variables is somewhat more complicated. In this section I discuss such a case and use R to solve it.

Example 2. The association between fibrinogen and ESR. The erythrocyte sedimentation rate (ESR) is the rate at which red blood cells (erythrocytes) settle in plasma. Patients with an ESR above 20 mm/hour are at higher risk of rheumatic and chronic inflammatory diseases, while an ESR below 20 mm/hour is considered normal. When ESR is raised, the levels of certain blood proteins are also raised; one of these proteins is fibrinogen. A study measured ESR and fibrinogen in 32 subjects (Collett D, Jemain AA. Residuals, outliers and influential observations in regression analysis. Sains Malaysiana 1985; 4:493-511) and found 6 subjects with an ESR above 20 mm/hour. The investigators wanted to know whether there is an association between fibrinogen and ESR. The data for the 32 subjects are shown in Table 3 below:


Table 3. Fibrinogen and ESR in 32 subjects

id fibrinogen ESR
1 2.52 0
2 2.56 0
3 2.19 0
4 2.18 0
5 3.41 0
6 2.46 0
7 3.22 0
8 2.21 0
9 3.15 0
10 2.60 0
11 2.29 0
12 2.35 0
16 3.15 0
18 2.68 0
19 2.60 0
20 2.23 0
21 2.88 0
22 2.65 0
24 2.28 0
25 2.67 0
26 2.29 0
27 2.15 0
28 2.54 0
30 3.34 0
31 2.99 0
32 3.32 0
13 5.06 1
14 3.34 1
15 2.38 1
17 3.53 1
23 2.09 1
29 3.93 1
Note: id is the subject's identification number; ESR is coded 0 (ESR below 20) or 1 (ESR above 20).

Let $p$ be the probability that esr = 1 and $x$ the level of fibrinogen in the blood; the logistic regression model [1] can then be applied to answer the question:

$$ \log\left(\frac{p}{1-p}\right) = \alpha + \beta x \qquad [6] $$

Note that here $x$ is a continuous variable, not a binary one, so the way the parameters are estimated also differs from Example 1. The main method for estimating the parameters of model [6] is maximum likelihood; it is beyond the scope of this article, so I will not present it here (interested readers can consult a textbook). I will only mention briefly that maximum likelihood leads to the following system of equations:

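$$ \sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \frac{e^{\alpha + \beta x_i}}{1 + e^{\alpha + \beta x_i}} $$

$$ \sum_{i=1}^{n} x_i y_i = \sum_{i=1}^{n} \frac{x_i \, e^{\alpha + \beta x_i}}{1 + e^{\alpha + \beta x_i}} $$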

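where $y_i$ is the dependent variable (esr, with values 0 or 1), $x_i$ is the independent variable (fibrinogen), and $n$ is the sample size. To find $\hat\alpha$ and $\hat\beta$ (the estimates of $\alpha$ and $\beta$), these equations must be solved numerically; the usual algorithms are iteratively reweighted least squares and Newton-Raphson. R's glm uses iteratively reweighted least squares (Fisher scoring), which for the logistic model is equivalent to Newton-Raphson, to find the two estimates.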
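To make the idea concrete, here is a minimal Newton-Raphson sketch for the two maximum-likelihood equations above. The function newton_logistic is hypothetical (written for this illustration, not part of R); it starts from $\alpha = \beta = 0$ and stops when the update becomes negligible. With the Example 2 data it should reproduce the estimates that glm() reports in Table 5 (about -6.845 and 1.827).

```r
fibrinogen <- c(2.52, 2.56, 2.19, 2.18, 3.41, 2.46, 3.22, 2.21, 3.15,
                2.60, 2.29, 2.35, 3.15, 2.68, 2.60, 2.23, 2.88, 2.65,
                2.28, 2.67, 2.29, 2.15, 2.54, 3.34, 2.99, 3.32,
                5.06, 3.34, 2.38, 3.53, 2.09, 3.93)
esr <- c(rep(0, 26), rep(1, 6))

newton_logistic <- function(x, y, tol = 1e-10, max_iter = 50) {
  X <- cbind(1, x)                             # columns for alpha and beta
  b <- c(0, 0)                                 # starting values
  for (iter in seq_len(max_iter)) {
    p <- plogis(drop(X %*% b))                 # e^(a+bx) / (1 + e^(a+bx))
    score <- crossprod(X, y - p)               # left minus right side of the two equations
    info  <- crossprod(X, X * (p * (1 - p)))   # information matrix X'WX
    step  <- drop(solve(info, score))          # Newton-Raphson update
    b <- b + step
    if (max(abs(step)) < tol) break
  }
  p <- plogis(drop(X %*% b))                   # recompute at the final estimates
  info <- crossprod(X, X * (p * (1 - p)))
  list(coef = c(alpha = b[1], beta = b[2]),
       se = sqrt(diag(solve(info))), iterations = iter)
}

fit_nr <- newton_logistic(fibrinogen, esr)
print(fit_nr)
```

The standard errors come from the inverse of the information matrix at convergence, which is also how glm() obtains the Std. Error column.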

Before the analysis, we enter the data into R as follows (there is no need to enter the id variable):

fibrinogen <- c(2.52, 2.56, 2.19, 2.18, 3.41, 2.46, 3.22, 2.21, 3.15,
2.60, 2.29, 2.35, 3.15, 2.68, 2.60, 2.23, 2.88, 2.65,
2.28, 2.67, 2.29, 2.15, 2.54, 3.34, 2.99, 3.32,
5.06, 3.34, 2.38, 3.53, 2.09, 3.93)

esr <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)

data <- data.frame(fibrinogen, esr)

Note that the third command combines the two variables fibrinogen and esr into a data frame named data, for convenience in later analyses.

boxplot(fibrinogen ~ esr, xlab="ESR", ylab="Fibrinogen")
t.test(fibrinogen ~ esr)

The first command asks R to draw a box plot of fibrinogen by esr group; the result is shown in Figure 2 below. The second command uses R's t.test to test whether the difference in fibrinogen between the two ESR groups is statistically significant; the result is shown in Table 4 below:

[Figure 2: box plot of fibrinogen (y-axis) for the two ESR groups, 0 and 1 (x-axis)]
Figure 2. Box plot of the distribution of fibrinogen in the two ESR groups.

        Welch Two Sample t-test

data: fibrinogen by esr
t = -1.6498, df = 5.331, p-value = 0.1563
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.8666562  0.3907588
sample estimates:
mean in group 0 mean in group 1
       2.650385        3.388333

Table 4. t-test comparing the high-ESR and low-ESR groups.

This simple analysis shows that the mean fibrinogen among subjects with a high ESR (esr = 1) is 3.39, somewhat higher than the mean of 2.65 in the low-ESR group. However, this difference is not statistically significant (p = 0.1563).

Now we analyze the data with logistic regression using R's glm function:

logit.esr <- glm(esr ~ fibrinogen, family="binomial")
summary(logit.esr)

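The command is essentially the same as in Example 1, except that each row now holds one subject with a 0/1 outcome, so no weights are needed. The results of this analysis are shown in Table 5 below: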

Call:
glm(formula = esr ~ fibrinogen, family = "binomial")

Deviance Residuals:
Min 1Q Median 3Q Max
-0.9298 -0.5399 -0.4382 -0.3356 2.4794

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.8451 2.7703 -2.471 0.0135 *
fibrinogen 1.8271 0.9009 2.028 0.0425 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 30.885 on 31 degrees of freedom
Residual deviance: 24.840 on 30 degrees of freedom
AIC: 28.840

Number of Fisher Scoring iterations: 5

Table 5. Results of the analysis of the association between fibrinogen and ESR

With these results, equation [6] can be written as:

$$ \log\left(\frac{p}{1-p}\right) = -6.8451 + 1.8271x $$
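Because the model is linear on the logit scale, a predicted probability of a high ESR is obtained by applying the inverse logit, plogis(), to -6.8451 + 1.8271x. In this sketch the coefficients are copied from Table 5 and the fibrinogen values are arbitrary illustrative inputs; it also confirms that the odds ratio for a one-unit increase equals $e^{1.8271}$ whatever the starting value.

```r
alpha_hat <- -6.8451                   # intercept from Table 5
beta_hat  <-  1.8271                   # fibrinogen coefficient from Table 5
x_new <- c(2.5, 3.0, 3.5, 4.0)         # illustrative fibrinogen values
p_hat <- plogis(alpha_hat + beta_hat * x_new)   # predicted P(esr = 1)
odds_hat <- p_hat / (1 - p_hat)
print(data.frame(fibrinogen = x_new, p = round(p_hat, 3), odds = round(odds_hat, 3)))

# odds ratio for a one-unit increase, here from 3.0 to 4.0
or_one_unit <- odds_hat[4] / odds_hat[2]
print(round(or_one_unit, 2))
```

The predicted probability does not rise by a constant amount per unit of fibrinogen; only the odds change by a constant factor.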


The OR associated with fibrinogen is therefore $\mathrm{OR} = e^{1.8271} = 6.22$ (as explained for equation [5]). In other words, for each one-unit increase in fibrinogen, the odds of a high ESR increase 6.22-fold. We can compute the 95% confidence interval of the OR with the following command:

exp(confint(logit.esr, parm="fibrinogen"))

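Note: this command exponentiates (exp) the 95% confidence interval (confint, short for confidence interval) of the fibrinogen parameter (parm, short for parameter) in the fitted object logit.esr. For a glm fit, confint() computes a profile-likelihood interval rather than the estimate ± 1.96 × SE interval used in Section I, which is why the interval is not symmetric about the estimate on the log scale. The result is: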

2.5 % 97.5 %
1.403468 54.535954

So the 95% confidence interval of the OR associated with fibrinogen runs from 1.40 to 54.5. Because the whole interval lies above 1, we have evidence to state that the association between fibrinogen and ESR is statistically significant. Indeed, the p value for this association is 0.0425 (see Table 5).
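For comparison, the Wald interval (estimate ± 1.96 × SE on the log scale, the same recipe as Woolf's method in Section I) can be obtained with confint.default(). This sketch re-enters the Example 2 data so it runs on its own. With an SE of 0.9009 the Wald interval should come out at roughly 1.06 to 36.3, noticeably different from the profile-likelihood interval of 1.40 to 54.5 given by confint(); with only 6 subjects in the high-ESR group the two approaches can disagree, and the profile interval is often preferred in small samples.

```r
fibrinogen <- c(2.52, 2.56, 2.19, 2.18, 3.41, 2.46, 3.22, 2.21, 3.15,
                2.60, 2.29, 2.35, 3.15, 2.68, 2.60, 2.23, 2.88, 2.65,
                2.28, 2.67, 2.29, 2.15, 2.54, 3.34, 2.99, 3.32,
                5.06, 3.34, 2.38, 3.53, 2.09, 3.93)
esr <- c(rep(0, 26), rep(1, 6))
logit.esr <- glm(esr ~ fibrinogen, family = binomial)

# Wald interval: exp(beta-hat +/- 1.96 * SE)
wald <- exp(confint.default(logit.esr, parm = "fibrinogen"))
print(round(wald, 3))
```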

V. Interaction effects

In the two examples above I introduced logistic regression for studies in which the independent variable may be continuous or discrete, but the model was limited to a single independent variable. In many scientific studies, however, there are many independent variables whose association with, or effect on, a dependent variable the researcher wants to assess. In this section I discuss a study with two independent variables, and the question of interaction between them.

Example 3. A study of the role of women in society. In a social survey conducted in 1971-1972, researchers asked male and female respondents whether they agreed or disagreed with the statement: "Women should take care of running their homes and leave running the country up to men" (Haberman SJ. The analysis of residuals in cross-classified tables. Biometrics 1973;29:205-220). The researchers recorded each respondent's education and sex. The results are summarized in Table 6 below.

Table 6. The role of women in society

edu sex agree disagree
0 Male 4 2
1 Male 2 0
2 Male 4 0
3 Male 6 3
4 Male 5 5
5 Male 13 7
6 Male 25 9
7 Male 27 15
8 Male 75 49
9 Male 29 29
10 Male 32 45
11 Male 36 59
12 Male 115 245
13 Male 31 70
14 Male 28 79
15 Male 9 23
16 Male 15 110
17 Male 3 29
18 Male 1 28
19 Male 2 13
20 Male 3 20
0 Female 4 2
1 Female 1 0
2 Female 0 0
3 Female 6 1
4 Female 10 0
5 Female 14 7
6 Female 17 5
7 Female 26 16
8 Female 91 36
9 Female 30 35
10 Female 55 67
11 Female 50 62
12 Female 190 403
13 Female 17 92
14 Female 18 81
15 Female 7 34
16 Female 13 115
17 Female 3 28
18 Female 0 21
19 Female 1 2
20 Female 2 4
Note: in the table above, edu is the respondent's education (measured in years of schooling), and agree and disagree are the numbers of respondents who agreed or disagreed with the statement. For example, the last row means that among women with 20 years of schooling, 2 agreed and 4 disagreed with the statement.

The researchers wanted to estimate the effect of sex and education on how respondents answered the question.

To make the analysis easy to follow, the data in the table above are first entered into R. The commands below create four variables, edu, sex, agree and disagree, with sex coded 0 for men and 1 for women. Two further variables, ntotal (total number of respondents) and proportion (proportion of respondents who agreed), are computed from agree and disagree. These data are stored in a data frame named women.

edu <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20)

sex <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1)

agree <- c(4, 2, 4, 6, 5, 13, 25, 27, 75, 29, 32,
36, 115, 31, 28, 9, 15, 3, 1, 2, 3, 4,
1, 0, 6, 10, 14, 17, 26, 91, 30, 55, 50,
190, 17, 18, 7, 13, 3, 0, 1, 2)

disagree <- c(2, 0, 0, 3, 5, 7, 9, 15, 49, 29, 45,
59, 245, 70, 79, 23, 110, 29, 28, 13, 20, 2,
0, 0, 1, 0, 7, 5, 16, 36, 35, 67, 62,
403, 92, 81, 34, 115, 28, 21, 2, 4)

ntotal <- agree + disagree
proportion <- agree/ntotal
women <- data.frame(edu, sex, agree, disagree, ntotal, proportion)

Before the analysis, we explore the proportion agreeing (the variable proportion) by education and by sex with the two commands below. The first shows the relationship between the proportion agreeing and education (Figure 3); the second draws a box plot of the proportion agreeing by sex (Figure 4):

plot(proportion ~ edu, ylab="% agreed", pch=ifelse(sex==0,16,21))
boxplot(proportion ~ sex)

[Figure 3: scatter plot of % agreed (y-axis, 0 to 1) against edu (x-axis, 0 to 20)]
[Figure 4: box plots of % agreed (0 to 1) for sex = 0 and sex = 1]

Figure 3. Proportion of respondents agreeing with the statement, by education. Filled points are men (sex = 0) and open points are women (sex = 1).
Figure 4. Proportion of respondents agreeing with the statement, by sex. On the x-axis, 0 denotes men and 1 denotes women.

Figure 3 clearly shows an inverse relationship between the proportion agreeing and education: the more educated the respondents, the lower the proportion who agree. Both figures, however, suggest that the effect of sex is unimportant, even though women seem slightly more likely than men to agree.

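Let $p$ be the probability of agreeing with the statement. Given the preliminary results above, we can consider a simple model in which the probability of agreeing depends on education and sex. In the language of logistic regression:

$$ \log\left(\frac{p}{1-p}\right) = \alpha + \beta_1 \,\text{edu} + \beta_2 \,\text{sex} \qquad [7] $$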

And in R (the results are shown in Table 7):

logistic <- glm(proportion ~ sex + edu, family = "binomial",
                weights = ntotal)

summary(logistic)

Call:
glm(formula = proportion ~ sex + edu, family = "binomial", weights =
ntotal)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.72544 -0.87168 -0.08448 0.88843 3.13315

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.50937 0.18389 13.646 <2e-16 ***
sex -0.01145 0.08415 -0.136 0.892
edu -0.27062 0.01541 -17.560 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 451.722 on 40 degrees of freedom
Residual deviance: 64.007 on 38 degrees of freedom
AIC: 208.07

Number of Fisher Scoring iterations: 4

Table 7. Results of the logistic regression analysis of model [7]

These results show a clear effect of education on agreeing with the statement (p < 0.0001), but no significant effect of sex (p = 0.892).

Model [7] is also called an additive model (or main-effects model), because it states that education and sex affect the proportion agreeing independently. "Independently" here means that the effect of education does not depend at all on sex (and, conversely, the effect of sex, if any, does not depend at all on education).

In reality this is a simplistic model, because men and women may differ in attitude and behaviour even at the same level of education. If so, the additive model [7] is no longer adequate. Therefore, before accepting the additive model, we should consider an interaction model between sex and education. The interaction model states:

$$ \log\left(\frac{p}{1-p}\right) = \alpha + \beta_1 \,\text{edu} + \beta_2 \,\text{sex} + \beta_3 \,(\text{edu} \times \text{sex}) \qquad [8] $$

In R, this model is written as follows:

interaction <- glm(proportion ~ sex * edu,   # sex * edu = sex + edu + sex:edu
                   family = "binomial", weights = ntotal)

summary(interaction)

Call:
glm(formula = proportion ~ sex * edu, family = "binomial", weights =
ntotal)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.39097 -0.94911 0.03065 0.75927 2.45262

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.09820 0.23550 8.910 < 2e-16 ***
sex 0.90474 0.36007 2.513 0.01198 *
edu -0.23403 0.02019 -11.592 < 2e-16 ***
sex:edu -0.08138 0.03109 -2.617 0.00886 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 451.722 on 40 degrees of freedom
Residual deviance: 57.103 on 37 degrees of freedom
AIC: 203.16

Number of Fisher Scoring iterations: 4

Table 8. Results of the logistic regression analysis of the interaction model [8]

These results give a completely different picture from the additive model: all three parameters, sex, edu and the interaction sex:edu (the colon denotes an interaction in R), are statistically significant. To understand this model, we rewrite model [8] with the estimates from Table 8:

$$ \log\left(\frac{p}{1-p}\right) = 2.098 + 0.905\,\text{sex} - 0.234\,\text{edu} - 0.081\,(\text{edu} \times \text{sex}) $$


The equation for men (sex = 0) is:

$$ \log\left(\frac{p}{1-p}\right) = 2.098 - 0.234\,\text{edu} $$

The equation for women (sex = 1) is:

$$ \log\left(\frac{p}{1-p}\right) = 2.098 + 0.905 - 0.234\,\text{edu} - 0.081\,\text{edu} = 3.003 - 0.315\,\text{edu} $$

In other words, we have two equations, one for each sex. For men, each additional year of education gives $\mathrm{OR} = e^{-0.234} = 0.79$, whereas for women $\mathrm{OR} = e^{-0.315} = 0.73$.
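The sex-specific intercepts and slopes, the per-year odds ratios, and the education level at which the two fitted lines cross can all be computed directly from the Table 8 estimates. A short sketch, with the coefficients typed in from Table 8 and sex coded 0 = men and 1 = women as in the R data entry; the crossing point is where 0.905 - 0.081 × edu = 0, about 11 years.

```r
b <- c(intercept = 2.09820, sex = 0.90474, edu = -0.23403, sex_edu = -0.08138)  # Table 8

men   <- c(intercept = b[["intercept"]],              slope = b[["edu"]])
women <- c(intercept = b[["intercept"]] + b[["sex"]], slope = b[["edu"]] + b[["sex_edu"]])

or_per_year <- exp(c(men = men[["slope"]], women = women[["slope"]]))  # OR per extra year
crossover   <- -b[["sex"]] / b[["sex_edu"]]   # edu at which the two logits are equal

print(rbind(men, women))
print(round(or_per_year, 2))
print(round(crossover, 1))
```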

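Another way to see the difference between the two groups is graphically (the R code for these two plots is given in the Notes at the end). The two plots below show the predicted proportion agreeing against education for men and women, based on the additive model [7] and the interaction model [8]: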

[Figure 5: fitted % agreed (y-axis, 0 to 1) against Education (x-axis, 0 to 20) under the additive model, with lines "Fitted (Men)" and "Fitted (Women)" and the observed points]
[Figure 6: the same layout under the interaction model]

Figure 5. Predicted proportion of respondents agreeing with the statement, by sex, based on the additive model: the solid line is for men and the dashed line for women. Points marked o are women and points marked * are men.
Figure 6. Predicted proportion of respondents agreeing with the statement, by sex, based on the interaction model: the solid line is for men and the dashed line for women. Points marked o are women and points marked * are men.

Figure 5 shows the lines for men and women almost coinciding, but Figure 6 shows that the probability of agreeing differs between men and women, and that the difference depends on education. The two fitted lines cross at about 0.905/0.081 ≈ 11 years of education: among respondents with fewer years of schooling, women are more likely than men to agree, whereas among those with more, men are more likely than women to agree. In the context of the question, respondents with little education tend to agree that women should look after the home and leave running the country to men, whereas most highly educated respondents disagree, and how women's responses differ from men's depends on education. That is what an interaction effect means!
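The reversal described above can be seen by computing predicted probabilities from the Table 8 estimates at a low and a high level of education. In this sketch the helper prob_agree() is hypothetical, and the education values of 5 and 16 years are arbitrary illustrative choices.

```r
b <- c(2.09820, 0.90474, -0.23403, -0.08138)   # Table 8: intercept, sex, edu, sex:edu

# predicted probability of agreeing; sex = 0 for men, 1 for women
prob_agree <- function(edu, sex) {
  plogis(b[1] + b[2] * sex + b[3] * edu + b[4] * sex * edu)
}

edu_levels <- c(5, 16)
print(round(cbind(edu = edu_levels,
                  men = prob_agree(edu_levels, 0),
                  women = prob_agree(edu_levels, 1)), 3))
```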

This example shows that if we analyze data out of habit, without considering the possibility of interaction, we can easily reach the wrong conclusion or miss important information. Model building in statistical analysis, and in science generally, is a complex issue, which I will discuss in a later part.
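One formal way to decide between the additive model [7] and the interaction model [8] is a likelihood-ratio test, which compares their residual deviances (64.007 and 57.103 in Tables 7 and 8). The sketch below re-enters Table 6 compactly and uses a two-column (agree, disagree) response, an alternative to the proportion-plus-weights form used above that should give the same fits. A deviance difference of about 6.90 on 1 degree of freedom corresponds to p of roughly 0.009.

```r
agree <- c(4, 2, 4, 6, 5, 13, 25, 27, 75, 29, 32, 36, 115, 31, 28, 9, 15, 3, 1, 2, 3,
           4, 1, 0, 6, 10, 14, 17, 26, 91, 30, 55, 50, 190, 17, 18, 7, 13, 3, 0, 1, 2)
disagree <- c(2, 0, 0, 3, 5, 7, 9, 15, 49, 29, 45, 59, 245, 70, 79, 23, 110, 29, 28, 13, 20,
              2, 0, 0, 1, 0, 7, 5, 16, 36, 35, 67, 62, 403, 92, 81, 34, 115, 28, 21, 2, 4)
edu <- rep(0:20, times = 2)         # years of schooling
sex <- rep(c(0, 1), each = 21)      # 0 = men, 1 = women

additive    <- glm(cbind(agree, disagree) ~ sex + edu, family = binomial)  # model [7]
interaction <- glm(cbind(agree, disagree) ~ sex * edu, family = binomial)  # model [8]

lrt <- anova(additive, interaction, test = "Chisq")   # likelihood-ratio (deviance) test
print(lrt)
```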

In the next article (time permitting) I will discuss multivariable logistic regression, non-linear effects, and the methods and criteria for building a complete logistic model.

Notes:

Terminology used in this article

English                        Vietnamese
Logistic regression model      Mô hình hồi qui logistic
Control                        Đối chứng
Variable                       Biến
Continuous variable            Biến liên tục
Discrete variable              Biến không liên tục hay biến rời rạc
Dependent variable             Biến phụ thuộc
Independent variable           Biến độc lập
Maximum likelihood method      Phương pháp hợp lý cực đại
Additive model                 Mô hình cộng hưởng
Interaction model              Mô hình tương tác


R code for Figures 5 and 6

# helper that draws observed and fitted proportions; we call it myplot
myplot <- function(predicted, d) {
  f <- d$sex == 1                                   # TRUE = women, FALSE = men
  plot(d$edu, predicted, type = "n",
       ylab = "% agreed", xlab = "Education", ylim = c(0, 1))
  lines(d$edu[!f], predicted[!f], lty = 1)          # men: solid line
  lines(d$edu[f], predicted[f], lty = 2)            # women: dashed line
  legend("topright", c("Fitted (Men)", "Fitted (Women)"), lty = 1:2, bty = "n")
  y <- d$agree / d$ntotal                           # observed proportions
  text(d$edu, y, ifelse(f, "o", "*"), cex = 1.25)   # o = women, * = men
}

# drop the cell with no respondents (women, edu = 2) so that the fitted
# values line up with the rows of the data frame
wr <- subset(women, ntotal > 0)

# Figure 5: additive model
additive <- glm(proportion ~ sex + edu,
                family = binomial, weights = ntotal, data = wr)
p.additive <- predict(additive, type = "response")
myplot(p.additive, wr)

# Figure 6: interaction model
interaction <- glm(proportion ~ sex * edu,
                   family = binomial, weights = ntotal, data = wr)
p.interaction <- predict(interaction, type = "response")
myplot(p.interaction, wr)
