
Course: A0392 - Statistik Ekonomi (Economic Statistics)

Year: 2010

Session 13
Time Series Data and Simple Linear
Regression and Correlation Analysis

Outline:

• Time Series Data
• Simple Linear Regression Analysis
• The Correlation Coefficient and Testing for Dependence between Random Variables

INTRODUCTION
• A time series is a set of data recorded over a given period of time.

• Time series analysis is used to anticipate future conditions.

• Forecasts of future conditions are useful for planning production, marketing, finance, and other areas.

COMPONENTS OF A TIME SERIES

Trend, Seasonal Variation, Cyclical Variation, and Irregular Variation

TREND

A trend is a long-run tendency of the series to move upward or downward. It is obtained from the average change over time, and its values are fairly smooth.

[Figure: two plots of Y against Year (X), one showing a positive trend and one showing a negative trend]

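LEAST SQUARES METHOD FOR A LINEAR TREND
The method chooses the trend line with the smallest sum of squared differences between the original data and the values on the trend line.

$Y' = a + bX$

$a = \sum Y / N$
$b = \sum XY / \sum X^2$

These shortcut formulas for a and b hold when the years are coded so that $\sum X = 0$.

[Figure: Trend in PT. Telkom subscribers; subscribers (millions) against year, 1997-2001, showing the original data Y and the trend values Y']
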
EXAMPLE: LEAST SQUARES METHOD

Year    Subscribers (Y)   Coded year (X)   XY       X^2
1997    5.0               -2               -10.0    4
1998    5.6               -1                -5.6    1
1999    6.1                0                 0.0    0
2000    6.7                1                 6.7    1
2001    7.2                2                14.4    4
Total   30.6                                 5.5    10

a = 30.6/5 = 6.12
b = 5.5/10 = 0.55
The trend equation is therefore Y' = 6.12 + 0.55X
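The least squares example for the linear trend can be checked with a few lines of Python. This is a minimal sketch using the five subscriber values from the slide; the helper `trend` and the 2002 forecast are illustrative additions, not part of the original deck.

```python
# Subscriber data from the slide (millions), 1997-2001.
years = [1997, 1998, 1999, 2000, 2001]
y = [5.0, 5.6, 6.1, 6.7, 7.2]

# Code the years so they are centred on zero: -2, -1, 0, 1, 2.
mid = years[len(years) // 2]
x = [yr - mid for yr in years]
assert sum(x) == 0  # the shortcut formulas below rely on this

n = len(y)
a = sum(y) / n
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)


def trend(year):
    """Trend value Y' for a calendar year (hypothetical helper)."""
    return a + b * (year - mid)


print(f"Y' = {a:.2f} + {b:.2f}X")
print(f"Trend value for 2002: {trend(2002):.2f}")
```

Coding the years around the middle year is what makes the sum of X zero, which is why the intercept reduces to the mean of Y.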
QUADRATIC TREND ANALYSIS

Over short periods the trend may not be linear. The quadratic method is one example of a nonlinear method.

$Y' = a + bX + cX^2$

[Figure: Quadratic trend; number of subscribers (millions) against year, 1997-2001]

The coefficients a, b, and c are found with the following formulas:

$a = \dfrac{(\sum Y)(\sum X^4) - (\sum X^2 Y)(\sum X^2)}{n \sum X^4 - (\sum X^2)^2}$
$b = \dfrac{\sum XY}{\sum X^2}$
$c = \dfrac{n \sum X^2 Y - (\sum X^2)(\sum Y)}{n \sum X^4 - (\sum X^2)^2}$
EXAMPLE: QUADRATIC TREND

Year    Y       X     XY       X^2     X^2 Y    X^4
1997    5.0     -2    -10.00   4.00    20.00    16.00
1998    5.6     -1     -5.60   1.00     5.60     1.00
1999    6.1      0      0.00   0.00     0.00     0.00
2000    6.7      1      6.70   1.00     6.70     1.00
2001    7.2      2     14.40   4.00    28.80    16.00
Total   30.60           5.50   10.00   61.10    34.00

$a = \dfrac{(\sum Y)(\sum X^4) - (\sum X^2 Y)(\sum X^2)}{n \sum X^4 - (\sum X^2)^2} = \dfrac{(30.6)(34) - (61.1)(10)}{(5)(34) - (10)^2} = 6.13$
$b = \dfrac{\sum XY}{\sum X^2} = 5.5/10 = 0.55$
$c = \dfrac{n \sum X^2 Y - (\sum X^2)(\sum Y)}{n \sum X^4 - (\sum X^2)^2} = \dfrac{(5)(61.1) - (10)(30.6)}{(5)(34) - (10)^2} = -0.0071$

The quadratic equation is therefore Y' = 6.13 + 0.55X - 0.0071X^2
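The quadratic coefficients can be reproduced directly from the column sums. The sketch below assumes the same coded years and subscriber values as the table; variable names are illustrative.

```python
# Subscriber data (millions) and coded years from the quadratic trend table.
y = [5.0, 5.6, 6.1, 6.7, 7.2]
x = [-2, -1, 0, 1, 2]
n = len(y)

# Column sums used by the formulas for a, b, and c.
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_x2y = sum(xi ** 2 * yi for xi, yi in zip(x, y))
sum_x4 = sum(xi ** 4 for xi in x)

denom = n * sum_x4 - sum_x2 ** 2
a = (sum_y * sum_x4 - sum_x2y * sum_x2) / denom
b = sum_xy / sum_x2
c = (n * sum_x2y - sum_x2 * sum_y) / denom

print(f"Y' = {a:.2f} + {b:.2f}X + ({c:.4f})X^2")
```

The very small value of c suggests the series is close to linear over these five years.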
EXPONENTIAL TREND ANALYSIS

In an exponential equation the time variable (X) appears as an exponent. The values of a and b are found from the data Y and X with the following formulas:

$Y' = a(1 + b)^X$

$\ln Y' = \ln a + X \ln(1 + b)$

so that

$a = \operatorname{antiln}\left( \dfrac{\sum \ln Y}{n} \right)$
$b = \operatorname{antiln}\left( \dfrac{\sum (X \ln Y)}{\sum X^2} \right) - 1$

where antiln denotes the inverse of the natural logarithm (the exponential function).

[Figure: Exponential trend, Y' = a(1 + b)^X; number of subscribers (millions) against year, 1997-2001]
EXAMPLE: EXPONENTIAL TREND

Year    Y      X     Ln Y    X^2     X Ln Y
1997    5.0    -2    1.6     4.00    -3.2
1998    5.6    -1    1.7     1.00    -1.7
1999    6.1     0    1.8     0.00     0.0
2000    6.7     1    1.9     1.00     1.9
2001    7.2     2    2.0     4.00     3.9
Total                9.0     10.00    0.9

The values of a and b are obtained as:
$a = \operatorname{antiln}(\sum \ln Y / n) = \operatorname{antiln}(9/5) = 6.049$
$b = \operatorname{antiln}(\sum (X \ln Y) / \sum X^2) - 1 = \operatorname{antiln}(0.9/10) - 1 = 0.094$
The exponential equation is therefore Y' = 6.049(1 + 0.094)^X
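The exponential trend can be fitted in Python as a sketch of the same log-linear calculation. Note one assumption that differs from the slide: the code keeps full-precision logarithms, whereas the slide works from logarithms rounded to one decimal place, so the results (about 6.07 and 0.095) differ slightly from 6.049 and 0.094.

```python
import math

# Subscriber data (millions) and coded years from the exponential trend table.
y = [5.0, 5.6, 6.1, 6.7, 7.2]
x = [-2, -1, 0, 1, 2]
n = len(y)

ln_y = [math.log(v) for v in y]
sum_ln_y = sum(ln_y)
sum_x_ln_y = sum(xi * li for xi, li in zip(x, ln_y))
sum_x2 = sum(xi ** 2 for xi in x)

# "antiln" on the slide is the exponential function.
a = math.exp(sum_ln_y / n)
b = math.exp(sum_x_ln_y / sum_x2) - 1

print(f"Y' = {a:.3f}(1 + {b:.3f})^X")
```

Here b can be read as the average growth rate per year implied by the fitted curve.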
SEASONAL VARIATION

Seasonal variation refers to changes or fluctuations tied to particular seasons or months within one year.

[Figure: three example plots. (1) Rice production per season, in thousands of tons, by period from I-98 onward: seasonal variation in agricultural products. (2) Movement of inflation (%) in 2002, months 1-12: monthly variation in inflation. (3) Share index of PT. Astra Agro Lestari, March 2003, by date: daily variation in share prices.]
SEASONAL VARIATION WITH THE SIMPLE AVERAGE METHOD

Seasonal Index = (average per quarter / overall average) x 100

Month       Income   Formula = (value for the month / average value) x 100   Seasonal Index
January     88       (88/95) x 100                                           93
February    82       (82/95) x 100                                           86
March       106      (106/95) x 100                                          112
April       98       (98/95) x 100                                           103
May         112      (112/95) x 100                                          118
June        92       (92/95) x 100                                           97
July        102      (102/95) x 100                                          107
August      96       (96/95) x 100                                           101
September   105      (105/95) x 100                                          111
October     85       (85/95) x 100                                           89
November    102      (102/95) x 100                                          107
December    76       (76/95) x 100                                           80
Average     95
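The simple average method amounts to dividing each month by the overall average. The sketch below follows the slide in rounding the average to 95 before dividing (the unrounded mean of these twelve values is about 95.33); the list names are illustrative.

```python
# Monthly income from the slide, January to December.
income = [88, 82, 106, 98, 112, 92, 102, 96, 105, 85, 102, 76]

# The slide rounds the overall average to a whole number before dividing.
average = round(sum(income) / len(income))

# Seasonal index: value for the month relative to the average, times 100.
indices = [round(value / average * 100) for value in income]

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
for month, index in zip(months, indices):
    print(f"{month}: {index}")
```

An index above 100 marks a month that is typically above average, and an index below 100 a month that is below it.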
AVERAGE METHOD WITH TREND

• In the average method with trend, the seasonal index is obtained by dividing each original data value by its trend value.

• The trend value Y' must therefore be found first from the equation Y' = a + bX.

AVERAGE METHOD WITH TREND

Month       Y      Y'      Calculation          Seasonal Index
January     88     97.41   (88/97.41) x 100     90.3
February    82     97.09   (82/97.09) x 100     84.5
March       106    96.77   (106/96.77) x 100    109.5
April       98     96.13   (98/96.13) x 100     101.9
May         112    95.81   (112/95.81) x 100    116.9
June        92     95.49   (92/95.49) x 100     96.3
July        102    95.17   (102/95.17) x 100    107.2
August      96     94.85   (96/94.85) x 100     101.2
September   105    94.53   (105/94.53) x 100    111.1
October     85     93.89   (85/93.89) x 100     90.5
November    102    93.57   (102/93.57) x 100    109.0
December    76     93.25   (76/93.25) x 100     81.5

CYCLICAL VARIATION

Recall that

$Y = T \times S \times C \times I$

so

$TCI = Y / S$
$CI = TCI / T$

where CI is the cycle index.

[Figure: Cycle of the Composite Stock Index (IHSG); cycle values between -2.5 and 2.5 plotted against year, 1994-2002]

EXAMPLE: CYCLE

Year   Quarter   Y    T      S (%)   TCI = Y/S   CI = TCI/T (%)   C (%)
1998   I         22   17.5
1998   II        14   17.2   95      14.7        86
1998   III       8    16.8   51      15.7        93               92
1999   I         25   16.5   156     16.0        97               97
1999   II        15   16.1   94      16.0        99               100
1999   III       8    15.8   49      16.3        103              102
2000   I         26   15.4   163     16.0        104              104
2000   II        14   15.1   88      15.9        105              105
2000   III       8    14.7   52      15.4        105              106
2001   I         24   14.3   157     15.3        107              108
2001   II        14   14.0   89      15.7        112
2001   III       9    13.6
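The TCI and CI columns of the cycle example follow mechanically from Y, T, and S. The sketch below covers only the ten rows for which the slide reports a seasonal index S, and it assumes S is expressed as a percentage, which is what the slide's arithmetic implies (for example 14 / 0.95 = 14.7).

```python
# Rows with a reported seasonal index: 1998 II through 2001 II.
y = [14, 8, 25, 15, 8, 26, 14, 8, 24, 14]                            # original data
t = [17.2, 16.8, 16.5, 16.1, 15.8, 15.4, 15.1, 14.7, 14.3, 14.0]     # trend values
s = [95, 51, 156, 94, 49, 163, 88, 52, 157, 89]                      # seasonal index (%)

# Remove the seasonal component: TCI = Y / S.
tci = [yi / (si / 100) for yi, si in zip(y, s)]

# Remove the trend: CI = TCI / T, expressed as a percentage.
ci = [tci_i / ti * 100 for tci_i, ti in zip(tci, t)]

for tci_i, ci_i in zip(tci, ci):
    print(f"TCI = {tci_i:.1f}, CI = {ci_i:.0f}")
```

Dividing CI by the cycle component C in the same way would leave the irregular component I.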
IRREGULAR MOVEMENT
Recall that $Y = T \times S \times C \times I$
$TCI = Y / S$
$CI = TCI / T$
$I = CI / C$

IRREGULAR MOVEMENT

Year   Quarter   CI = TCI/T   C      I = (CI/C) x 100
1998   I
1998   II        86
1998   III       93           92     101
1999   I         97           97     100
1999   II        99           100    99
1999   III       103          102    101
2000   I         104          104    100
2000   II        105          105    100
2000   III       105          106    99
2001   I         107          108    99
2001   II        112
2001   III
TESTING REGRESSION COEFFICIENTS
WITH ANALYSIS OF VARIANCE

Measures of Variation:
The Sum of Squares

SST = SSR + SSE

Total Sample Variability = Explained Variability + Unexplained Variability

SST = Total Sum of Squares
SSR = Regression Sum of Squares
SSE = Error Sum of Squares
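Measures of Variation:
The Sum of Squares

$SST = \sum (Y_i - \bar{Y})^2$
$SSR = \sum (\hat{Y}_i - \bar{Y})^2$
$SSE = \sum (Y_i - \hat{Y}_i)^2$

[Figure: plot of Y against X showing, at a given X_i, the deviations behind SST, SSR, and SSE relative to the regression line and the mean of Y]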
Venn Diagrams and
Explanatory Power of
Regression

[Figure: Venn diagram of two overlapping circles, Sales and Sizes]
• Sales only: variations in Sales explained by the error term or unexplained by Sizes (SSE)
• Sizes only: variations in store Sizes not used in explaining variation in Sales
• Overlap: variations in Sales explained by Sizes, or variations in Sizes used in explaining variation in Sales (SSR)
The ANOVA Table in Excel

ANOVA
             df       SS     MS                    F          Significance F
Regression   k        SSR    MSR = SSR/k           MSR/MSE    P-value of the F Test
Residuals    n-k-1    SSE    MSE = SSE/(n-k-1)
Total        n-1      SST
Measures of Variation
The Sum of Squares: Example

Excel Output for Produce Stores

ANOVA
             df    SS             MS           F           Significance F
Regression   1     30380456.12    30380456     81.17909    0.000281201
Residual     5     1871199.595    374239.92
Total        6     32251655.71

The df column gives the regression (explained), error (residual), and total degrees of freedom. The SS column gives SSR, SSE, and SST, in that order.
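The ANOVA quantities in the Excel output can be recomputed from the raw data. The sketch below uses the seven-store square footage and sales figures given in the produce store example of this deck; the variable names are illustrative, and plain Python stands in for Excel.

```python
# Seven-store data from the produce store example.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]     # X
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]    # Y, in $000

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

# Least squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
sxx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in square_feet]

# Sums of squares: total, regression (explained), error (unexplained).
sst = sum((y - y_bar) ** 2 for y in annual_sales)
ssr = sum((f - y_bar) ** 2 for f in fitted)
sse = sum((y - f) ** 2 for y, f in zip(annual_sales, fitted))

k = 1                      # number of explanatory variables
msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}, F = {f_stat:.3f}")
```

The identity SST = SSR + SSE holds up to floating-point rounding, which is a useful sanity check on any hand calculation.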
Venn Diagrams and
Explanatory Power of
Regression

$r^2 = \dfrac{SSR}{SSR + SSE}$

[Figure: Venn diagram of Sales and Sizes; r^2 is the share of the Sales circle that overlaps Sizes]
Standard Error of Estimate

• $S_{YX} = \sqrt{\dfrac{SSE}{n-2}} = \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$

• Measures the standard deviation (variation) of the Y values around the regression equation

Measures of Variation:
Produce Store Example

Excel Output for Produce Stores

Regression Statistics
Multiple R           0.9705572
R Square             0.94198129
Adjusted R Square    0.93037754
Standard Error       611.751517    (S_YX)
Observations         7             (n)

r^2 = .94
94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.
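R Square and the Standard Error in the regression statistics follow directly from the sums of squares in the ANOVA output. A minimal sketch, taking SSR, SSE, and SST as printed by Excel:

```python
import math

# Sums of squares from the Excel ANOVA output for the produce stores.
ssr = 30380456.12
sse = 1871199.595
sst = 32251655.71
n = 7

# Coefficient of determination: share of total variation explained.
r_squared = ssr / sst

# Standard error of the estimate, with n - 2 degrees of freedom.
s_yx = math.sqrt(sse / (n - 2))

print(f"r^2 = {r_squared:.4f}, S_YX = {s_yx:.2f}")
```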
Linear Regression
Assumptions

• Normality
– Y values are normally distributed for each X
– Probability distribution of error is normal
• Homoscedasticity (Constant Variance)
• Independence of Errors

Consequences of Violation
of the Assumptions
• Violation of the Assumptions
– Non-normality (error not normally distributed)
– Heteroscedasticity (variance not constant)
• Usually happens in cross-sectional data
– Autocorrelation (errors are not independent)
• Usually happens in time-series data
• Consequences of Any Violation of the Assumptions
– Predictions and estimations obtained from the
sample regression line will not be accurate
– Hypothesis testing results will not be reliable
• It is Important to Verify the Assumptions
Variation of Errors Around
the Regression Line

• Y values are normally distributed around the regression line.
• For each X value, the “spread” or variance around the regression line is the same.

[Figure: sample regression line of Y on X with the same normal error density f(e) drawn at X1 and X2]
Inference about the Slope:
t Test
• t Test for a Population Slope
– Is there a linear dependency of Y on X ?
• Null and Alternative Hypotheses
– H0: $\beta_1 = 0$ (no linear dependency)
– H1: $\beta_1 \neq 0$ (linear dependency)
• Test Statistic
– $t = \dfrac{b_1 - \beta_1}{S_{b_1}}$ where $S_{b_1} = \dfrac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$
– $d.f. = n - 2$
Example: Produce Store

Data for 7 Stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Estimated Regression Equation:
$\hat{Y}_i = 1636.415 + 1.487 X_i$

The slope of this model is 1.487.
Does square footage affect annual sales?
Inferences about the Slope:
t Test Example

H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
$\alpha = .05$
df = 7 - 2 = 5
Critical Value(s): -2.5706 and 2.5706 (rejection region of .025 in each tail)

Test Statistic, from Excel Printout:

             Coefficients   Standard Error   t Stat    P-value
Intercept    1636.4147      451.4953         3.6244    0.01515
Footage      1.4866         0.1650           9.0099    0.00028

In the Footage row the columns give $b_1$, $S_{b_1}$, t, and the p-value.

Decision: Reject H0.
Conclusion: There is evidence that square footage affects annual sales.
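The t statistic for the slope can be rebuilt from the raw store data. This is a sketch under the null hypothesis that the population slope is 0; the critical value 2.5706 is copied from the slide rather than computed, and the variable names are illustrative.

```python
import math

# Seven-store data from the produce store example.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]     # X
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]    # Y, in $000

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate and of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))
s_b1 = s_yx / math.sqrt(sxx)

# Test statistic under H0: beta_1 = 0.
t_stat = (b1 - 0) / s_b1

t_crit = 2.5706  # two-tailed critical value for df = 5, alpha = .05 (from the slide)
reject_h0 = abs(t_stat) > t_crit

print(f"b1 = {b1:.4f}, S_b1 = {s_b1:.4f}, t = {t_stat:.4f}, reject H0: {reject_h0}")
```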
Inferences about the Slope:
Confidence Interval Example

Confidence Interval Estimate of the Slope:

$b_1 \pm t_{n-2} S_{b_1}$

Excel Printout for Produce Stores
             Lower 95%      Upper 95%
Intercept    475.810926     2797.01853
Footage      1.06249037     1.91077694

At the 95% level of confidence, the confidence interval for the slope is (1.062, 1.911), which does not include 0.
Conclusion: There is a significant linear dependency of annual sales on the size of the store.
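The interval for the slope can be checked from the rounded values in the Excel printout. A minimal sketch; because the inputs are rounded to four decimals, the limits agree with Excel's only to about three decimals.

```python
# Values from the Excel printout for the produce stores.
b1 = 1.4866        # estimated slope (Footage coefficient)
s_b1 = 0.1650      # standard error of the slope
t_crit = 2.5706    # t value for df = 5 at 95% confidence (from the slide)

margin = t_crit * s_b1
lower = b1 - margin
upper = b1 + margin

print(f"95% confidence interval for the slope: ({lower:.3f}, {upper:.3f})")
print("Interval excludes 0:", lower > 0 or upper < 0)
```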
Inferences about the Slope:
F Test
• F Test for a Population Slope
– Is there a linear dependency of Y on X ?
• Null and Alternative Hypotheses
– H0: $\beta_1 = 0$ (no linear dependency)
– H1: $\beta_1 \neq 0$ (linear dependency)
• Test Statistic
– $F = \dfrac{SSR / 1}{SSE / (n - 2)}$
Relationship between a t
Test and an F Test
• Null and Alternative Hypotheses
– H0: $\beta_1 = 0$ (no linear dependency)
– H1: $\beta_1 \neq 0$ (linear dependency)
• $(t_{n-2})^2 = F_{1,\,n-2}$
• The p-value of a t Test and the p-value of an F Test are Exactly the Same
• The Rejection Region of an F Test is Always in the Upper Tail
Inferences about the Slope:
F Test Example

H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
$\alpha = .05$
numerator df = 1
denominator df = 7 - 2 = 5
Critical Value: $F_{1,\,n-2} = 6.61$ (rejection region of .05 in the upper tail)

Test Statistic, from Excel Printout:

ANOVA
             df    SS             MS             F         Significance F
Regression   1     30380456.12    30380456.12    81.179    0.000281
Residual     5     1871199.595    374239.919
Total        6     32251655.71

The Significance F column is the p-value.

Decision: Reject H0.
Conclusion: There is evidence that square footage affects annual sales.
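The link between the t test and the F test can be verified numerically with the values already reported for the produce stores. A small sketch, using the rounded statistics and critical values from the slides:

```python
# Reported test statistics for the produce store regression.
t_stat = 9.0099     # t statistic for the slope
f_stat = 81.179     # F statistic from the ANOVA table

# Reported critical values at alpha = .05 with df = 5 (t) and df = (1, 5) (F).
t_crit = 2.5706
f_crit = 6.61

print(f"t^2 = {t_stat ** 2:.3f} versus F = {f_stat}")
print(f"t_crit^2 = {t_crit ** 2:.3f} versus F_crit = {f_crit}")
```

Squaring maps both tails of the t distribution onto the upper tail of F, which is why the F rejection region is one-sided.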
Purpose of Correlation
Analysis

• Correlation Analysis is Used to Measure the Strength of Association (Linear Relationship) Between 2 Numerical Variables
– It is concerned only with the strength of the relationship
– No causal effect is implied

Purpose of Correlation
Analysis
(continued)
• The Population Correlation Coefficient $\rho$ (Rho) is Used to Measure the Strength of the Relationship between the Variables

$\rho = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$

Purpose of Correlation
Analysis
(continued)
• The Sample Correlation Coefficient r is an Estimate of $\rho$ and is Used to Measure the Strength of the Linear Relationship in the Sample Observations

$r = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$
Sample Observations from
Various r Values

[Figure: five scatter plots of Y against X illustrating r = -1, r = -.6, r = 0, r = .6, and r = 1]
Features of $\rho$ and r

• Unit Free
• Range between -1 and 1
• The Closer to -1, the Stronger the Negative Linear Relationship
• The Closer to 1, the Stronger the Positive Linear Relationship
• The Closer to 0, the Weaker the Linear Relationship

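t Test for Correlation
• Hypotheses
– H0: $\rho = 0$ (no correlation)
– H1: $\rho \neq 0$ (correlation)
• Test Statistic
– $t = \dfrac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$ where
$r = \sqrt{r^2} = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$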
Example: Produce Stores

Is there any evidence of linear relationship between annual sales of a store and its square footage at .05 level of significance?

From Excel Printout:

Regression Statistics
Multiple R           0.9705572     (r)
R Square             0.94198129
Adjusted R Square    0.93037754
Standard Error       611.751517
Observations         7

H0: $\rho = 0$ (no association)
H1: $\rho \neq 0$ (association)
$\alpha = .05$
Example: Produce Stores
Solution

$t = \dfrac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \dfrac{.9706}{\sqrt{\dfrac{1 - .9420}{5}}} = 9.0099$

Critical Value(s): -2.5706 and 2.5706 (rejection region of .025 in each tail)

Decision: Reject H0.
Conclusion: There is evidence of a linear relationship at 5% level of significance.

The value of the t statistic is exactly the same as the t statistic value for test on the slope coefficient.
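The t statistic for the correlation test needs only r and n. A minimal sketch using Multiple R from the Excel printout and the null value of 0 for the population correlation; the critical value is taken from the slide.

```python
import math

r = 0.9705572      # sample correlation (Multiple R in the Excel printout)
n = 7              # number of stores
rho_0 = 0          # hypothesized population correlation under H0

t_stat = (r - rho_0) / math.sqrt((1 - r ** 2) / (n - 2))

t_crit = 2.5706    # two-tailed critical value for df = 5, alpha = .05 (from the slide)
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```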
Estimation of Mean Values

Confidence Interval Estimate for $\mu_{Y|X=X_i}$:
The Mean of Y Given a Particular $X_i$

$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{\dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$

• $t_{n-2}$: t value from table with df = n-2
• $S_{YX}$: standard error of the estimate
• Size of interval varies according to distance away from mean, $\bar{X}$
Prediction of Individual Values

Prediction Interval for Individual Response $Y_i$ at a Particular $X_i$

$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$

Addition of 1 increases width of interval from that for the mean of Y
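Interval Estimates for Different
Values of X

[Figure: regression line of Y on X with two bands around it; the narrower band is the confidence interval for the mean of Y and the wider band is the prediction interval for an individual Y_i, both shown at a given X relative to the mean of X]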
Example: Produce Stores

Data for 7 Stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Consider a store with 2000 square feet.

Regression Model Obtained:
$\hat{Y}_i = 1636.415 + 1.487 X_i$
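Estimation of Mean Values:
Example

Confidence Interval Estimate for $\mu_{Y|X=X_i}$

Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet.

Predicted Sales $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$

$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{\dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 612.66$

$3997.02 \le \mu_{Y|X=X_i} \le 5222.34$

The limits are centered on 4609.68, the prediction obtained with the unrounded coefficients, so they differ slightly from 4610.45 ± 612.66.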
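Prediction Interval for Y:
Example

Prediction Interval for Individual $Y_{X=X_i}$

Find the 95% prediction interval for annual sales of one particular store of 2,000 square feet.

Predicted Sales $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)

$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$

$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 1687.68$

$2922.00 \le Y_{X=X_i} \le 6297.37$

As with the confidence interval, the limits are centered on 4609.68, the prediction obtained with the unrounded coefficients.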
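Both intervals for a 2,000 square foot store can be reproduced from the raw data. The sketch below fits the line with unrounded coefficients, so the predicted value comes out near 4609.68 rather than 4610.45; the t value 2.5706 is taken from the slide, and the name `leverage` is an illustrative label for the term under the square root.

```python
import math

# Seven-store data from the produce store example.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]     # X
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]    # Y, in $000

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))

t_crit = 2.5706   # t value for df = 5 at 95% confidence (from the slide)
x_new = 2000      # store size of interest, in square feet

y_hat = b0 + b1 * x_new
leverage = 1 / n + (x_new - x_bar) ** 2 / sxx

ci_half = t_crit * s_yx * math.sqrt(leverage)        # mean of Y at x_new
pi_half = t_crit * s_yx * math.sqrt(1 + leverage)    # individual Y at x_new

print(f"Predicted sales: {y_hat:.2f}")
print(f"Confidence interval: {y_hat - ci_half:.2f} to {y_hat + ci_half:.2f}")
print(f"Prediction interval: {y_hat - pi_half:.2f} to {y_hat + pi_half:.2f}")
```

The prediction interval is much wider because it adds the variability of a single store around the mean to the uncertainty in the estimated mean itself.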
USING MS EXCEL FOR REGRESSION

• Enter the Y data and the X data on an MS Excel sheet, for example the Y data in column A and the X data in column B, rows 1 to 5.
• Click the Tools icon, choose ‘data analysis’, and then choose ‘simple linear regression’.
• In the dialog box, under Y variable cell range, enter the Y data by selecting column A, or a1:a5. Under X variable cell range, enter the X data by selecting column B, or b1:b5.
• Click OK and the results appear. In Y’ = a + bX, a is reported as Intercept and b as X Variable 1 in the Coefficients column.

HAPPY STUDYING, AND BEST OF LUCK ALWAYS
