
APPLYING OLAP TECHNOLOGY TO THE EXPLOITATION OF RICE PEST DATA IN TRA VINH

Online Analytical Processing (OLAP) is widely used in business applications to analyze data and to query multidimensional data in support of managers' decision making. This paper describes and builds an online statistics application on a rice pest data warehouse, using a data warehouse architecture, OLAP techniques, and a web architecture. The application supports effective exploitation of the rice pest data that the Crop Production and Plant Protection sector of Tra Vinh has accumulated over many years. Several tools were used to build it: SQL Server 2005 for the data warehouse, Analysis Services 2005 for the multidimensional OLAP database, and Microsoft Office Web Components for web-based reports. The resulting application lets users analyze rice pest data online and flexibly, directly on a web page. This shows that OLAP is not only an effective tool for business-support applications but should also be applied more widely in information systems that serve agriculture, rural areas, and farmers.

1. INTRODUCTION

Agriculture is an important sector of every national economy, especially now that food security has become a top concern. In Vietnam, where the economy grew out of an agricultural base, agriculture plays an even more important role in national economic development. This was demonstrated by the 2008 food price crisis, which seriously affected many other economic sectors and the lives of millions of people. The Mekong Delta is the country's largest rice granary. In recent years, intensive cultivation with more crops per year and the application of scientific and technical advances have steadily raised rice yield and output. For more than ten years, the Mekong Delta has been regarded as a key national economic region that produces food for export and ensures national food security. However, many consecutive years of intensive rice cultivation create a risk of pest outbreaks. In 2006 in particular, brown planthopper outbreaks occurred in every province of the Mekong Delta.

To counter pest outbreaks and raise rice yield and output, scientists have studied and proposed many effective measures, such as breeding new pest-resistant rice varieties, extensive farming, timing sowing to avoid pests, spraying pesticides, and others. Whichever method is applied, farmers, agricultural managers, and scientists all need to find information about pests. At the Tra Vinh Sub-department of Crop Production and Plant Protection, the stated goal is to analyze a very large volume of data in order to give farmers, agricultural managers, and researchers fast information about rice pests. The results of this analysis are important for pest and disease control. Building an online statistics application based on OLAP (OnLine Analytical Processing) is a good way to reach this goal.

Data analysis relies on raw data taken from different sources: survey forms, reports from the provincial Sub-departments of Crop Production and Plant Protection, and data from rice disease research projects. The data is stored in many formats, such as Excel, Access, FoxPro, and SQL Server. Existing software provides only fixed report templates that are designed and installed in advance. To produce reports on multiple criteria at the user's request, staff usually spend a great deal of time collecting data, integrating figures, applying calculation formulas, designing new report templates, and even modifying the database structure and changing the software's source code. These complicated steps slow down the delivery of information that pest and disease control needs in a timely manner. A new approach is therefore necessary.

Given these practical requirements, building a web-based online statistics application for online analysis of rice disease data is a genuinely practical problem. The application is designed to provide accurate information and to support analysis on multiple criteria as requested by users. Applying newer technologies such as the data warehouse, OLAP, and web technologies is a feasible way to meet these requirements. This paper describes and builds a statistics application for rice pest data in Tra Vinh using a data warehouse architecture, OLAP techniques, and a web architecture. It also examines the criteria that users apply when analyzing rice pest data online on the web.

2. OLAP TECHNOLOGY

2.1. What is OLAP?

The term OLAP was introduced by E. F. Codd in a paper titled "Providing On-Line Analytical Processing to User Analysts", published in August 1993 [8]. In that paper he also set out 12 rules that an OLAP system must follow. Since then, OLAP has been known as a data analysis technique that uses multidimensional representations of data called cubes.
OLAP provides the ability to create data cubes and to run sophisticated queries from user applications. E. F. Codd's 12 criteria for evaluating an OLAP system are:

1. Multidimensional Conceptual View: data is presented to users in a multidimensional form.
2. Transparency: users do not need to know that they are using a multidimensional OLAP database.
3. Accessibility: OLAP tools should choose the best source data to support a query.
4. Consistent Reporting Performance: reporting performance should stay the same regardless of database size and the number of dimensions used.
5. Client-Server Architecture: OLAP tools are deployed on a client-server model.
6. Generic Dimensionality: all dimensions are equivalent in structure and in computation, with no bias toward accessing any particular dimension.
7. Dynamic Sparse Matrix Handling: null values are stored efficiently in a dynamic matrix.
8. Multi-User Support: OLAP tools must support many concurrent users.
9. Unrestricted Cross-Dimensional Operations: aggregation rules apply across all dimensions.
10. Intuitive Data Manipulation: users see all the data they need in the interface, without going through menus or many steps to reach it.
11. Flexible Reporting: users can lay out report data in any way they like.
12. Unlimited Dimensions and Aggregation Levels: the OLAP model places no limit on the number of dimensions or aggregation levels.

2.2. The multidimensional data model

In practice, people tend to think multidimensionally. For example, an agricultural manager forecasting a disease outbreak might say: "A brown planthopper outbreak is likely to recur in the provinces of Tra Vinh, Vinh Long, and An Giang around May 2009." Data cube designers describe this forecast as follows:

Figure 1: The dimensions in a disease description

Data cube: The cube is the main component of the OLAP structure and is used to store and browse data. It is analogous to a table in a relational database system. The word "cube" suggests three dimensions, but in an OLAP structure a cube can have more than 128 dimensions. A cube has a multidimensional structure defined by a set of dimensions and measures. The dimensions determine the structure of the cube, while the measures determine the numeric values that users are interested in [12]. Each cube has a schema that identifies a set of related tables taken from the source data stored in the data warehouse. The table at the center of the schema is called the fact table and stores the cube's measures. The other tables in the schema are called dimension tables and store the cube's dimensions. For example, to manage rice pests we can create a cube named DICHBENH that records pest-related information along the dimensions time, disease type, weather, growth stage, and location. These dimensions let users track where a pest outbreak occurred, under which weather conditions, in which period, and at which growth stage of the rice plant.

Figure 2: Schema of the DICHBENH cube.
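As a rough illustration of a cube as "dimensions plus measures", the sketch below keys each cell by one value per dimension and stores the four measures named later in the paper. It is not how Analysis Services stores a cube; the dictionary keys, the `fact` helper, and the sample rows are assumptions made only for this example.

```python
from collections import defaultdict

# Dimension and measure names follow the DICHBENH example in the text;
# the key names and the sample rows below are invented for illustration.
DIMENSIONS = ("time", "disease", "weather", "growth_stage", "location")
MEASURES = ("Dtnhiemnang", "Dtnhiemtb", "Dtnhiemnhe", "Dtphongtri")


def build_cube(fact_rows):
    """Sum the measures of all fact rows that share the same coordinates."""
    cube = defaultdict(lambda: dict.fromkeys(MEASURES, 0.0))
    for row in fact_rows:
        coordinates = tuple(row[d] for d in DIMENSIONS)
        for measure in MEASURES:
            cube[coordinates][measure] += row[measure]
    return dict(cube)


def fact(time, disease, weather, stage, location, *measure_values):
    """Hypothetical helper that builds one fact row as a dictionary."""
    row = dict(zip(DIMENSIONS, (time, disease, weather, stage, location)))
    row.update(zip(MEASURES, measure_values))
    return row


facts = [
    fact("2008-05", "Ray nau", "rainy", "tillering", "Tra Vinh", 12.0, 30.0, 55.0, 40.0),
    fact("2008-05", "Ray nau", "rainy", "tillering", "Tra Vinh", 5.0, 10.0, 20.0, 15.0),
    fact("2008-06", "Ray nau", "dry", "heading", "Vinh Long", 3.0, 8.0, 9.0, 6.0),
]

cube = build_cube(facts)
cell = cube[("2008-05", "Ray nau", "rainy", "tillering", "Tra Vinh")]
```

The first two rows share all five coordinates, so they collapse into a single cell whose measures are summed; the third row becomes a second cell.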

Dimension: A dimension is a structural attribute that makes up a cube. A dimension can belong to a single cube or be shared by several cubes. Dimensions are created when the cube is created. Each dimension maps to a table in the data warehouse called a dimension table. For example, the cube in Figure 2 has the dimensions time, disease type, weather, growth stage, and location.

Hierarchy: Hierarchies are the backbone of data aggregation. In other words, aggregation is only possible because of hierarchies. Most dimensions have a multilevel, hierarchical structure.

Figure 3: Hierarchy of the location dimension
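To make the role of a hierarchy in aggregation concrete, here is a minimal sketch of a location hierarchy stored as a child-to-parent map. The two levels (district and province) correspond to the `Tenhuyen` and `Tentinh` columns used later in the paper; the `PARENT` map, the "All locations" root, and the numbers are assumptions for illustration only.

```python
# Illustrative hierarchy for the location dimension, stored as
# child -> parent. The member list is a small sample chosen for the example.
PARENT = {
    "Cang Long": "Tra Vinh",
    "Tieu Can": "Tra Vinh",
    "Vung Liem": "Vinh Long",
    "Tra Vinh": "All locations",
    "Vinh Long": "All locations",
}


def path_to_root(member):
    """Return the member followed by all of its ancestors."""
    path = [member]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def totals_by_level(district_values):
    """Add each district value to the district and to every ancestor."""
    totals = {}
    for district, value in district_values.items():
        for member in path_to_root(district):
            totals[member] = totals.get(member, 0.0) + value
    return totals


totals = totals_by_level({"Cang Long": 4.0, "Tieu Can": 6.0, "Vung Liem": 2.5})
```

Each district value is counted once at every level above it, which is why a total can be read off at the district, province, or top level without rescanning the facts.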

Measures: In a cube, measures are sets of numeric values based on columns of the cube's fact table. Measures are the numeric data that users are interested in when they browse the cube, and they are chosen according to the kind of information users ask for. Measures are created when the cube is created. A measure belongs to exactly one cube, whereas a cube can have more than 1024 measures [12]. For example, the cube in Figure 2 contains four measures: Dtnhiemnang, Dtnhiemtb, Dtnhiemnhe, and Dtphongtri.

Partitions: Every cube has at least one partition that holds its data. A single partition is created automatically when the cube is defined. When a new partition is created for a cube, it is added to the cube's existing set of partitions. The cube reflects the combined data in all of its partitions. A cube's partitioning is invisible to users.

OLAP databases: An OLAP database is the storage space for cubes and related database objects. These objects include data sources, shared dimensions, and database roles that define access rights. If such objects are shared by several cubes, the objects and the cubes must reside in the same database [12].

2.3. Schemas for multidimensional databases

Star schema: A star schema consists of a fact table at the center and a number of dimension tables connected around it, forming a star. Each dimension table corresponds to one column of the fact table. The data in the dimension tables is used to build analytical queries against the fact table.

Figure 4: Star schema
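The star schema can be tried out with Python's built-in `sqlite3` module. The sketch below is only an approximation of the idea: the paper's warehouse runs on SQL Server 2005, and the table names (`dim_time`, `dim_disease`, `dim_location`, `fact_dichbenh`), the surrogate keys, and all rows are hypothetical. Only the column names `Nam`, `Mua`, `Tenthuong`, `Tentinh`, `Tenhuyen`, and `Dtnhiemnang` are taken from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical names and keys).
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, Nam INTEGER, Mua TEXT);
CREATE TABLE dim_disease (disease_id INTEGER PRIMARY KEY, Tenthuong TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY,
                           Tentinh TEXT, Tenhuyen TEXT);

-- Fact table: one foreign-key column per dimension plus one measure.
CREATE TABLE fact_dichbenh (
    time_id INTEGER REFERENCES dim_time(time_id),
    disease_id INTEGER REFERENCES dim_disease(disease_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    Dtnhiemnang REAL
);

INSERT INTO dim_time VALUES (1, 2008, 'Dong Xuan'), (2, 2008, 'He Thu'),
                            (3, 2007, 'He Thu');
INSERT INTO dim_disease VALUES (1, 'Ray nau'), (2, 'Dao on co bong');
INSERT INTO dim_location VALUES (1, 'Tra Vinh', 'Cang Long'),
                                (2, 'Vinh Long', 'Vung Liem');
INSERT INTO fact_dichbenh VALUES (1, 1, 1, 10.0), (2, 1, 1, 5.0),
                                 (3, 1, 1, 7.0), (1, 2, 1, 2.5),
                                 (1, 1, 2, 9.0);
""")

# Dimension attributes select and group the rows; the measure is aggregated.
rows = conn.execute("""
    SELECT d.Tenthuong, SUM(f.Dtnhiemnang)
    FROM fact_dichbenh AS f
    JOIN dim_disease AS d ON d.disease_id = f.disease_id
    JOIN dim_location AS l ON l.location_id = f.location_id
    JOIN dim_time AS t ON t.time_id = f.time_id
    WHERE l.Tentinh = 'Tra Vinh' AND t.Nam = 2008
    GROUP BY d.Tenthuong
    ORDER BY d.Tenthuong
""").fetchall()
```

The query joins the fact table to each dimension through one key column, which is the "one dimension table per fact-table column" relationship described above.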

Snowflake schema: A snowflake schema is a variant of the star schema in which some dimension tables are normalized, so their data is split further into additional tables. The resulting schema is shaped like a snowflake.

Figure 5: Snowflake schema

Fact constellation schema: Complex applications may need several fact tables that share dimension tables. This kind of schema can be viewed as a collection of star schemas, which is why it is called a fact constellation.

Figure 6: Fact constellation schema.

2.4. Common OLAP models

The two common OLAP models supported by many OLAP vendors are MOLAP and ROLAP. They differ in how data is stored. ROLAP stands for online analytical processing on a relational database. MOLAP stands for online analytical processing on a multidimensional database.

2.4.1. The MOLAP model

In the MOLAP model, analytical data is stored in a dedicated multidimensional database that is designed to serve frequent aggregate queries that need fast access times. The precomputed figures and the dimensions of the cube are stored in the multidimensional database. The MOLAP engine in the application tier pushes multidimensional data from the multidimensional database to users for analysis. Figure 7 shows the architecture of the MOLAP model.

Figure 7: The MOLAP model

Advantages of MOLAP:
- Queries run fast thanks to optimized storage, multidimensional indexing, and caching.
- It suits systems that need complex calculations and fast access times, because all required calculations are performed when the cube is built.
- No locking mechanism is needed, because the data is read-only.
- Data can easily be copied to users for offline analysis.

Disadvantages of MOLAP:
- Processing (loading) the data takes a long time, especially when the cube is large. To mitigate this, MOLAP tools allow processing only the part of the data that has changed instead of reprocessing the whole cube.
- MOLAP stores a lot of redundant data in order to achieve fast access times.
- It is limited by the system's data capacity, because all precomputed data is stored in the cube. As a result, the data in the cube tends to be aggregated rather than detailed.
- It adds cost: multidimensional technology is usually not already present in the organization, so investment is needed in both the technology and staff training.

2.4.2. The ROLAP model

In the ROLAP model, data is stored in tables in relational database format, which best serves infrequent data queries. To hide the relational storage architecture and present the data multidimensionally, ROLAP creates a semantic data layer called metadata. The metadata layer maps dimensions to tables in the relational database and supports summarizing and aggregating the data. The metadata is stored in the relational database itself.

Figure 8: The ROLAP model

Figure 8 shows the three-tier architecture of the ROLAP model. The analysis server in the middle application tier creates dynamic multidimensional cubes for the presentation tier above it. The multidimensional system in the presentation tier gives users a multidimensional view of the data. When a user asks a complex question about the multidimensional data, the question is passed directly to the relational database. Unlike in the MOLAP model, multidimensional cubes in the ROLAP model are not created and stored permanently.

Advantages of ROLAP:
- It can be applied to large systems, because the size of a ROLAP system is simply the size of the relational database.
- It saves storage space, because data stored in a traditional relational database has little duplication.
- It is an efficient way for a relational DBMS to keep its traditional functions while also executing OLAP operations.
- The data lives in a standard relational database, so it can be accessed with any SQL tool.

Disadvantages of ROLAP:
- ROLAP is slow, because every ROLAP report is a native query against the relational database.
- All ROLAP calculations rely on SQL functions, so it is unsuitable for calculation-heavy models such as budgeting and financial reporting.

2.4.3. Comparing MOLAP and ROLAP

The choice between ROLAP and MOLAP depends on the complexity of the queries and on the required data access time. MOLAP is chosen when the system needs fast access and the queries involve complex calculations. ROLAP is chosen when the data volume is very large, fast response times are not required, and data is accessed infrequently. The decision can also draw on a detailed comparison of storage techniques, the technologies applied, and the characteristics of each model.

Figure 9: Comparison of MOLAP and ROLAP
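The trade-off between the two models can be sketched in a few lines of Python. This is a toy model under stated assumptions, not either engine's real implementation: `rolap_query` scans the detail rows at query time, while `molap_build` precomputes the totals for every combination of dimensions so that `molap_query` is a simple lookup. The function names, the three dimensions, and the sample facts are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("Nam", "Tentinh", "Tenthuong")


def rolap_query(facts, group_by, measure):
    """ROLAP-style: aggregate the detail rows at query time."""
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[d] for d in group_by)] += row[measure]
    return dict(totals)


def molap_build(facts, measure):
    """MOLAP-style: precompute totals for every subset of the dimensions."""
    store = {}
    for size in range(len(DIMS) + 1):
        for group_by in combinations(DIMS, size):
            store[group_by] = rolap_query(facts, group_by, measure)
    return store


def molap_query(store, group_by):
    """Answer a query by lookup; group_by must follow the order of DIMS."""
    return store[tuple(group_by)]


facts = [
    {"Nam": 2008, "Tentinh": "Tra Vinh", "Tenthuong": "Ray nau", "Dtnhiemnang": 10.0},
    {"Nam": 2008, "Tentinh": "Tra Vinh", "Tenthuong": "Ray nau", "Dtnhiemnang": 5.0},
    {"Nam": 2008, "Tentinh": "Vinh Long", "Tenthuong": "Ray nau", "Dtnhiemnang": 9.0},
    {"Nam": 2007, "Tentinh": "Tra Vinh", "Tenthuong": "Dao on", "Dtnhiemnang": 7.0},
]

store = molap_build(facts, "Dtnhiemnang")
stored_cells = sum(len(cells) for cells in store.values())
```

With three dimensions the precomputed store already holds eight groupings, and it contains more cells than there are detail rows. That mirrors the text: MOLAP pays in load time and redundant storage to make queries fast, while ROLAP stores nothing extra and pays at query time.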

2.4.4. The HOLAP model

The HOLAP model combines MOLAP and ROLAP. Storing cubes in a HOLAP structure works best for frequent aggregate queries over a large volume of base data. For example, quarterly and yearly sales data would be stored in a MOLAP structure, while monthly, weekly, and daily data would be stored in a ROLAP structure.

Figure 10: The HOLAP model

2.5. OLAP operations in the multidimensional data model

- Roll up: moves up the hierarchy to aggregate figures at a higher level.
- Drill-down: the reverse of roll up. It moves down the hierarchy to present data at a more detailed level.
- Slice and Dice: performs a selection on one or more dimensions of a given cube. The result is a sub-cube.
- Pivot (or rotate): a visualization operation that rotates the data axes in the view to give an alternative presentation of the data.

Figure 11: Illustration of the Roll up and Drill down operations

Figure 2.1: Illustration of the Pivot operation

Figure 12: Illustration of the Slice and Dice operation
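The operations above can be imitated on a small list of records. The sketch below is an illustration only: it treats slice (one dimension fixed to one value) and dice (several dimensions restricted to value sets) as two functions, following common usage, although the text describes them together. All function names and sample records are assumptions.

```python
from collections import defaultdict


def roll_up(records, dims, measure):
    """Aggregate the measure, keeping only the listed dimensions."""
    totals = defaultdict(float)
    for record in records:
        totals[tuple(record[d] for d in dims)] += record[measure]
    return dict(totals)


def slice_cube(records, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in records if r[dim] == value]


def dice(records, allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in records
            if all(r[d] in values for d, values in allowed.items())]


def pivot(table):
    """Pivot: swap the two axes of a table keyed by (row, column)."""
    return {(col, row): value for (row, col), value in table.items()}


records = [
    {"Nam": 2008, "Mua": "Dong Xuan", "Tentinh": "Tra Vinh", "Tenhuyen": "Cang Long", "Dtnhiemnang": 4.0},
    {"Nam": 2008, "Mua": "He Thu", "Tentinh": "Tra Vinh", "Tenhuyen": "Cang Long", "Dtnhiemnang": 6.0},
    {"Nam": 2008, "Mua": "He Thu", "Tentinh": "Tra Vinh", "Tenhuyen": "Tieu Can", "Dtnhiemnang": 3.0},
    {"Nam": 2007, "Mua": "He Thu", "Tentinh": "Vinh Long", "Tenhuyen": "Vung Liem", "Dtnhiemnang": 2.0},
]

by_province = roll_up(records, ("Tentinh",), "Dtnhiemnang")
# Drill-down is the same aggregation with one more (finer) level added.
by_district = roll_up(records, ("Tentinh", "Tenhuyen"), "Dtnhiemnang")
only_2008 = slice_cube(records, "Nam", 2008)
sub_cube = dice(records, {"Tentinh": {"Tra Vinh"}, "Mua": {"He Thu"}})
table = roll_up(only_2008, ("Tenhuyen", "Mua"), "Dtnhiemnang")
rotated = pivot(table)
```

Roll up and drill-down differ only in which hierarchy levels are kept, while pivot changes nothing but the orientation of the result.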

3. BUILDING THE OLAP APPLICATION

Microsoft provides application developers with a set of ready-made tools for building an OLAP application on the web quickly and easily. The tools are integrated in a unified environment and are easy to connect to one another through a graphical interface. They fall into three main groups: tools for organizing and storing the data warehouse, tools for organizing and storing the multidimensional database, and tools for displaying cubes.

3.1. Connections between the tool groups


[Figure: layered diagram. From top to bottom: display of cube contents (Microsoft Office Web Components, Microsoft Internet Explorer, Microsoft Excel); OLAP database with cubes (Analysis Services 2005); data warehouse (SQL Server 2005); extraction and loading tool (Integration Services 2005); source data (Excel data reports, relational databases).]

Figure 13: Connections between the tool groups

3.2. Online disease statistics page

On the disease statistics page, users build a dynamic report themselves by dragging the required columns into the areas of the window. Users can also choose a statistical function (sum, max, min, average) to compute the figures they need. For example, to total the heavily infected area for neck blast (dao on co bong), ragged stunt (lun xoan la), the ray canh trang planthopper, brown planthopper (ray nau), rice leaf folder (sau cuon la nho), and yellow dwarf (vang lun) in Tra Vinh province in 2008, broken down by season, proceed as follows:

- Drag the Tenthuong column (disease name) from the PivotTable field list into the Row Fields area and filter it to the diseases to be totaled, as shown in the figure below.
- Drag the Mua column into the Column Fields area.
- Drag the Dtnhiemnang column into the Totals or Detail Fields area.
- Drag the Tentinh column into the Filter Fields area and filter on Tra Vinh.
- Drag the Nam column into the Filter Fields area and filter on 2008.

Figure 14: Illustration of statistics by season
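The report built by these drag-and-drop steps is a cross-tabulation with filters. The sketch below reproduces that logic in plain Python so the role of each area is explicit. It does not use the Office Web Components API; `pivot_report`, the `AGGREGATES` table, and the sample rows are assumptions for illustration, while the column names come from the steps above.

```python
from collections import defaultdict

# The four statistical functions mentioned in the text.
AGGREGATES = {
    "sum": sum,
    "max": max,
    "min": min,
    "average": lambda values: sum(values) / len(values),
}


def pivot_report(rows, row_field, column_field, data_field, filters, how="sum"):
    """Cross-tabulate data_field by row_field and column_field.

    filters maps a column name to the set of values to keep, which plays
    the part of the Filter Fields area and of the row-field filter.
    """
    groups = defaultdict(list)
    for row in rows:
        if all(row[name] in allowed for name, allowed in filters.items()):
            groups[(row[row_field], row[column_field])].append(row[data_field])
    report = defaultdict(dict)
    for (row_key, column_key), values in groups.items():
        report[row_key][column_key] = AGGREGATES[how](values)
    return {key: dict(columns) for key, columns in report.items()}


rows = [
    {"Tenthuong": "Ray nau", "Mua": "Dong Xuan", "Dtnhiemnang": 10.0, "Tentinh": "Tra Vinh", "Nam": 2008},
    {"Tenthuong": "Ray nau", "Mua": "Dong Xuan", "Dtnhiemnang": 4.0, "Tentinh": "Tra Vinh", "Nam": 2008},
    {"Tenthuong": "Ray nau", "Mua": "He Thu", "Dtnhiemnang": 6.0, "Tentinh": "Tra Vinh", "Nam": 2008},
    {"Tenthuong": "Vang lun", "Mua": "He Thu", "Dtnhiemnang": 2.0, "Tentinh": "Tra Vinh", "Nam": 2008},
    {"Tenthuong": "Ray nau", "Mua": "He Thu", "Dtnhiemnang": 9.0, "Tentinh": "Vinh Long", "Nam": 2008},
    {"Tenthuong": "Ray nau", "Mua": "He Thu", "Dtnhiemnang": 5.0, "Tentinh": "Tra Vinh", "Nam": 2007},
]

filters = {
    "Tenthuong": {"Ray nau", "Vang lun"},
    "Tentinh": {"Tra Vinh"},
    "Nam": {2008},
}
report = pivot_report(rows, "Tenthuong", "Mua", "Dtnhiemnang", filters)
average_report = pivot_report(rows, "Tenthuong", "Mua", "Dtnhiemnang", filters, how="average")
```

Changing `how` corresponds to picking a different statistical function in the page, and swapping `row_field` with `column_field` corresponds to dragging the columns into the opposite areas.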

3.3. Disease sample data chart page

On the disease statistics chart page, users can build a dynamic chart by dragging the required columns into the chart areas of the window. For example, to chart the number of brown planthoppers caught in light traps in the districts of Tra Vinh province in 2008, proceed as follows:

- Drag the Tenhuyen column into the Category Fields area.
- Drag the Soluong column into the Categories area.
- Drag the Tentinh column into the Filter Fields area and filter on Tra Vinh.
- Drag the Nam column into the Series Fields area and filter on 2008.

Figure 15: Chart of the number of brown planthoppers caught in light traps

4. CONCLUSION

This study successfully built an online statistics application using OLAP. The result shows that OLAP has good practical potential for information delivery applications in agriculture. However, deploying the application in practice requires further work on two points: building cubes based on the input of rice pest management experts, and deploying the application on open-source tools. The latter would help reduce costs when the application is put into practical use.

REFERENCES

1. Huynh Tuan Anh (2008), Lecture notes: Datawarehouse and data mining, Nha Trang University.
2. Vo Thanh Hoang (2008), Textbook of specialized plant pathology, Can Tho University.
3. Pham Van Kim (2009), Textbook: Principles of crop diseases, Can Tho University.
4. Dave Stearns (1999), Introducing the Office Web Components, Microsoft Corporation.
5. Erik Thomsen (2002), OLAP Solutions: Building Multidimensional Information Systems, Wiley.
6. Eric (2005), Implementing Office Web Component Pivot Tables with ASP.NET, Microsoft Corporation.
7. Jeffrey Hasan and Kenneth Tu (2003), Build an OLAP Reporting App in ASP.NET Using SQL Server 2000 Analysis Services and Office XP, Microsoft Corporation.
8. Murugan Anandarajan, Asokan Anandarajan, Cadambi A. Srinivasan (2004), Business Intelligence Techniques: A Perspective from Accounting and Finance, Springer.
9. Paulraj Ponniah (2001), Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals, Wiley.
10. Robert Wrembel, Christian Koncilia (2007), Data Warehouses and OLAP: Concepts, Architectures and Solutions, IRM Press.
11. Sivakumar Harinath and Stephen R. Quinn (2006), Professional SQL Server Analysis Services 2005 with MDX, Wiley.
12. Swathi R. Kasireddy (2007), Olap Reporting Application Using Office Web Components, University of Akron.
