Professional Documents
Culture Documents
I. INTRODUCTION
The most fundamental challenge for the Big Data
applications is to explore the large volumes of data and extract
useful information or knowledge for future actions. In many
situations, the knowledge extraction process has to be very
efficient and close to real-time because storing all observed data
is nearly infeasible. For example, the Square Kilometer Array
(SKA) in Radio Astronomy consists of 1,000 to 1,500 15-meter
dishes in a central 5km area. It provides 100 times more
sensitive vision than any existing radio telescopes, answering
fundamental questions about the Universe. However, with a 40
gigabytes(GB)/second data volume, the data generated from the
SKA is exceptionally large. Although researchers have
confirmed that interesting patterns, such as transient radio
anomalies can be discovered from the SKA data, existing
methods are incapable of handling this Big Data. As a result, the
unprecedented data volumes require an effective data analysis
12
www.ijcsmr.org
13
www.ijcsmr.org
14
www.ijcsmr.org
15
www.ijcsmr.org
16
www.ijcsmr.org
noise
ACKNOWLEDGMENT
The author would like to thank the anonymous reviewers.
REFERENCES
[1]. Ahmed and Karypis 2012, Rezwan Ahmed, George
Karypis, Algorithms for mining the evolution
of conserved
relational states
in dynamic networks, Knowledge and
Information Systems, December 2012, Volume 33, Issue 3, pp
603-630
[2]. Alam et al. 2012, Md. Hijbul Alam, JongWoo Ha,
SangKeun Lee, Novel approaches to crawling important pages
early, Knowledge and Information Systems, December 2012,
Volume 33, Issue 3, pp 707-734
[3]. L. Huan and M. Hiroshi, Feature Selection for Knowledge
Discovery and Data Mining, ser. Engineering and Computer
Science. : Kluwer Academic Publishers, 1998.
[4] M. R. Chmielewski and J. W. Grzynala-Busse, Global
discretization of continuous attributes as preprocessing for
machine learning, Int. J. Approximate Reasoning, vol. 15, pp.
319331, 1996.
[5]. J. Catlett, On changing continuous attributes into ordered
discrete attributes, in Proc. Machine LearningEWSL-91,
Mar. 1991, vol. 482, pp. 164178.
[6]. U. M. Fayyad and K. B. Irani, Multi-interval discretization
of continuousvalued attributes for classification learning, in
Proc. IJCAI-93, Aug./Sep. 1993, vol. 2, pp. 10221027.
[7]. Bollen et al. 2011, J. Bollen, H. Mao, and X. Zeng, Twitter
Mood Predicts the Stock Market, Journal of Computational
Science, 2(1):1-8, 2011.
[8]. Borgatti S., Mehra A., Brass D., and Labianca G. 2009,
Network analysis in the social sciences, Science, vol. 323,
pp.892-895.
[9]. Bughin et al. 2010, J Bughin, M Chui, J Manyika, Clouds,
big data, and smart assets: Ten tech-enabled business trends to
watch, McKinSey Quarterly, 2010.
[10]. Centola D. 2010, The spread of behavior in an online
social network experiment, Science, vol.329, pp.1194-1197.
[11]. X. Zhu and X. Wu, Class noise vs. attribute noise: A
quantitative study of their impacts, Artif. Intell. Rev., vol. 22,
no. 3/4, pp. 177210,
Nov. 2004.
[12]. D. Luebbers, U. Grimmer, andM. Jarke, Systematic
development of data mining-based data quality tools, in Proc.
29th VLDB, Berlin, Germany, 2003.
[13]. R. Agrawal and R. Srikant, Privacy-preserving data
mining, in Proc. ACM SIGMOD, 2000, pp. 439450.
17
www.ijcsmr.org