
IPASJ International Journal of Information Technology (IIJIT)

A Publisher for Research Motivation ........

Volume 1, Issue 1, June 2013

Web Site: http://www.ipasj.org/IIJIT/IIJIT.htm Email: editoriijit@ipasj.org ISSN 2321-5976

A Novel Technique for Information Retrieval based on Cloud Computing


Dr. Sanjay Mishra, Dr. Arun Tiwari
Assistant Professor, Department of Computer Science and Engineering, Amity University, Dubai, UAE

ABSTRACT
The Information Retrieval (IR) algorithm appears deceptively simple when observed from the point of view of word rationalization. However, the implementation of the IR algorithm is quite involved, particularly when tailored to meet specific structural requirements. In this study, the Information Retrieval algorithm is developed using the MapReduce mechanism to retrieve information in a Cloud computing environment. The MapReduce model was developed by Google. In the present study, the algorithm reports its results in terms of the number of buckets required to generate the output from a large chunk of data in Cloud computing. The algorithm is part of a complete Business Intelligence tool to be implemented, with the results to be delivered on a Cloud computing architecture.

Keywords: IR algorithm, Cloud Computing, MapReduce, Business Intelligence, Name nodes, Data nodes, Main Server, Secondary Server, Database Server.

1. INTRODUCTION
Cloud computing is evolving as a new paradigm for highly scalable, fault-tolerant, and adaptable computing on large clusters of computers. Cloud architectures provide highly available storage and compute capacity through distribution and replication. As a developing technology, cloud computing is anticipated to reshape information retrieval procedures in the near future. A typical cloud application has a data owner outsourcing data services to a cloud, where the data is stored in a keyword-value form and users may retrieve the data with multiple keywords [1]. For this reason, the MapReduce mechanism is well suited to the design and implementation of the IR algorithm. Just as importantly, cloud architectures adapt to changing needs by dynamically provisioning new (virtualized) compute or storage nodes [2]. Various services and dynamically scalable virtualized resources are added to the cloud [3] almost at every instant, and cloud computing makes these resources available universally with better flexibility [4]. Improvements in information services, including information retrieval, are now mandatory due to the rise of virtualized resources in the cloud [4]. Cloud resources are distributed, whereas existing search engines such as Yahoo, Google, and MSN are centralized systems [5]. Centralized systems suffer from several drawbacks, including limited scalability, frequent server failures, and information retrieval issues, as discussed in [6]. Document virtualization has also become popular over the past few years [7]. Existing distributed IR models are unable to search inside a virtualized physical node with multiple virtual systems running in parallel in the form of a grid.
[5] proposed a distributed IR model to resolve the issue of accurate and fast allocation of required information, but many problems remain unsolved. A modified IR model that works efficiently with virtualized resources is therefore the need of the hour [4]. This paper is an effort to design the IR algorithm using the MapReduce mechanism. The algorithm is verified, and the simulated results are evaluated based on the following criteria:


1) The algorithm takes the number of search requests as input.
2) The algorithm then breaks the search requests into the number of chunks required for information retrieval from the public cloud.
3) Based on these two assumptions, the algorithm performs the mapping functionality and determines the number of buckets required to perform the reduce function of the algorithm.

Thus, the main aim of the algorithm is to manage the number of buckets (packets) required to complete the MapReduce algorithm without any impediment. The algorithm (as presented in Annexure A) is tested on a large number of requests based on different chunks of data. The rest of the paper is organized as follows: Section 2 elucidates the MapReduce mechanism. Section 3 elaborates on the Cloud computing architecture in detail. Section 4 outlines the IR algorithm with and without the MapReduce mechanism. Section 5 presents the results of the code execution. Section 6 evaluates the performance and draws the inferences and recommendations based on the experimentation. The paper also includes Annexure A, which contains the Java code snippet for the IR algorithm.

2. MAPREDUCE MECHANISM


The concept of MapReduce was introduced by Google in 2004 and is the backbone of many large-scale data computations. MapReduce is essentially a divide-and-conquer algorithm that breaks a problem down into small parts and processes them in parallel to accomplish efficient computation on a larger data set. The MapReduce mechanism includes two steps: 1. Map, 2. Reduce.

Map: In the Map step, the main node acquires the input, partitions it into smaller sub-problems, and distributes them to data nodes. A data node may do this again in turn, leading to a multi-level tree structure. Each data node processes its smaller problem and passes the response back to its main node.

Reduce: In the Reduce step, the main node collects the responses to all the sub-problems and merges them in several ways to produce the output, the answer to the problem it was originally trying to solve. The overall structure of the MapReduce mechanism is depicted in Figure 1:

Figure 1: MapReduce structure
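The Map and Reduce steps described above can be sketched in a few lines of Java. The word-count task, class name, and method signatures below are illustrative assumptions for exposition only, not the paper's Annexure A code.

```java
import java.util.*;
import java.util.stream.*;

// Minimal in-memory sketch of the two MapReduce steps described above.
public class MapReduceSketch {

    // Map: a data node emits (word, 1) pairs for its share of the input.
    static List<Map.Entry<String, Integer>> map(String chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : chunk.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // Reduce: the main node groups the intermediate pairs by key and
    // merges the per-key counts into the final answer.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            out.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return out;
    }

    // The main node partitions the input into chunks, maps each chunk,
    // then reduces all intermediate pairs into one result.
    public static Map<String, Integer> wordCount(List<String> chunks) {
        List<Map.Entry<String, Integer>> intermediate = chunks.stream()
                .flatMap(c -> map(c).stream())
                .collect(Collectors.toList());
        return reduce(intermediate);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                wordCount(List.of("cloud data cloud", "data retrieval"));
        System.out.println(counts); // {cloud=2, data=2, retrieval=1}
    }
}
```

In a real deployment the map calls run in parallel on separate data nodes; here they run in one process purely to show the data flow.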

3. CLOUD COMPUTING ARCHITECTURE

The cloud computing architecture used for the experiment includes three different types of servers, namely: 1) Main Server, 2) Secondary Server, 3) Database Server.

The cloud architecture has both master nodes and slave nodes. In this implementation, the main server is the one that receives client requests and handles them. The master node resides in the main server and the slave nodes in the secondary servers. Search requests are forwarded to the MapReduce algorithm present in the main server. MapReduce takes care of the searching and indexing procedure by initiating a large number of Map and Reduce processes. Once the MapReduce process for a particular search key is completed, it returns the output value to the main server and, in turn, to the client. The complete architecture is depicted in Figure 2.

Figure 2: Implementation of the Information Retrieval (IR) algorithm in a Cloud computing environment

As shown in Figure 2, the information required by the client is sent to the Main Server. For simplicity, the main server is termed the Name node and stores the metadata about the information. The metadata includes the size of the file, the actual location of the file, and the block locations, among others. Each piece of information (file) is replicated across a number of Secondary Servers, named Data nodes. Data nodes are responsible for tracking the data from the data centers. The complete functionality of the MapReduce algorithm operates as follows:
1) The client requests hit the Main node.
2) The Main node has the MapReduce algorithm in place and performs the task of mapping. In short, the Name node keeps track of the complete file directory structure and the placement of chunks, so the Name node is the essential control point for the entire system. To read a file, the client API calculates the chunk index based on the offset of the file pointer and makes a request to the Name node. The Name node replies with the Data nodes that contain a copy of that chunk. From this point, the client contacts the Data node directly without going through the Name node.
3) The client pushes its changes to all Data nodes, and each change is held in a buffer at every Data node. Once the changes are buffered at all Data nodes, the client sends a commit request and receives a response about its success.
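Step 2 above can be illustrated with a small sketch of the client-side chunk-index calculation and the Name node lookup. The class name, method names, and the 64 MB chunk size are assumptions for illustration, not part of the paper's implementation.

```java
import java.util.*;

// Hypothetical sketch of step 2: the client API computes the chunk index
// from the file-pointer offset, then asks the Name node which Data nodes
// hold a replica of that chunk. Names and chunk size are assumptions.
public class NameNodeSketch {
    static final long CHUNK_SIZE = 64L * 1024 * 1024; // assumed 64 MB chunks

    // file name -> per-chunk list of Data nodes holding a replica
    private final Map<String, List<List<String>>> chunkLocations = new HashMap<>();

    public void register(String file, List<List<String>> replicasPerChunk) {
        chunkLocations.put(file, replicasPerChunk);
    }

    // Client-side: which chunk does this byte offset fall into?
    public static int chunkIndex(long offset) {
        return (int) (offset / CHUNK_SIZE);
    }

    // Name node reply: the Data nodes the client should contact directly.
    public List<String> locate(String file, long offset) {
        return chunkLocations.get(file).get(chunkIndex(offset));
    }

    public static void main(String[] args) {
        NameNodeSketch nameNode = new NameNodeSketch();
        nameNode.register("logs.txt", List.of(
                List.of("data-node-1", "data-node-3"),   // chunk 0
                List.of("data-node-2", "data-node-4"))); // chunk 1
        // An offset of 70 MB falls inside chunk 1.
        System.out.println(nameNode.locate("logs.txt", 70L * 1024 * 1024));
        // [data-node-2, data-node-4]
    }
}
```

After this reply, the client reads the chunk from one of the returned Data nodes without further involving the Name node, which keeps the Name node off the data path.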


The preceding three steps are depicted in Figure 3.


Figure 3: Operational steps of the IR algorithm using MapReduce in a Cloud computing environment

After the three steps stated above are completed, all modifications of chunk distribution and metadata alterations are written to an operation log file at the Name node. This log file preserves an ordered list of operations, which is critical for the Name node to recover its view after a crash. The Name node also keeps its persistent state by frequently check-pointing to a file.
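The recovery scheme just described, an ordered operation log replayed on top of the last checkpoint, can be sketched as follows. The class and method names are illustrative assumptions, not the paper's code.

```java
import java.util.*;

// Illustrative sketch of the Name node recovery scheme described above:
// every metadata change is appended to an ordered operation log, a
// checkpoint periodically snapshots the live state, and recovery replays
// the log entries made after the last checkpoint.
public class OperationLogSketch {
    private final Map<String, String> metadata = new HashMap<>();      // live state
    private final List<Map.Entry<String, String>> log = new ArrayList<>();
    private Map<String, String> checkpoint = new HashMap<>();
    private int checkpointedUpTo = 0;  // log index covered by the checkpoint

    // Every change is logged before the live state is updated.
    public void apply(String file, String chunkLocation) {
        log.add(Map.entry(file, chunkLocation));
        metadata.put(file, chunkLocation);
    }

    // Periodic check-pointing of the persistent state.
    public void takeCheckpoint() {
        checkpoint = new HashMap<>(metadata);
        checkpointedUpTo = log.size();
    }

    // After a crash: restore the checkpoint, then replay newer log entries.
    public Map<String, String> recover() {
        Map<String, String> state = new HashMap<>(checkpoint);
        for (int i = checkpointedUpTo; i < log.size(); i++) {
            state.put(log.get(i).getKey(), log.get(i).getValue());
        }
        return state;
    }

    public static void main(String[] args) {
        OperationLogSketch nameNode = new OperationLogSketch();
        nameNode.apply("file-a", "data-node-1");
        nameNode.takeCheckpoint();
        nameNode.apply("file-b", "data-node-2"); // logged after the checkpoint
        System.out.println(nameNode.recover()); // both files survive the crash
    }
}
```

The ordering of the log is what matters: replaying it after the checkpoint reproduces exactly the state the Name node held before the crash.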

4. IR ALGORITHM WITH AND WITHOUT MAPREDUCE MECHANISM


As the study conducted in this work is a comparative analysis of the performance of the IR algorithm with and without the MapReduce mechanism, this part of the paper elaborates the flow diagrams of both implementations in detail.

4.1 Flow diagram of the IR algorithm without the MapReduce mechanism
The IR algorithm implementation without MapReduce works in three folds:
a) The requests are broken into a number of parts.
b) Each of these parts is processed in sequential order at different data centers, and the response is sent back to the main server.
c) The main server, which hosts the IR algorithm, joins each of the responses and sends the result back to the user.

Figure 4: IR algorithm without the MapReduce mechanism
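The three-fold flow of Section 4.1 can be sketched in Java: split the requests into parts, process each part strictly in sequence at a "data center", and join the responses at the main server. The names and the data-center stand-in function are assumptions for illustration.

```java
import java.util.*;
import java.util.function.*;

// Sketch of the sequential (non-MapReduce) IR flow: split, process
// sequentially, join. The dataCenter function stands in for a remote call.
public class SequentialIr {

    // a) Break the request batch into fixed-size parts.
    public static <T> List<List<T>> split(List<T> requests, int partSize) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < requests.size(); i += partSize) {
            parts.add(requests.subList(i, Math.min(i + partSize, requests.size())));
        }
        return parts;
    }

    // b) + c) Process every part one after another and join the responses.
    public static List<String> run(List<String> requests, int partSize,
                                   Function<String, String> dataCenter) {
        List<String> joined = new ArrayList<>();
        for (List<String> part : split(requests, partSize)) {
            for (String req : part) {            // strictly sequential: no
                joined.add(dataCenter.apply(req)); // request overlaps another
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> out = run(List.of("q1", "q2", "q3"), 2,
                               q -> "result-for-" + q);
        System.out.println(out); // [result-for-q1, result-for-q2, result-for-q3]
    }
}
```

Because every request waits for the previous one, the total response time grows linearly with the batch, which is the behavior the MapReduce variant in Section 4.2 is designed to avoid.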


4.2 Flow diagram of the IR algorithm via the MapReduce mechanism
In this section, the IR algorithm using the MapReduce implementation for the cloud computing environment is developed and executed. The proposed algorithm is employed in the IR algorithm to retrieve results from the World Wide Web, and the outcomes depicted in the next section show that the MapReduce mechanism can be used to improve the speed of information search. The proposed algorithm is an iterative technique that makes use of three methods, namely map(), reduce(), and combine(), in the main server to produce the results. Indexing is employed to retrieve and order the results according to the user's choice in order to simplify the search.

Figure 5: IR algorithm with the MapReduce mechanism
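A hedged sketch of the map()/combine()/reduce() pipeline described in Section 4.2: map() emits keyword hits per document, combine() pre-aggregates locally on each node to shrink traffic, and reduce() merges the partials and ranks the documents. The names, the hit-count scoring, and the ranking rule are assumptions, not the paper's Annexure A code.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of the three methods used in the main server: map(), combine(),
// and reduce(), followed by ranking of the merged results.
public class MapReduceIr {

    // map(): emit (documentId, 1) for every query-keyword hit in a document.
    static List<Map.Entry<String, Integer>> map(String docId, String text,
                                                Set<String> keywords) {
        return Arrays.stream(text.toLowerCase().split("\\s+"))
                .filter(keywords::contains)
                .map(w -> Map.entry(docId, 1))
                .collect(Collectors.toList());
    }

    // combine(): local pre-aggregation on one node before the shuffle.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> hits) {
        Map<String, Integer> partial = new HashMap<>();
        hits.forEach(h -> partial.merge(h.getKey(), h.getValue(), Integer::sum));
        return partial;
    }

    // reduce(): merge the partials, then rank documents by descending hits.
    static List<String> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        partials.forEach(p -> p.forEach((d, c) -> total.merge(d, c, Integer::sum)));
        return total.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static List<String> search(Map<String, String> corpus,
                                      Set<String> keywords) {
        List<Map<String, Integer>> partials = corpus.entrySet().stream()
                .map(e -> combine(map(e.getKey(), e.getValue(), keywords)))
                .collect(Collectors.toList());
        return reduce(partials);
    }

    public static void main(String[] args) {
        Map<String, String> corpus = Map.of(
                "d1", "cloud cloud storage",
                "d2", "cloud retrieval");
        System.out.println(search(corpus, Set.of("cloud")));
        // d1 has two "cloud" hits, d2 has one -> [d1, d2]
    }
}
```

The combine() step is the key bandwidth optimization: each node ships one partial count per document instead of one pair per keyword hit.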

5. RESULTS
The results of the complete experiment are depicted in this part of the paper. A few important points:
1) The experiment is conducted between 5000 and 20000 requests/s.
2) The experiments present the results for a pool of four bucket sizes: 1000, 2000, 3000, and 4000.

Table 1: Comparative study of the IR algorithm with and without the MapReduce mechanism
(All response times in seconds; "w/o MR" = without the MapReduce algorithm, "via MR" = via the MapReduce algorithm.)

Requests/s   Bucket=1000        Bucket=2000        Bucket=3000        Bucket=4000
             w/o MR   via MR    w/o MR   via MR    w/o MR   via MR    w/o MR   via MR
5000          54213    53893     53922    53893     53922    53893     54798    53893
6000          66923    64883     65126    64893     65126    64893     66098    64882
7000          77343    75893     75898    75882     75898    75882     79876    75882
8000          87903    86893     87961    86893     87961    86882     87762    86893
9000          99123    97893     98956    97871     98956    97893     99877    97893
10000        129924   108894    119862   108894    119862   108894    110872   108872
11000        148264   120894    130700   120894    130700   120833    139813   120871
12000        159924   132894    149879   132894    149879   132894    158090   132894
13000        163434   144894    166973   144846    166973   144894    179898   144894
14000        156894   156894    176756   156894    176756   156894    190232   156882
15000        163268   168894    185683   168894    185683   168894    209873   168882
16000        192876   180894    192876   180894    192876   180894    219098   180894
17000        208734   192894    208342   192894    208342   192846    239098   192894
18000        229869   204894    238672   204894    238672   204894    248803   204882
19000        250980   216894    237803   216894    237803   216894    270892   216894
20000        277987   228894    249800   228894    249800   228894    308767   228894

6. EVALUATING THE PERFORMANCE


Different sets of requests were delivered, each of a different size, and the MapReduce jobs were executed in single-node clusters. The corresponding execution times were calculated, and the conclusion of running the experiment was that running MapReduce in clusters is by far the more effective approach for a large volume of requests. The two important inferences from the study lead to two obvious results:
- In a cloud environment, the MapReduce structure increases the throughput for a large number of requests. In contrast, one would not necessarily see such an increase in throughput in a non-cloud system.
- When the data set is small, MapReduce does not bring a substantial increase in throughput in a cloud system. Therefore, consider a combination of MapReduce-style parallel processing when planning to process a large number of requests in a cloud system.

References
[1] Bordogna, G. & Pasi, G., "A fuzzy linguistic approach generalizing Boolean information retrieval: a model and its evaluation", Journal of the American Society for Information Science, 44(2), 1993, pp. 70-82.
[2] Belew, R., "Adaptive information retrieval", Proceedings of the Twelfth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, 1989, pp. 11-20.
[3] Blair, D.C. & Maron, M.E., "An evaluation of retrieval effectiveness for a full-text document-retrieval system", Communications of the ACM, 28(3), 1985, pp. 289-299.
[4] Bookstein, A., "Probability and fuzzy-set applications to information retrieval", Annual Review of Information Science and Technology, 20, 1985, pp. 117-151.
[5] Chen, H. & Dhar, V., "Cognitive process as a basis for intelligent retrieval systems design", Information Processing and Management, 27, 1991, pp. 405-432.
[6] Goldberg, D.E., Genetic Algorithms in Search, Optimization and Machine Learning, Reading, MA: Addison-Wesley, 1989.
[7] Gordon, M.D., "Probabilistic and genetic algorithms for document retrieval", Communications of the ACM, 31(10), 1988, pp. 1208-1218.
[8] Gordon, M.D., "User-based document clustering by redescribing subject descriptions with a genetic algorithm", Journal of the American Society for Information Science, 42, 1991, pp. 311-322.
[9] Harman, D., "An experimental study of factors important in document ranking", Proceedings of the ACM SIGIR, 1986, pp. 186-193.
[10] Holland, J.H., Adaptation in Natural and Artificial Systems, Ann Arbor: The University of Michigan Press, 1975.
