Total No. of Questions : 12]
P1469
Time : 3 Hours]
[4164]-711
Instructions to the candidates :
1) Answers to the two sections should be written in separate books.
2) Neat diagrams must be drawn wherever necessary.
3) Figures to the right indicate full marks.
4) Assume suitable data, if necessary.
SECTION - I
Q1) a) Explain speedup and scaleup in parallel databases with suitable diagram. [5]
b) Explain range partitioning sort in parallel database along with its suitability. [5]
c) Explain partitioning techniques in parallel database along with examples. [6]
OR
Q2) a) Explain fragment and replicate join schemes. [8]
b) Describe the benefits and drawbacks of pipelined parallelism. [4]
c) Histograms are used for constructing load-balanced range partitions. Suppose you have a histogram where values are between 1 and 100, and are partitioned into 10 ranges, 1-10, 11-20, ..., 91-100, with frequencies 15, 5, 20, 10, 10, 5, 5, 20, 5 and 5, respectively. Give a load-balanced range partitioning function to divide the values into 5 partitions. [4]
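The arithmetic behind the histogram question in Q2 can be sketched as follows. This is a minimal illustration, not a model answer: it assumes values are spread uniformly inside each histogram range, and the function names `balanced_cut_points` and `partition_of` are hypothetical helpers introduced only for this sketch.

```python
from bisect import bisect_left

def balanced_cut_points(buckets, freqs, n_parts):
    """Return the n_parts - 1 upper bounds that split the histogram evenly.

    Assumption: values are uniformly distributed inside each bucket.
    """
    target = sum(freqs) / n_parts      # ideal number of values per partition
    cuts, seen, goal = [], 0, target
    for (lo, hi), f in zip(buckets, freqs):
        # Emit every cut whose cumulative goal falls inside this bucket.
        while len(cuts) < n_parts - 1 and seen + f >= goal:
            width = hi - lo + 1
            cuts.append(lo - 1 + width * (goal - seen) / f)
            goal += target
        seen += f
    return cuts

def partition_of(value, cuts):
    """0-based partition index: values <= cuts[0] go to partition 0, and so on."""
    return bisect_left(cuts, value)

buckets = [(lo, lo + 9) for lo in range(1, 100, 10)]   # 1-10, 11-20, ..., 91-100
freqs = [15, 5, 20, 10, 10, 5, 5, 20, 5, 5]
cuts = balanced_cut_points(buckets, freqs, 5)
print(cuts)   # [20.0, 30.0, 50.0, 75.0]
```

Under the uniformity assumption this corresponds to the ranges 1-20, 21-30, 31-50, 51-75 and 76-100, each holding an estimated 20 of the 100 values. Only the 71-80 range has to be split, at its midpoint.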
Q3) a) Consider the relations:
employee (name, address, salary, plant-number)
machine (machine-number, type, plant-number)
Assume that the employee relation is fragmented horizontally by plant-number, and that each fragment is stored locally at its corresponding plant site. Assume that the machine relation is stored in its entirety at the Armonk site. Describe a good strategy for processing each of the following queries.
i) Find all employees at the plant that contains machine number 1130.
ii) Find all machines at the Almaden plant.
iii) Find employee ⋈ machine. [6]
b) Explain two phase commit protocol. How does three phase commit protocol overcome the disadvantages of two phase commit protocol? [6]
c) Explain distributed transaction management. [6]
OR
Q4) a) Explain the following concurrency control schemes along with advantages & disadvantages in distributed databases.
i) Distributed lock manager.
ii) Majority protocol. [8]
b) List the differences between directory and database. Also explain LDAP. [6]
c) When is it useful to have replication or fragmentation? Explain your answer. [4]
Q5) a) Explain the components of an XML document with suitable example. [8]
b) What is N tier architecture? Explain its advantages with example. [8]
OR
Q6) a) Which are different parsers for XML? Explain them in brief. [6]
b) How will you define simple and complex types using XML schemas? Explain with example. [6]
c) Explain the following with respect to web architecture.
i) Web server
ii) Common gateway interface. [4]
SECTION - II
Q7) a) Differentiate between OLTP and OLAP systems. [8]
b) Explain the architecture of a data warehouse. [4]
c) Suppose that a data warehouse for Big-University consists of the following four dimensions: student, course, semester and instructor, and two measures: count and average-grade, where the average-grade measure stores the actual course grade of the student. Draw a snowflake schema diagram for the data warehouse. [4]
OR
Q8) a) Explain the following operations of OLAP on multidimensional data with example.
i) Slice and dice. [4]
ii) Roll up and drill down. [4]
b) What is noisy data? Explain the data cleaning process. How are missing values handled? [8]
c) Q9) a)
A database has five transactions. Let min-sup = 20% and min-conf = 75%. TID 100 200 300 400 500 600 X, Y, Z Y, W Items
X, Z, W, U, V, W U, X, Z [8] V, Y, Z
i) Find all frequent itemsets using Apriori Algorithm.
ii) List all strong association rules.
b) State and explain the algorithm for inducing a decision tree from training tuples. [8]
c) [2]
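The level-wise procedure asked for in Q9 a) can be sketched as below. This is a hedged illustration rather than a model answer. In particular, the transaction list is an assumed reading of the question's table (six baskets, one per TID, with the run "X, Z, W, U, V, W" split into two baskets), so the actual table should be substituted before relying on any result. The function names `apriori` and `strong_rules` are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {itemset: support count} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    current = {frozenset([i]) for b in baskets for i in b}   # 1-item candidates
    frequent, k = {}, 1
    while current:
        counts = {c: sum(1 for b in baskets if c <= b) for c in current}
        level = {c: s for c, s in counts.items() if s / n >= min_sup}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def strong_rules(frequent, min_conf):
    """Return (lhs, rhs, confidence) for every rule meeting min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = count / frequent[lhs]   # lhs is frequent by downward closure
                if conf >= min_conf:
                    rules.append((set(lhs), set(itemset - lhs), conf))
    return rules

# ASSUMED data: one possible reading of the table in the question.
transactions = [
    {"X", "Y", "Z"}, {"Y", "W"}, {"X", "Z", "W"},
    {"U", "V", "W"}, {"U", "X", "Z"}, {"V", "Y", "Z"},
]
frequent = apriori(transactions, min_sup=0.20)
rules = strong_rules(frequent, min_conf=0.75)
```

With this assumed data, the frequent itemsets are the six single items plus {X, Z} and {Y, Z}. The candidate {X, Y, Z} is pruned because {X, Y} is not frequent, and the rules that reach 75% confidence are X → Z and Z → X.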
Q10)a) b)
c)
Explain the following terms with example.
i) Closed frequent itemset
ii) Maximal frequent itemset
Suppose that the data mining task is to cluster points (with (x,y) representing location) into three clusters, where the points are A1 (2,10), A2 (2,5), A3 (8,4), A4 (5,8), A5 (7,5), A6 (6,4), A7 (1,2), A8 (4,9). The distance function is Euclidean distance. Suppose initially we assign A1, A4 and A7 as the centers of the three clusters respectively. Use the K-means algorithm to show the final three clusters. [8]
[6]
[4] [8]
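The K-means question in Q10 can be traced with a short script. The sketch below is a plain Lloyd-style iteration under the stated setup (Euclidean distance, A1, A4 and A7 as initial centers). The function name `k_means` and the rule of keeping an old center when a cluster becomes empty are assumptions made for this illustration.

```python
from math import dist   # Euclidean distance, Python 3.8+

def k_means(points, centers, max_iter=100):
    """Alternate assignment and mean-update steps until assignments stop changing."""
    centers = list(centers)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest current center.
        new_assignment = {
            name: min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for name, p in points.items()
        }
        if new_assignment == assignment:
            break                      # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [points[n] for n, c in assignment.items() if c == j]
            if members:                # assumption: keep old center if cluster is empty
                centers[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    clusters = [sorted(n for n, c in assignment.items() if c == j)
                for j in range(len(centers))]
    return clusters, centers

points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
clusters, centers = k_means(points, [points["A1"], points["A4"], points["A7"]])
print(clusters)   # [['A1', 'A4', 'A8'], ['A3', 'A5', 'A6'], ['A2', 'A7']]
```

Tracing by hand gives the same progression: after the first pass the clusters are {A1}, {A3, A4, A5, A6, A8}, {A2, A7}; A8 and then A4 move to the first cluster in the next two passes, and the fourth pass changes nothing. The final centers are approximately (3.67, 9), (7, 4.33) and (1.5, 3.5).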
Q11) a) Explain typical architecture of information retrieval system.
b) Write short notes on
i) Vector-space model
ii)
OR
Q12) a) Explain page rank algorithm with example.
b) Write a short note on web crawler.
c) Explain the terms.
i) Inverted index.
ii) Ontology.
iii) Homonyms.