
Total No. of Questions : 12]
SEAT No. :
P1469
[Total No. of Pages : 4

[4164]-711
B.E. (Computer Engineering)
ADVANCED DATABASES
(2008 Pattern) (Sem. - II) (Elective-III)

Time : 3 Hours]
[Max. Marks : 100

Instructions to the candidates :
1) Answers to the two sections should be written in separate books.
2) Neat diagrams must be drawn wherever necessary.
3) Figures to the right indicate full marks.
4) Assume suitable data, if necessary.

SECTION - I

Q1) a) Explain speedup and scaleup in parallel databases with suitable diagram. [5]
b) Explain range partitioning sort in parallel database along with its suitability. [5]
c) Explain partitioning techniques in parallel database along with examples. [6]

OR

Q2) a) Explain fragment and replicate join schemes. [8]
b) Describe the benefits and drawbacks of pipelined parallelism. [4]
c) Histograms are used for constructing load-balanced range partitions. Suppose you have a histogram where values are between 1 and 100, and are partitioned into 10 ranges, 1-10, 11-20, ..., 91-100, with frequencies 15, 5, 20, 10, 10, 5, 5, 20, 5 and 5, respectively. Give a load-balanced range partitioning function to divide the values into 5 partitions. [4]
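The histogram question above can be checked numerically. The following is a minimal sketch, not a model answer: it assumes values are uniformly distributed inside each histogram range, and the function name `balanced_cut_points` and its parameters are hypothetical names introduced here for illustration.

```python
def balanced_cut_points(bucket_upper_bounds, frequencies, num_partitions, lowest_value=1):
    """Return num_partitions - 1 cut points giving roughly equal load per partition.

    Assumption: values are spread uniformly inside each histogram bucket,
    so a cut that falls inside a bucket is placed by linear interpolation.
    """
    total = sum(frequencies)
    target = total / num_partitions   # desired number of tuples per partition
    cuts = []
    cumulative = 0                    # tuples seen in buckets before the current one
    lower = lowest_value - 1          # exclusive lower edge of the current bucket
    next_goal = target                # cumulative count at which the next cut goes

    for upper, freq in zip(bucket_upper_bounds, frequencies):
        # A single wide bucket may contain more than one cut point.
        while (freq > 0 and len(cuts) < num_partitions - 1
               and cumulative + freq >= next_goal):
            fraction = (next_goal - cumulative) / freq
            cuts.append(lower + fraction * (upper - lower))
            next_goal += target
        cumulative += freq
        lower = upper
    return cuts


# Data taken from the question: ranges 1-10, 11-20, ..., 91-100.
upper_bounds = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
frequencies = [15, 5, 20, 10, 10, 5, 5, 20, 5, 5]

cuts = balanced_cut_points(upper_bounds, frequencies, num_partitions=5)
print(cuts)  # [20.0, 30.0, 50.0, 75.0]
```

Under the uniformity assumption, these cut points correspond to the partitions 1-20, 21-30, 31-50, 51-75 and 76-100, each holding about 20 of the 100 values.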


Q3) a) Consider the relations:
employee (name, address, salary, plant-number)
machine (machine-number, type, plant-number)
Assume that the employee relation is fragmented horizontally by plant-number, and that each fragment is stored locally at its corresponding plant site. Assume that the machine relation is stored in its entirety at the Armonk site. Describe a good strategy for processing each of the following queries. [6]
i) Find all machines at the Almaden plant.
ii) Find all employees at the plant that contains machine number 1130.
iii) Find employee ⋈ machine.
b) Explain two phase commit protocol. How does three phase commit protocol overcome the disadvantages of two phase commit protocol? [6]
c) Explain distributed transaction management. [6]

OR

Q4) a) Explain the following concurrency control schemes along with advantages & disadvantages in distributed databases. [8]
i) Distributed lock manager.
ii) Majority protocol.
b) List the differences between directory and database. Also explain LDAP. [6]
c) When is it useful to have replication or fragmentation? Explain your answer. [4]

Q5) a) What is N tier architecture? Explain its advantages with example. [8]
b) Explain the components of an XML document with suitable example. [8]

OR

Q6) a) Which are the different parsers for XML? Explain them in brief. [6]
b) How will you define simple and complex types using XML schemas? Explain with example. [6]
c) Explain the following with respect to web architecture. [4]
i) Web server
ii) Common gateway interface.


SECTION - II

Q7) a) Differentiate between OLTP and OLAP systems. [8]
b) Explain the architecture of Data warehouse. [4]
c) Suppose that a data warehouse for Big-University consists of the following four dimensions: student, course, semester and instructor, and two measures count and average-grade, where the average-grade measure stores the actual course grade of the student. Draw a snowflake schema diagram for the data warehouse. [4]

OR

Q8) a) Explain the following operations of OLAP on multidimensional data with example. [4]
i) Slice and dice.
ii) Roll up and drill down.
b) What is noisy data? Explain data cleaning process. How are missing values handled? [8]
c) Write a note on data marts. [4]

Q9) a) A database has five transactions. Let min-sup = 20% and min-conf = 75%. [8]
TID: 100, 200, 300, 400, 500, 600
Items: X, Y, Z; Y, W; X, Z, W, U, V, W; U, X, Z; V, Y, Z
i) Find all frequent itemsets using Apriori Algorithm.
ii) List all strong association rules.
b) State and explain the algorithm for inducing a decision tree from training tuples. [8]
c) Differentiate between classification and clustering. [2]

OR


Q10) a) Explain the architecture of typical data mining system. [6]
b) Explain the following terms with example. [4]
i) Closed frequent itemset
ii) Maximal frequent itemset
c) Suppose that the data mining task is to cluster points (with (x,y) representing location) into three clusters, where the points are A1 (2,10), A2 (2,5), A3 (8,4), A4 (5,8), A5 (7,5), A6 (6,4), A7 (1,2), A8 (4,9). The distance function is Euclidean distance. Suppose initially we assign A1, A4 and A7 as the center of each cluster respectively. Use the K-means algorithm to show the final three clusters. [8]

Q11) a) Explain typical architecture of information retrieval system. [8]
b) Write short notes on: [8]
i) Vector-space model
ii) TF-IDF method of ranking

OR

Q12) a) Explain page rank algorithm with example. [6]
b) Write a short note on web crawler. [4]
c) Explain the terms: [6]
i) Inverted index.
ii) Ontology.
iii) Homonyms.
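For the K-means question in Q10, the iterations can be traced with a short script. This is a minimal sketch of the standard assign-then-recompute loop using the points and initial centers given in the question. The function name `k_means` and its parameters are hypothetical names chosen for illustration, and the stopping rule (stop when assignments no longer change) is an assumption, since the question does not state one.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

# Points and initial centers as stated in the question.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
initial_centers = [points["A1"], points["A4"], points["A7"]]


def k_means(points, initial_centers, max_iterations=100):
    """Plain K-means: assign each point to its nearest center, then recompute means."""
    centers = list(initial_centers)
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: index of the nearest center for every point.
        new_assignment = {
            name: min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            for name, p in points.items()
        }
        if new_assignment == assignment:
            break  # assumed stopping rule: no point changed cluster
        assignment = new_assignment
        # Update step: each center becomes the mean of its members.
        for i in range(len(centers)):
            members = [points[n] for n, c in assignment.items() if c == i]
            if members:
                centers[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    clusters = [
        sorted(n for n, c in assignment.items() if c == i)
        for i in range(len(centers))
    ]
    return clusters, centers


clusters, centers = k_means(points, initial_centers)
print(clusters)  # [['A1', 'A4', 'A8'], ['A3', 'A5', 'A6'], ['A2', 'A7']]
```

With this data the assignments stabilise after a few passes, with final centers of approximately (3.67, 9), (7, 4.33) and (1.5, 3.5).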
