
B.E. / B.Tech. DEGREE EXAMINATION, NOVEMBER/DECEMBER 2008
Seventh Semester
Information Technology
CS 1004 — DATA WAREHOUSING AND MINING
(Regulation 2004)
Time: Three hours    Maximum: 100 marks
Answer ALL questions.

PART A (10 x 2 = 20 marks)

1. What is the difference between a view and a materialized view?
2. Explain the difference between the star and snowflake schemas.
3. Mention the various tasks to be accomplished as part of data pre-processing.
4. Define data mining.
5. What is overfitting, and what can you do to prevent it?
6. In classification trees, what are surrogate splits, and how are they used?
7. What is the objective function of the k-means algorithm?
8. What assumption does the naïve Bayes classifier make that motivates its name?
9. What is the frequent itemset property?
10. Mention the advantages of hierarchical clustering.

PART B (5 x 16 = 80 marks)

11. (a) Enumerate the building blocks of a data warehouse. Explain the importance of metadata in a data warehouse environment. What are the challenges in metadata management? [Marks 16]
Or
(b) (i) Distinguish between the entity-relationship modeling technique and dimensional modeling. Why is the entity-relationship modeling technique not suitable for the data warehouse? [Marks 8]
(ii) Create a star schema diagram that will enable FIT-WORLD GYM INC. to analyze their revenue. The fact table will include, for every instance of revenue taken, attribute(s) useful for analyzing revenue. The star schema will include all dimensions that can be useful for analyzing revenue. Formulate a query: find the percentage of revenue generated by members in the last year. How many cuboids are there in the complete data cube? [Marks 8]
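The query in Q11(b)(ii) depends on the star schema the candidate designs; the following is a minimal pandas sketch under one assumed design, where a revenue fact table joins to a date dimension. Every table name, column name, and data value here is hypothetical, invented only to illustrate the shape of the query, and "the last year" is read as the most recent year in the data.

```python
import pandas as pd

# Hypothetical star schema: a revenue fact table plus a date dimension.
revenue_fact = pd.DataFrame({
    "date_key":   [20230105, 20230612, 20240220, 20240815],
    "member_key": [1, 2, 1, 3],
    "amount":     [50.0, 75.0, 60.0, 90.0],
})
date_dim = pd.DataFrame({
    "date_key": [20230105, 20230612, 20240220, 20240815],
    "year":     [2023, 2023, 2024, 2024],
})

# Join the fact table to the date dimension, then compare revenue from
# the most recent year against total revenue.
joined = revenue_fact.merge(date_dim, on="date_key")
last_year = joined["year"].max()  # stand-in for "the last year"
share = joined.loc[joined["year"] == last_year, "amount"].sum() / joined["amount"].sum()
print(f"Revenue share of {last_year}: {100 * share:.1f}%")
```

The cuboid count asked about in the same question follows from however many dimensions the schema ends up with: a complete cube over n dimensions (with no concept hierarchies) has 2^n cuboids.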

12. (a) Explain the five steps in the Knowledge Discovery in Databases (KDD) process. Discuss in brief the characterization of data mining algorithms. Discuss in brief important implementation issues in data mining. [Marks 5 + 6 + 5]
Or
(b) Distinguish between statistical inference and exploratory data analysis. Enumerate and discuss various statistical techniques and methods for data analysis. Write a short note on machine learning. What are supervised and unsupervised learning? Write a short note on regression and correlation. [Marks 16]

13. (a) Decision tree induction is a popular classification method. Taking one typical decision tree induction algorithm, briefly outline the method of decision tree classification. [Marks 16]
Or
(b) Consider the following training dataset and the original decision tree induction algorithm (ID3). Risk is the class label attribute. The Height values have already been discretized into disjoint ranges. Calculate the information gain if Gender is chosen as the test attribute. Calculate the information gain if Height is chosen as the test attribute. Draw the final decision tree (without any pruning) for the training dataset. Generate all the IF-THEN rules from the decision tree.

Gender  Height      Risk
F       (1.5, 1.6)  Low
M       (1.9, 2.0)  High
F       (1.8, 1.9)  Medium
F       (1.8, 1.9)  Medium
F       (1.6, 1.7)  Low
M       (1.8, 1.9)  Medium
F       (1.5, 1.6)  Low
M       (1.6, 1.7)  Low
M       (2.0, ∞)    High
M       (2.0, ∞)    High
F       (1.7, 1.8)  Medium
M       (1.9, 2.0)  Medium
F       (1.8, 1.9)  Medium
F       (1.7, 1.8)  Medium
F       (1.7, 1.8)  Medium

[Marks 16]
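A minimal Python sketch of the entropy and information-gain computation that Q13(b) asks for, using the fifteen training tuples above and the standard ID3 definitions (the (2.0, ∞) range is spelled "(2.0, inf)" in code):

```python
from collections import Counter
from math import log2

# The fifteen training tuples from the question: (Gender, Height, Risk).
data = [
    ("F", "(1.5, 1.6)", "Low"),    ("M", "(1.9, 2.0)", "High"),
    ("F", "(1.8, 1.9)", "Medium"), ("F", "(1.8, 1.9)", "Medium"),
    ("F", "(1.6, 1.7)", "Low"),    ("M", "(1.8, 1.9)", "Medium"),
    ("F", "(1.5, 1.6)", "Low"),    ("M", "(1.6, 1.7)", "Low"),
    ("M", "(2.0, inf)", "High"),   ("M", "(2.0, inf)", "High"),
    ("F", "(1.7, 1.8)", "Medium"), ("M", "(1.9, 2.0)", "Medium"),
    ("F", "(1.8, 1.9)", "Medium"), ("F", "(1.7, 1.8)", "Medium"),
    ("F", "(1.7, 1.8)", "Medium"),
]

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr):
    """ID3 gain: entropy(D) minus the weighted entropy of each partition."""
    labels = [r[-1] for r in rows]
    n = len(rows)
    split = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[-1] for r in rows if r[attr] == value]
        split += len(subset) / n * entropy(subset)
    return entropy(labels) - split

print("Gain(Gender) =", round(info_gain(data, 0), 4))  # attr 0 = Gender
print("Gain(Height) =", round(info_gain(data, 1), 4))  # attr 1 = Height
```

ID3 then splits on whichever attribute shows the larger gain and recurses on each partition until the nodes are pure, which yields the unpruned tree and its IF-THEN rules.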

14. (a) Given the following transactional database:

1  C, B, H
2  B, F, S
3  A, F, G
4  C, B, H
5  B, F, G
6  B, E, O

(i) We want to mine all the frequent itemsets in the data using the Apriori algorithm. Assume the minimum support level is 30%. (You need to give the sets of frequent itemsets L1, L2, … and the candidate itemsets C1, C2, ….) [Marks 9]
(ii) Find all the association rules that involve only B, C, H (on either the left- or right-hand side of the rule). The minimum confidence is 70%. [Marks 7]
Or
(b) Describe the multi-dimensional association rule, giving a suitable example. [Marks 16]

15. (a) BIRCH and CLARANS are two interesting clustering algorithms that perform effective clustering on large data sets.
(i) Outline how BIRCH performs clustering on large data sets. [Marks 10]
(ii) Compare and outline the major differences between the two scalable clustering algorithms: BIRCH and CLARANS. [Marks 6]
Or
(b) Write a short note on the web mining taxonomy. Explain the different activities of text mining. Discuss and elaborate on the current trends in data mining. [Marks 6 + 5 + 5]
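Q14(a)(i) can be checked mechanically; the following is a minimal, simplified Apriori sketch over the six transactions above (it skips Apriori's subset-based candidate pruning, since the support count eliminates the same sets). With a 30% minimum support over six transactions, an itemset must appear in at least two of them.

```python
from itertools import combinations

transactions = [
    {"C", "B", "H"}, {"B", "F", "S"}, {"A", "F", "G"},
    {"C", "B", "H"}, {"B", "F", "G"}, {"B", "E", "O"},
]
MIN_COUNT = 2  # 30% of 6 transactions, rounded up

def support_count(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions)

# C1 is every distinct item; L1 keeps those meeting minimum support.
items = sorted(set().union(*transactions))
L = [frozenset([i]) for i in items if support_count(frozenset([i])) >= MIN_COUNT]
frequent = {s: support_count(s) for s in L}

k = 2
while L:
    # Join step: Ck is every k-itemset formed from two frequent (k-1)-itemsets.
    Ck = {a | b for a, b in combinations(L, 2) if len(a | b) == k}
    # Count step: Lk keeps the candidates meeting minimum support.
    L = [c for c in Ck if support_count(c) >= MIN_COUNT]
    frequent.update({s: support_count(s) for s in L})
    k += 1

for s, c in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(s), c)
```

For part (ii), the same counts feed the rule test: a rule X → Y holds when conf(X → Y) = supp(X ∪ Y) / supp(X) meets the 70% threshold, so every rule over {B, C, H} can be read off the frequent-itemset counts the code prints.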

TAGORE ENGG COLLEGE
DEPT OF INFORMATION TECHNOLOGY
MODEL EXAM (Common to IV Yr IT A and B)
SUB CODE: CS1004    SUBJECT: DATA WAREHOUSING AND MINING    DATE:

PART A (Answer all the questions)

1. Write down the applications of data warehousing.
2. When is a data mart appropriate?
3. What is a concept hierarchy? Give an example.
4. What are the various forms of data preprocessing?
5. Write the two measures of an association rule.
6. Define conditional pattern base.
7. List out the major strengths of the decision tree method.
8. Distinguish between classification and clustering.
9. Define a spatial database.
10. List out any two commercial data mining tools.

PART-B

11. (a) (i) With a neat sketch, explain the architecture of a data warehouse.
(ii) Discuss the typical OLAP operations with an example.
Or
(b) (i) Discuss how computations can be performed efficiently on data cubes.
(ii) Write short notes on data warehouse metadata.

12. (a) (i) Explain various methods of data cleaning in detail.
(ii) Give an account of data mining query languages.
Or
(b) How is Attribute-Oriented Induction implemented? Explain in detail.

13. (a) Write and explain the algorithm for mining frequent itemsets without candidate generation. Give a relevant example.
Or
(b) Discuss the approaches for mining multi-level association rules from transactional databases. Give a relevant example.
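Q13(a) refers to FP-growth, which avoids candidate generation by compressing the database into an FP-tree and mining conditional pattern bases from it. The following is a minimal sketch of the tree-building step only (the recursive mining phase is omitted), reusing the six transactions from Q14(a) of the previous paper as illustrative data:

```python
from collections import Counter, defaultdict

class Node:
    """One FP-tree node: an item, its count, a parent link, and children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Pass 1: count item frequencies and keep only the frequent items.
    counts = Counter(item for t in transactions for item in t)
    freq = {item: c for item, c in counts.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)  # header table: item -> all nodes for that item
    # Pass 2: insert each transaction's frequent items in descending
    # frequency order, sharing prefixes with earlier transactions.
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in freq),
                           key=lambda i: (-freq[i], i)):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header

transactions = [["C", "B", "H"], ["B", "F", "S"], ["A", "F", "G"],
                ["C", "B", "H"], ["B", "F", "G"], ["B", "E", "O"]]
root, header = build_fp_tree(transactions, min_count=2)
for item, nodes in header.items():
    print(item, "->", [(n.parent.item, n.count) for n in nodes])
```

FP-growth then walks the header-table links bottom-up, collecting each item's prefix paths as its conditional pattern base and recursing on the resulting conditional FP-trees.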

14. (a) Consider the following training dataset and the original decision tree induction algorithm (ID3). Risk is the class label attribute. The Height values have already been discretized into disjoint ranges. Calculate the information gain if Gender is chosen as the test attribute. Calculate the information gain if Height is chosen as the test attribute. Draw the final decision tree (without any pruning) for the training dataset. Generate all the IF-THEN rules from the decision tree.

Gender  Height      Risk
F       (1.5, 1.6)  Low
M       (1.9, 2.0)  High
F       (1.8, 1.9)  Medium
F       (1.8, 1.9)  Medium
F       (1.6, 1.7)  Low
M       (1.8, 1.9)  Medium
F       (1.5, 1.6)  Low
M       (1.6, 1.7)  Low
M       (2.0, ∞)    High
M       (2.0, ∞)    High
F       (1.7, 1.8)  Medium
M       (1.9, 2.0)  Medium
F       (1.8, 1.9)  Medium
F       (1.7, 1.8)  Medium
F       (1.7, 1.8)  Medium

Or
(b) Explain the following clustering methods in detail:
(i) BIRCH
(ii) CURE

15. (a) What is a multimedia database? Explain the methods of mining multimedia databases.
Or
(b) (i) Discuss the social impacts of data mining.
(ii) Discuss spatial data mining.
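Q14(b)(i) concerns BIRCH, whose key device is the clustering feature: a triple CF = (N, LS, SS) holding a point count, a linear sum, and a squared sum. CFs are additive, so a subcluster can absorb points or other subclusters without revisiting raw data, and the centroid and radius fall out of the triple directly. A minimal sketch of that arithmetic (the CF-tree with its branching factor and threshold, which real BIRCH builds around these triples, is omitted):

```python
import numpy as np

class CF:
    """BIRCH clustering feature (N, LS, SS) for a set of d-dimensional points."""
    def __init__(self, point):
        p = np.asarray(point, dtype=float)
        self.n, self.ls, self.ss = 1, p.copy(), float(p @ p)

    def merge(self, other):
        """CF additivity: absorb another CF without touching raw points."""
        self.n += other.n
        self.ls += other.ls
        self.ss += other.ss

    @property
    def centroid(self):
        return self.ls / self.n

    @property
    def radius(self):
        """Root-mean-square distance of member points from the centroid."""
        c = self.centroid
        return float(np.sqrt(max(self.ss / self.n - c @ c, 0.0)))

# Absorb three points into one CF entry, as a BIRCH leaf would.
cf = CF([1.0, 2.0])
for p in ([2.0, 2.0], [1.5, 1.0]):
    cf.merge(CF(p))
print(cf.n, cf.centroid, round(cf.radius, 3))
```

This one-pass summarization is the main contrast with CURE (and with CLARANS in the earlier paper), which represent clusters by sampled points or medoids and must repeatedly evaluate candidates against the data.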

