
School of Computing Science and Engineering

M. Tech. Computer Science and Engineering with Specialization in Big Data Analytics

Curriculum 2014-15 Batch onwards

University Core

Course Code Course Title L T P C


ENG601 Professional and Communication Skills 1 0 2 2
MAT503 Higher Mathematics 3 1 0 4
Total Credits 6

University Elective

Course Code Course Title L T P C


University Elective 1 3 0 0 3
Total Credits 3

Programme Core

Course Code Course Title L T P C


CSE503 Advanced Algorithmic Analysis 3 0 2 4
CSE507 Advanced Computer Architecture 3 0 2 4
CSE509 Advanced Operating Systems 3 0 2 4
CSE515 Advanced Database Systems 3 1 0 4
CSE504 Advanced Computer Graphics 3 0 0 3
CSE506 Web Services 3 0 2 4
CSE510 Distributed Systems 3 0 0 3
CSE511 Advanced Computer Networks 3 0 2 4
CSE502 Advanced Software Engineering 3 0 0 3
CSE699 Master’s Thesis 0 0 0 14
Total Credits 47

Proceedings of the 29th Academic Council [26.4.2013] 423


SET Project & Paper Presentation
Course Code Course Title L T P C
SET 501 Science, Engineering and Technology Project - I 2
SET 502 Science, Engineering and Technology Project - II 2
SET 503 Science, Engineering and Technology Project - III 2

Total Credits: 6
Programme Elective

Course Code Course Title L T P C


CSE609 Information Retrieval & Data Mining 3 0 2 4
CSE610 Big Data Essentials 3 0 2 4
CSE611 Machine Learning 3 0 2 4
CSE612 Big Data Analytics 3 0 2 4
CSE613 Data Visualization 3 0 2 4
CSE614 Large-Scale Data Management Techniques 2 0 0 2
Total Credits to be taken (4 Courses) 14



SET Project Evaluation Procedure

Component Percentage
Paper Presentation 25%
Guide 50%
SET Evaluation Committee 25%
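
As an illustration of the weighting in the table above, here is a minimal sketch in Python. It assumes each component is marked out of 100, which the table does not state; the function name, variable names and sample marks are hypothetical.

```python
# Weights taken from the SET Project Evaluation Procedure table.
SET_WEIGHTS = {
    "paper_presentation": 0.25,
    "guide": 0.50,
    "set_evaluation_committee": 0.25,
}


def weighted_score(marks, weights):
    """Combine component marks (assumed to be out of 100) into one mark out of 100."""
    if set(marks) != set(weights):
        raise ValueError("marks and weights must name the same components")
    return sum(marks[name] * weights[name] for name in weights)


# Hypothetical marks for one student, each out of 100.
example_marks = {
    "paper_presentation": 80,
    "guide": 70,
    "set_evaluation_committee": 90,
}
final_mark = weighted_score(example_marks, SET_WEIGHTS)
print(final_mark)
```

The same function could be reused for the Course Evaluation split by passing a different weight table.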

Course Evaluation

Component Percentage
Mid Term Exam and Other Internal Assessment (Assignments/Projects/Seminars) 55%
Term End Exam 45%
CSE503 Advanced Algorithmic Analysis L T P C
3 0 2 4
Version No. : 2.00

Course Prerequisites:
Objective
To focus on the design and analysis of algorithms in various domains, laying the foundation for
designing efficient algorithms.
Expected Outcome
On completion of this course the student would be able to
- Apply the algorithms and design techniques to solve problems
- Have a sense of the complexities of various problems in different domains.
Unit No. I Introduction 9 hours
Overview of algorithmic design, asymptotic notation and its properties, Growth of Functions,
Time complexity and Analysis of algorithms, Recurrence Relations, Amortized analysis.
Unit No. II Linear Programming 9 hours
Geometry, Farkas' Lemma, Strong Duality, Complexity, Interior-point Algorithms, Ellipsoid
Algorithm and Optimization vs. Separation, Extension to Conic Programming.

Unit No. III Network Flows 9 hours


Maximum Flows, Min-cost Flows, Cycle Cancelling Algorithms, Strongly Polynomial-time
Analysis, Minimum Cuts without Flows

Unit No. IV P and NP Classes 9 hours


Class P, Polynomial time verification, reducibility, NP-Hard, NP-completeness, Cook's theorem,
NP-complete problems: Circuit-SAT, 3-CNF-SAT, Clique, Vertex Cover and Subset Sum.

Unit No. V Approximation Algorithms 9 hours


Limits to Approximability, Basic Techniques and Vertex Cover, Primal-dual Technique, Set
cover problem, Multicommodity Cut via Embedding Metric Spaces, Approximation Scheme for
Euclidean TSP.
Text / Reference Books
1. Cormen, Leiserson, Rivest and Stein, “Introduction to Algorithms”, 3rd edition, McGraw-
Hill, 2009.
2. E. Horowitz, and S. Sahni, “Fundamentals of Computer Algorithms”, 2nd edition, Computer
Science Press, 2008.
3. Schrijver, A. “Theory of Linear and Integer Programming” Chichester: John Wiley & Sons,
1998.
4. Roos, C., T. Terlaky, and J. -Ph. Vial. “Theory and Algorithms for Linear Optimization: An
Interior Point Approach” Chichester: John Wiley & Sons, 1997.
5. Vazirani, V. “Approximation Algorithms” Berlin: Springer-Verlag, 2001.

Mode of Evaluation : Tests, Assignments and Seminar.

Recommended by the Board of Studies on:



CSE507 Advanced Computer Architecture L T P C
3 0 2 4
Version : 2.10

Course Prerequisites: -
Objective
To focus on the various design options in computer architecture that provide a platform for
developing and analyzing high-performance applications.

Expected Outcome
On completion of this course the student would be able to
- Identify the need for multi-core architecture for specific applications by developing a suitable
complexity measure.
- Identify needs for homogeneous or heterogeneous multi-core architectures for a given
application.
- Develop methods to partition a given application program to run on a multi-core processor
- Use the Intel multi-core architecture to develop high-performance code
- Optimize code using appropriate techniques.

Unit No. I Control Unit Design 9 hours


Overview of IAS Computer, Data path implementation, Register Transfer Notation (RTN),
Abstract RTN, Concrete RTN, Control sequence for Simple RISC computer (SRC); Control unit
Design, Hardwired control unit Design and Micro programmed control unit Design using
control Sequences.

Unit No. II Memory Module Design 9 hours


Conceptual view of memory cell, Memory address map, Memory connections to CPU, Cache
memory: Cache memory management techniques, Types of caches: look-through, look-aside,
write-through, write-around, unified vs. split, multilevel, cache levels, Cache misses,
Performance issues: Mean memory access time, Execution time, Cache Coherence Protocols:
Snoopy, MSI, MESI, and MOESI.

Unit No. III Instruction and Thread level parallelism 9 hours


Instruction level parallelism (ILP): Concepts - Dynamic scheduling and data hazards - Exploiting
ILP using static/dynamic scheduling and speculation - Advanced techniques for instruction
delivery and speculation. Thread level parallelism: Centralized and distributed shared memory
architectures, Synchronisation, Models for memory consistency.

Unit No. IV Multi-Core and Multithreading Concepts 9 hours


Multi-processor architecture and its limitations, Need for multi-core architectures, Architecting
with multi-cores, Homogeneous and heterogeneous cores, Shared resources, shared buses, and
optimal resource sharing strategies, Evolution of Multi-Core Technology, Basic concepts of
threading and parallel computing, Concurrency, Parallelism, Threading design concepts for
developing an application, Correctness Concepts, Performance Concepts: Simple Speedup,
Computing Speedup, Efficiency, Granularity, Load Balance, Tools Foundation: Intel®
Compiler and Intel® VTune™ Performance Analyzer.



Unit No. V Multi-Core Programming 9 hours
Introduction to OpenMP, OpenMP Directives, Parallel constructs, Work-sharing constructs,
Data environment constructs, Synchronization constructs, Extensive API library for finer
control, Benchmarking multi-core architectures: Benchmarking of processors, Comparison of
processor performance for specific application domains.

Text / Reference Books


1. John L. Hennessy and David A. Patterson, “Computer Architecture: A Quantitative Approach”,
5th edition, Morgan Kaufmann, 2011.
2. Shameem Akhter and Jason Roberts, “Multi-Core Programming”, 1st edition, Intel Press,
2006.
3. Vincent P. Heuring, Harry F. Jordan, “Computer Systems Design and Architecture”, 2nd
edition, Pearson, 2003.
4. David B. Kirk, Wen-mei W. Hwu, “Programming Massively Parallel Processors: A Hands-on
Approach (Applications of GPU Computing Series)”, 1st edition, Morgan Kaufmann, 2010.
5. Barbara Chapman, Gabriele Jost, Ruud van der Pas, “Using OpenMP: Portable Shared Memory
Parallel Programming (Scientific and Engineering Computation)”, 1st edition, MIT Press,
2007.

Mode of Evaluation: Tests, Assignments and Seminar.

Recommended by the Board of Studies on:



CSE509 Advanced Operating Systems L T P C
3 0 2 4
Version No. : 2.10
Course Prerequisites:

Objective
To present the fundamental principles of modern operating systems and to explore their design
aspects.

Expected Outcome
On completion of this course the student should be able to understand and evaluate operating
system implementations, develop system software modules, write and debug concurrent
programs, debug complex systems and low-level software, and work with distributed and
real-time operating systems.

Unit No. I Operating System Overview 9 hours


Objectives and functions – Evolution of Operating System – Major Achievement – Modern
Operating System – Microsoft Windows overview – Unix system – Modern Unix System.

Unit No. II Process Management 9 hours


Introduction – Suspension and Resumption – Self Suspension and Information Hiding – System
Call Template – Implementation of Suspend – Process Creation and Termination. Coordination of
Concurrent Processes – Avoidance of Busy Waiting – Semaphore Policy and Process Selection –
Semaphore Data Structure – Static and Dynamic Semaphore Allocation – Semaphore Deletion
and Reset.

Unit No. III Inter Process Communication 9 hours


Inter Process Communication and Process Synchronization – Inter Process Communication
Ports – Implementation of Port – Port Table Initialization – Port Creation – Sending a Message
to a Port – Receiving a Message from a Port – Port Deletion and Reset. Process Synchronization –
Classical synchronization problems – Synchronization solutions – Deadlock prevention –
Deadlock avoidance.

Unit No. IV Memory and File Management 9 hours


Memory Management – Introduction – Partitioned Space Allocation – Buffer Pools – Allocating
a Buffer – Returning a Buffer – Creating a Buffer Pool – Initializing the Buffer Pool Table – Virtual
Memory and Memory Multiplexing – Hardware for Demand Paging – Address Translation with a
Page Table – Metadata in a Page Table Entry – Page Replacement and Global Clock. File
Management – Operating Systems – Internal and File Management – The Intel Architecture –
MS-DOS Internals – Windows XP Internals – UNIX and UNIX Internals.

Unit No. V Distributed Operating Systems 9 hours


Distributed operating system concept – Architectures of Distributed Systems, Distributed
Mutual Exclusion, Distributed Deadlock detection, Agreement protocols, Threads, processor
Allocation, Allocation algorithms, Distributed File system design; NFS and AFS. Real Time
Operating Systems: Introduction to Real Time Operating Systems, Concepts of scheduling, Real
time Memory Management.



Text / Reference Books
1. William S. Davis, “Operating Systems: A Systematic View”, 6th edition, Pearson
Education India, 2007.
2. Douglas Comer, “Operating System Design: The Xinu Approach, Linksys Version”, 2nd
edition, CRC Press, 2011.
3. Ann McIver McHoes, Ida M. Flynn, “Understanding Operating Systems”, 6th Edition,
Cengage Learning, 2010.

Mode of Evaluation : Tests, Assignments and Seminar.


Recommended by the Board of Studies on:



CSE515 Advanced Database Systems L T P C
3 1 0 4
Version No. : 2.10

Course Prerequisites:
Objective
To expose the students to the latest industry-relevant topics in modern database management
systems.
Expected Outcome
To enable the students to design their own parallel and distributed databases and to expose them
to various warehousing tools.
Unit No. I Database Design And Tuning 6 hours
Introduction to physical database design – Guideline for index selection − Overview of
database tuning – Conceptual schema tuning – Queries and view tuning, Limitations of RDBMS,
Query Optimization, NoSQL, transaction model.
Unit No. II Parallel And Distributed Database 12 hours
Parallel database systems: Architecture of parallel databases, Parallel query evaluation,
Parallelizing joins and Parallel query optimization. Distributed database systems: Distributed
database architecture, Properties of distributed databases, Types of distributed databases, Storing
data in a distributed DBMS, Distributed query processing, Database concurrency control
protocols, Transaction failure and recovery, Database recovery protocols.
Unit No. III Deductive Databases 9 hours
Introduction, Prolog/datalog notation, Interpretation of rules, Basic inference mechanisms for
logic programs, Datalog programs and their evaluation, deductive database system, deductive
object oriented databases, applications.
Unit No. IV Data Warehousing 9 hours
Data warehousing: Characteristics of Data warehouse, Data preprocessing, Data warehouse
architecture, Multi dimensional data model, Schema design, OLAP Operation and Data mart,
Concepts of Data mining.
Unit No. V Database Technologies Use Case 9 hours
Object Database Systems, Multimedia databases, Mobile databases, Spatial databases, Temporal
databases, Databases on the World Wide Web, Geographic Information Systems, Genome data
management, Digital Libraries.
Text / Reference Books
1. Raghu Ramakrishnan and Johannes Gehrke, “Database Management Systems”, 3rd Edition,
McGraw Hill, 2007.
2. S. K. Singh, “Database Systems: Concepts, Design & Applications”, 1st edition, Prentice Hall,
2009.
3. Ramez Elmasri and Shamkant B. Navathe, “Fundamentals of Database Systems”, 4th edition,
Addison Wesley, 2008.
4. Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, 2nd edition,
Morgan Kaufmann Publishers, 2011.
5. Gerhard Weikum, Gottfried Vossen, “Transactional Information Systems: Theory,
Algorithms and Practice of Concurrency Control and Recovery”, Morgan Kaufmann, 2002.

Mode of Evaluation : Tests, Assignments and Seminar


Recommended by the Board of Studies on:



CSE504 Advanced Computer Graphics L T P C
3 0 0 3
Version No. :1.20

Course Prerequisites:

Objective
To provide the basic concepts that underlie all graphics applications, such as computer games,
movies, medicine and information visualization, and that serve as a foundation for the areas
of high-quality image generation and interactive graphics.

Expected Outcome
On completion of this course the student would be able to deal with the latest concepts introduced
in 3D graphics card architectures.
Unit No. I Introduction 9 hours
Overview, Modeling, Procedural Models, Fractal Models, and Grammar based models, particle
systems, and viewing, Rasterization and Ray tracing

Unit No. II Illumination 9 hours


Vertex/Geometry/Pixel programming, Illumination models, Specular reflection model, Shading
models for curved surfaces, Radiosity method, Rendering, Recursive ray tracing, Texture mapping.

Unit No. III Graphics Hardware 9 hours


Graphics hardware architecture, Object representation and levels of detail.

Unit No. IV Surface Rendering 9 hours


Parametric and implicit surfaces, Meshing, Visibility and shadow computation, Global
illumination.

Unit No. V Visualization Techniques 9 hours


Introduction to volume visualization, Introduction to animation, Image based rendering, Filler.

Text / Reference Books


1. Watt A. and M. Watt, “Advanced Animation and Rendering Techniques Theory and
Practice”, 1st edition, Addison-Wesley, 1994.
2. Hearn D. and P. Baker, “Computer Graphics - C Version”, 2nd edition, Prentice-Hall, 2004.
3. Neider, J., T. Davis, and M. Woo, “OpenGL Programming Guide”, 3rd edition, Addison-
Wesley, 1999.
4. Blinn J., “A Trip down the Graphics Pipeline. Jim Blinn's Corner”, 2nd edition, Morgan
Kaufmann publishers, 1996.
5. Luebke D., M. Reddy, J. Cohen, A. Varshney, B. Watson, R. Huebner, “Level of Detail for
3D Graphics”, 2nd edition, Morgan Kaufmann publishers, 2003.

Mode of Evaluation : Tests, Assignments and Seminar


Recommended by the Board of Studies on:



CSE506 Web Services L T P C
3 0 2 4
Version : 2.10
Course Prerequisites:
Objective
To provide the fundamentals of SOA, SOAP, UDDI and XML that lay the foundation for
advanced studies in the area of web services.
Expected Outcome
After completion of this course the students will be able to carry out projects in the area of XML.
Unit No. I SOA: (Service Oriented Architecture) 9 hours
Introduction to Services - Bind, Publish, Find - Framework for SOA – Web Services
Architecture, Interoperability – RESTful (Representational State Transfer) Services,
WS-Interoperability, JSON, RESTLETS, Ruby on Rails, Java Server Faces, Hibernate.
Unit No. II XML & Web Service Standards 9 hours
Basics of XML – XML standards − SOAP − Messaging, Encoding, Faults, Data types,
WS-Routing, WSDL Specification − UDDI Business Registry − UDDI Data Models, Types, Inquiry
and Publisher APIs.
Unit No. III From Web Services To Semantic Web Services 9 hours
Introduction to semantic web services −Resource Description Framework: RDF − Basic
elements, Classes and Properties − RDF query, RDF tools, RDF − Semantics.
Unit No. IV Ontology Basics, Web Ontology Language 9 hours
OWL, sub languages − OWL: Lite, DL, Full. Instance, Classes, Properties, DataType Properties,
Object Properties, Operators − OWL-S: An upper ontology to describe web services, Building
blocks, Validating OWL- S documents.
Unit No. V Real World Examples & Applications 9 hours
Protégé-OWL, Case Study, Swoogle, Architecture and usage of meta-data, FOAF (Friend Of A
Friend), Semantic markup, RSS feeds, Semantic web search engines, Web Crawler, Mashups with
Examples.

Text / Reference Books


1. Sanjiva Weerawarana, Francisco Curbera, Frank Leymann, Tony Storey, Donald F. Ferguson,
“Web Services Platform Architecture: SOAP, WSDL, WS-Policy, WS-Addressing, WS-BPEL,
WS-Reliable Messaging and More”, 2nd edition, Prentice Hall PTR, 2005.
2. Liyang Yu, “Introduction to the Semantic Web and Semantic Web Services”, 1st edition,
Chapman & Hall/CRC, 2007.
3. John Hebeler, Matthew Fisher, Ryan Blace, Andrew Perez-Lopez, Mike Dean, “Semantic
Web Programming”, 3rd edition, Wiley Publishing Inc, 2009.
4. Grigoris Antoniou and Frank van Harmelen, “A Semantic Web Primer”, 2nd Ed., MIT
Press, 2008.

Mode of Evaluation : Tests, Assignments and Seminar.

Recommended by the Board of Studies on:



CSE510 Distributed Systems L T P C
3 0 0 3
Version : 1.20
Course Prerequisites:
Objective
To provide the fundamentals of distributed systems that serve as a foundation for advanced
studies in the area of distributed systems.
Expected Outcome
On completion of this course the student would be able to deal with distributed databases and file
systems, and would also know about the languages for distributed systems.
Unit No. I Introduction 9 hours
Fundamental issues in distributed systems, Distributed System Models and Architectures,
Classification of Failures in Distributed Systems, Basic Techniques for Handling Faults in
Distributed Systems, Homogeneous and Heterogeneous nodes.
Unit No. II Time And Global States 9 hours
Logical clocks and physical clocks, events, process states, global states; Distributed Mutual
Exclusion, Leader Election, Distributed Deadlock Detection, Remote Procedure Calls, Broadcast
Protocols.
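
To illustrate the logical clocks named in this unit, the following is a minimal Python sketch of Lamport's update rules: increment on every local event, attach the clock value to each sent message, and on receipt take the maximum of the local and received values plus one. The class and method names are hypothetical and no real message passing is modelled.

```python
class LamportClock:
    """Logical clock for one process, following Lamport's update rules."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        # Every local event advances the clock by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is itself an event; the returned value travels with the message.
        return self.local_event()

    def receive(self, message_time):
        # Jump past both the local clock and the sender's timestamp.
        self.time = max(self.time, message_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.local_event()                # p's clock becomes 1
stamp = p.send()               # p's clock becomes 2 and is attached to the message
q.local_event()                # q's clock becomes 1
delivered = q.receive(stamp)   # q's clock becomes max(1, 2) + 1 = 3
```

The send event always carries a smaller timestamp than the matching receive event, which is the ordering property the clock is meant to guarantee.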
Unit No. III Naming in Distributed Systems 9 hours
Name services and the DNS – Directory Services – X.500 protocol; Distributed File System and
implementation; Coordination and agreement, Remote Method Invocation (RMI).
Unit No. IV Transactions and Concurrency Control 9 hours
Distributed transactions – Concurrency control – Transaction recovery; Replication – Transactions
with replication; Distributed Shared Memory, Distributed Mutual Exclusion, Google File System.
Unit No. V Mobile and Ubiquitous Computing 9 hours
Context aware computing; web services; distributed coordination of services; case study on
CORBA
Text / Reference Books
1. Randy Chow and Theodore Johnson, “Distributed Operating Systems and Algorithms”,
Addison- Wesley, 1997.
2. G. Coulouris, J. Dollimore, and T. Kindberg, “Distributed Systems: Concepts and Design”,
5th edition, Addison Wesley, 2011.
3. Mukesh Singhal, and N. G. Shivaratri, “Advanced Concepts in Operating Systems,
Distributed, Database, and Multiprocessor Operating Systems”, 1st edition, McGraw Hill,
1994.
4. Vijay K. Garg, “Elements of Distributed Computing”, 1st edition, Wiley & Sons, 2002.
5. Relevant papers from various IEEE and ACM Transactions/Journals and Conference
Proceedings.

Mode of Evaluation : Tests, Assignments and Seminar.


Recommended by the Board of Studies on:



CSE511 Advanced Computer Networks L T P C
3 0 2 4
Version : 2.01

Course Prerequisites:

Objective
To go beyond the basic level of understanding that is typically offered in an undergraduate
networking course.
Expected Outcome
On completion of the course, students will be able to understand the fundamental concepts in
routing and addressing, transport protocols and congestion control, emerging distributed
applications, and wireless networking.
Unit No. I Networking Standards And Specification 9 hours
Networking standards and specifications, Need for standardization, ISO and the IEEE
standards, The IEEE 802 Project
Unit No. II Overview of OSI and TCP/IP Protocol Suite 9 hours
Layers in the OSI model, TCP/IP protocol suite, Physical layer addressing, Network layer
addressing, Client-Server model.
Unit No. III Addressing And Routing 9 hours
IP Addresses: Classful addressing, Subnetting/Supernetting, Classless Addressing, Delivery and
routing of IP packets, Interior and Exterior routing.
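
The subnetting, supernetting and classless addressing topics in this unit can be tried out with Python's standard `ipaddress` module. The sketch below is illustrative only: the address block 192.0.2.0/24 is a documentation range chosen as an assumption, and the helper name `contains` is hypothetical.

```python
import ipaddress

# Example block chosen for illustration only (192.0.2.0/24 is reserved for documentation).
network = ipaddress.ip_network("192.0.2.0/24")

# Subnetting: split the /24 into four /26 subnets of 64 addresses each.
subnets = list(network.subnets(new_prefix=26))

# Supernetting: merge two adjacent /25 blocks back into a single /24.
halves = [
    ipaddress.ip_network("192.0.2.0/25"),
    ipaddress.ip_network("192.0.2.128/25"),
]
merged = list(ipaddress.collapse_addresses(halves))


def contains(net, address):
    """Return True when the address falls inside the given classless (CIDR) block."""
    return ipaddress.ip_address(address) in net


for subnet in subnets:
    print(subnet, subnet.netmask, subnet.num_addresses)
```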
Unit No. IV TCP/IP Protocol Suite 9 hours
Socket Interface, Internet Protocol (IP), ICMP and ARP, Transport Layer Protocols − TCP and
UDP, Congestion control and Quality of Service, File Transfer protocols − FTP and TFTP,
SMTP, SNMP, BOOTP and DHCP, Domain Name System, Mobile IP. Routing protocols −
RIP, OSPF, BGP.
Unit No. V Ad Hoc Wireless Networks 9 hours
Cellular and Ad hoc wireless networks, Applications of Ad hoc wireless networks, issues in ad
hoc wireless networks, issues in designing a routing protocol for ad hoc wireless networks,
Classification of routing protocols, Security in ad hoc wireless networks.

Text / Reference Books


1. Behrouz A. Forouzan, “TCP/IP Protocol Suite”, 4th edition, Tata McGraw-Hill, 2010.
2. W. Richard Stevens, “TCP/IP Illustrated, The Protocols”, 2nd edition, Pearson Education,
2011.
3. C. Siva Ram Murthy, B.S. Manoj, “Ad hoc Networks-Architectures and protocols”, 3rd
edition, Pearson Education, 2007.
4. Andrew S. Tanenbaum, “Computer Networks”, 4th edition, Prentice Hall, 2011.
5. D. E. Comer, “Internetworking with TCP/IP Principles, Protocols and Architecture”,
Volume - I, Pearson Education, 2009.

Mode of Evaluation : Tests, Assignments and Seminar.

Recommended by the Board of Studies on:



CSE502 Advanced Software Engineering L T P C
3 0 0 3
Version No. :
Course Prerequisite / Exposure: Advanced Algorithmic Analysis
Objective

To focus on the state-of-the-art in various areas of Software Engineering


Expected Outcome
On completion of this course the student would be able to deal with software project, configuration and risk management, and
to assess software quality using software quality assurance parameters.
Unit I Introduction to Software Engineering
Software Process Models: Waterfall, V-model, Spiral, Iterative & Incremental - Introduction to Agile Software Development,
Agile Principles and Practices, Extreme Programming.
Software Requirements Engineering, Software Architecture: Architectural Tactics and Patterns - Architecture in the Life Cycle:
Architecture and Requirements – Designing Architecture – Implementation and Testing – Case Studies: Air Traffic
Control, Flight Simulation. Object Oriented Design, Design principles, OOD metrics, Software Refactoring, Principles of
Refactoring, Bad Code Smells, and Case Study.

Unit II Software Testing & Maintenance


Introduction to Software Testing, Levels of testing, Types of testing, Black box design techniques, White box design techniques:
statement coverage, decision coverage, condition coverage, Static Review process. Software Maintenance, Software Project
Management: Planning & Estimation, Software Configuration Management.

Unit III Software Reuse


Reuse-based Software Engineering – Approaches supporting software reuse – Application Frameworks – Commercial-Off-The-Shelf
(COTS) systems: COTS Solution Systems, COTS Integrated Systems.
Component-Based Software Engineering (CBSE) – Components, Component Models – CBSE Processes: CBSE for Reuse,
CBSE with Reuse – Component-based Development: Component Qualification, Adaptation, and Composition – Economics of
CBSE.

Unit IV Distributed Software Engineering


Distributed Software Engineering – Distributed system characteristics – Design Issues –Middleware – Client-Server Computing
– Client-Server Interaction – Architectural patterns for Distributed Systems: Master/Slave, Two-tier, Multi-tier, Distributed
component, and Peer-to-Peer – Software as a Service (SaaS) – Key elements – Implementation factors – Configuration of a
system offered as a service.
Service-Oriented Architecture (SOA) – Difference between SaaS and SOA - Benefits of SOA – Key Standards - RESTful web
services – Service-based Information Systems – Service-Oriented Software Engineering: Services as reusable components –
Service Engineering: Service Candidate Identification, Service Interface Design, Service Implementation and Deployment,
Legacy system services - Software Development with services: Workflow design and implementation, Service testing.

Unit V Aspect Oriented Software Development


Introduction to Aspect-Oriented Software Development (AOSD): Separation of Concerns, Aspects, Join Points, and Pointcuts –
Comparison with Object-Oriented Software Development – Aspect-Orientation in the Software Lifecycle – Concern-Oriented
Requirements Engineering – Aspect-Oriented Design and Development – Expressing Aspects Using UML Behavioral and
Structural Diagrams – Concern Modeling – Developing Software Components with Aspects – Examples of Aspect-Oriented
Systems
Text / Reference Books

1. Software Engineering, 9th Edition, by Ian Sommerville, Addison-Wesley, 2010.
2. Software Architecture in Practice, 3rd Edition, by Len Bass, Paul Clements, Rick Kazman, Addison-Wesley Professional,
2012 (SEI Series in Software Engineering).
3. Software Engineering: A Practitioner's Approach, 7th Edition, by Roger Pressman, McGraw-Hill,
2010.
4. Aspect-Oriented Software Development by Robert E. Filman, Tzilla Elrad, Siobhán Clarke, Mehmet
Aksit, Addison-Wesley Professional, 2004.
5. Refactoring: Improving the design of existing code by Martin Fowler, Addison Wesley, 1999.
6. Agile Software Development, Principles, Patterns, and Practices by Robert C. Martin, Pearson, 2011.

Mode of Evaluation Written examinations, assignments and mini projects

Recommended by the Board of Studies on:
Date of Approval by the Academic Council:

CSE609 Information Retrieval and Data Mining L T P C
3 0 2 4

Version No. : 1.00


Course Prerequisite / Exposure: Advanced Database Systems

Objective
To provide the fundamentals of information retrieval and data mining techniques, with a focus on
practical algorithms for textual document indexing, relevance ranking, web usage mining and text
analytics, as well as their performance evaluation, laying the foundation for Data Analytics.

Expected Outcome
On completion of this course students are expected to master both the theoretical and practical
aspects of information retrieval and data mining. More specifically, the student will understand:
1. The basic concepts and processes of information retrieval systems and data mining techniques.
2. The common algorithms and techniques for information retrieval.
3. The quantitative evaluation methods for the IR systems and data mining techniques.
4. The popular probabilistic retrieval methods and ranking principle.

Unit No. I Overview of Information retrieval and Data mining 8 hours


Introduction to information retrieval and Data Mining – Data Mining Functionalities, Steps in
Data Mining Process – Architecture of a Typical Data Mining System. Conceptual models of
an information retrieval and knowledge discovery system. Indexing
techniques for textual information items - inverted indices, tokenization, stemming and stop
words.
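
As an illustration of the indexing topics above, the following is a minimal Python sketch of an inverted index built with tokenization, stop-word removal and stemming. The stop-word list is deliberately tiny and `naive_stem` is a crude stand-in for a real stemmer such as Porter's; all function names and the sample documents are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def build_inverted_index(docs):
    """Map each stemmed, non-stop-word term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token not in STOP_WORDS:
                index[naive_stem(token)].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {1: "Mining the web logs", 2: "Web mining and indexing of documents"}
index = build_inverted_index(docs)
print(index)
```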

Unit No. II Mining Association Rules 9 hours


Mining Association Rules in Large Databases, Mining Frequent Patterns - basic concepts -
Efficient and scalable frequent item set mining methods, Apriori algorithm, FP-Growth
algorithm, Associations - mining various kinds of association rules.
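
The Apriori algorithm listed in this unit can be sketched in a few lines of Python. This is a minimal, unoptimized illustration that assumes `min_support` is an absolute transaction count rather than a fraction; the function name and the sample baskets are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count how many transactions contain each candidate and keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    frequent = count({frozenset([i]) for i in items})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

With these four baskets every single item and every pair is frequent, while the three-item set appears in only one basket and is dropped.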

Unit No. III Predictive Modeling Clustering, Analytics and Statistical Modeling 10 hours

Classification and Prediction Issues, Classification by Decision Tree Induction – Classification
methods – Other Classification Methods – Prediction.
Cluster Analysis – Basics of cluster analysis – Types of Data in Cluster Analysis – Categorization
of Major Clustering Methods – Partitioning Methods – Hierarchical Methods.
Case Study: Data analytics using Naïve Bayesian Classifier - K-Means Clustering - Association
Rules, Decision Trees - Linear and Logistic Regression - Time Series Analysis - Text Analytics.

Unit No. IV Retrieval Methods and Evaluation 10 hours


Retrieval models- Boolean, Vector space, Binary independence, Language modeling. Probability
ranking principle. Other commonly-used techniques include relevance feedback, pseudo
relevance feedback, and query expansion. Retrieval Performance Evaluation measures: Average
precision, Normalized Discounted Cumulative Gain (NDCG), etc., Cranfield paradigm.
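
The two evaluation measures named above can be sketched in Python as follows. This sketch assumes the common formulation DCG = sum of gain / log2(rank + 1), although other variants exist (for example with 2^gain - 1 in the numerator), and it computes the ideal ranking from the retrieved list only. The function names and the optional `total_relevant` parameter are hypothetical.

```python
import math


def average_precision(relevant_flags, total_relevant=None):
    """Average precision for one ranked list of 0/1 relevance judgments.

    total_relevant is the number of relevant documents in the collection;
    when omitted, it is assumed that all of them appear in the ranking.
    """
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG normalized by the DCG of the same gains sorted in the ideal order."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


print(average_precision([1, 0, 1, 0]))
print(ndcg([1, 2, 3]))
```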



Unit No. V Personalisation and Emerging Areas 8 hours
Basic techniques for collaborative filtering and recommender systems - memory-based
approaches, probabilistic latent semantic analysis (PLSA), and personalized web search- click-
through data. Peer-to-peer information retrieval, Learning to Rank Portfolio retrieval and Risk
Management.
Text / Reference Books
1. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, “Introduction to
Information Retrieval”, Cambridge University Press. 2008.
2. Pang-Ning Tan, Michael Steinbach and Vipin Kumar, “Introduction to Data Mining”,
Addison-Wesley, 2006.
3. Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, 3rd edition,
Morgan Kaufmann Publications, 2011.
4. David Hand, Heikki Mannila and Padhraic Smyth, “Principles of Data Mining”, 3rd edition,
Morgan Kaufmann Publications, 2009.
5. M. Kantardzic, “Data Mining: Concepts, Models, Methods, and Algorithms”, 2nd edition,
Wiley-IEEE Press, 2011.

Mode of Evaluation : Tests, Assignments and Seminar.


Recommended by the Board of Studies on:

Information Retrieval and Data Mining LABORATORY

S. No. Indicative List of Experiments (in the areas of)

1 Implementing various indexing techniques with respect to textual information.
2 Discovering knowledge patterns through indexing techniques for textual information
items
3 Implementation of scalable frequent item set mining with Market Basket Analysis
(MBA) data sets using Apriori algorithm
4 Implementation of scalable frequent item set mining with Market Basket Analysis
(MBA) data sets using FP-Growth algorithm
5 Prediction of consumer behaviours using association rule mining.
6 Applying Predictive Modeling Clustering for analyzing the census data
7 Applying Analytics and statistical modeling for predicting the nature of buyer and
seller behaviours
8 Relevance feedback analysis using Probability ranking principle
9 Retrieval Performance Evaluation measures using Average precision, Normalized
Discounted Cumulative Gain (NDCG)
10 Rank Portfolio retrieval and analysis



CSE610 Big Data Essentials L T P C
3 0 2 4

Version No. : 1.00

Course Prerequisites / Exposure: Advanced Database Systems

Objectives
This course provides a broad introduction to big data at a foundation level with a focus on big
data technology and tools, including MapReduce and Hadoop. It serves as a foundation for
advanced studies in the area of Big Data Analytics.

Expected Outcome
On completion of this course, the students will be able to handle huge volumes of data left
untapped by BI programs. They will understand the Analytics Life Cycle and gain knowledge of an
open source software framework that supports the processing of large data sets.

Unit I Introduction 7 hours


Big Data Overview - State of the practice in analytics - The role of the Data Scientist - Big Data
Analytics in Industry Verticals, Big data sources.

Unit II Data Analytics Lifecycle 7 hours


Key roles for a successful analytic project - Main phases of the lifecycle - Developing core
deliverables for stakeholders.

Unit III Big Data – Technology 9 hours


Introduction to MapReduce/Hadoop for analyzing unstructured data - Design patterns - Filtering
Patterns - Join Patterns - Meta Patterns - Hadoop ecosystem of tools - In-database Analytics -
MADlib and Advanced SQL Techniques, NoSQL, JSON store, MDX.

Unit No. IV Overview of Hadoop 12 hours


Introduction to learning and knowledge analytics - Rise of “Big Data” - Big Data from a
Technology Perspective - Hadoop: Components of Hadoop, Application Development in
Hadoop, The Distributed File System: HDFS, GPFS, Hadoop Cluster Architecture, Batch
Processing - Low Latency NoSQL.

Unit No. V MapReduce Algorithm Design 10 hours


MapReduce Basics - Functional Programming Roots - Mappers and Reducers - The Execution
Framework - Partitioners and Combiners - MapReduce Algorithm Design - Local Aggregation -
Pairs and Stripes - Computing Relative Frequencies - Secondary Sorting - Relational Joins.
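
To make the "pairs" approach, local aggregation, and relative-frequency topics of this unit concrete, here is a small in-memory sketch in plain Python rather than Hadoop code. The names `mapper`, `combine`, `relative_frequencies`, and the `MARGINAL` marker are assumptions for illustration. Co-occurrence is taken to mean "appears on the same line", and a sorted pass over the grouped keys stands in for the shuffle-and-sort phase.

```python
from collections import Counter, defaultdict

# Hypothetical special key: the empty string sorts before any real word,
# so the marginal count for w reaches the "reducer" before the (w, u) pairs.
MARGINAL = ""

def mapper(line):
    """Emit ((w, u), 1) for each co-occurring pair, plus a marginal count for w."""
    words = line.split()
    for i, w in enumerate(words):
        for j, u in enumerate(words):
            if i != j:
                yield (w, u), 1
                yield (w, MARGINAL), 1

def combine(pairs):
    """Local aggregation: sum counts per key before they leave the mapper."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return local.items()

def relative_frequencies(lines):
    """Compute f(u | w) = count(w, u) / count(w, *) for every observed pair."""
    grouped = defaultdict(int)
    for line in lines:
        for key, value in combine(mapper(line)):
            grouped[key] += value
    result = {}
    marginal = 0
    # Sorting plays the role of the framework's sort; (w, MARGINAL) comes first.
    for (w, u), count in sorted(grouped.items()):
        if u == MARGINAL:
            marginal = count
        else:
            result[(w, u)] = count / marginal
    return result
```

On a real cluster a custom partitioner would also be needed so that all keys sharing the same left word reach the same reducer; the single-process sketch sidesteps that.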

Text Books /Reference Books


1. Noreen Burlingame, “Little Book of Big Data”, 2012 edition.
2. Tom White, “Hadoop: The Definitive Guide”, O'Reilly Media, 2010.
3. Alex Holmes, “Hadoop in Practice”, Manning Publications, 2012.
4. Donald Miner, “MapReduce Design Patterns: Building Effective Algorithms and Analytics
for Hadoop and Other Systems”, O'Reilly Media, 2012.
5. Nathan Marz, “Big Data: Principles and best practices of scalable real-time data systems”,
Manning Publications, 2012.

6. “Big Data Now: Current Perspectives”, O’Reilly Radar [Kindle Edition], 2011.
7. Paul Zikopoulos et al., “Harness the Power of Big Data: The IBM Big Data Platform”,
McGraw-Hill, 2013.

Mode of Evaluation:
Recommended by the Board of Studies on :

Big Data Essentials Laboratory

S. No. Indicative List of Experiments (in the areas of)


1 Design and development of data analytics in large scale data set
2 Design and deployment of Data Analytics in Industry Verticals using Big data source
3 Analyzing unstructured data using Hadoop
4 Deploying MapReduce algorithms for large set of data
5 Deploying database analytics using NoSQL
6 Deployment of Distributed file systems- HDFS and GPFS
7 Implementation of Batch processing using Hadoop
8 Deployment of Hadoop clustering process
9 Real time application development using Hadoop



CSE611 Machine Learning L T P C
3 0 2 4

Version No. : 1.00

Objective:
To introduce how to mimic learning through algorithms. To learn from data in a supervised /
unsupervised manner so as to facilitate decision making.

Expected Outcome
On completion of this course the student would be able to:
• Understand the principles, advantages, limitations and possible applications of machine
learning.
• Identify and apply the appropriate machine learning technique to classification, pattern
recognition, optimization and decision making.

Unit No. I Introduction 8 hours


Learning Problems, Perspectives and Issues, Concept Learning, Version Spaces and Candidate
Elimination, Inductive bias, Decision Tree learning, Representation, Algorithm, Heuristic Space
Search.

Unit No. II Supervised Learning 9 hours


Supervised learning: Logistic regression, Perceptron, Generative learning algorithms, Gaussian
discriminant analysis, Support vector machines, Model selection and feature selection, Ensemble
methods: Bagging, boosting, Evaluating and debugging learning algorithms.

Unit No. III Unsupervised Learning 9 hours


Locally weighted Regression, Radial Basis Functions, Case Based Learning, Expectation
Maximization, Mixture of Gaussians, Factor analysis, Principal components analysis (PCA),
Independent components analysis (ICA).

Unit No. IV Computational Learning 9 hours


Concept Learning, Maximum Likelihood, Minimum Description Length Principle, Bayes
Optimal Classifier, Gibbs Algorithm, Naive Bayes Classifier, Bayesian Belief Network,
Probabilistic Learning, Sample Complexity, Finite and Infinite Hypothesis Spaces, Mistake
Bound Model.

Unit No. V Advanced Learning Methods 10 hours


Learning Sets of Rules, Sequential Covering Algorithm, Learning Rule Sets, First Order Rules, Sets
of First Order Rules, Induction as Inverted Deduction, Inverting Resolution, Analytical
Learning, Perfect Domain Theories, Explanation-Based Learning, FOCL Algorithm,
Reinforcement Learning, Q-Learning, Temporal Difference Learning.

Text / Reference Books


1. Ethem Alpaydin, “Introduction to Machine Learning”, MIT Press, Prentice Hall of India,
2005.
2. Tom Mitchell, “Machine Learning”, McGraw Hill, 3rd Edition, 1997.
3. Christopher M. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.

Mode of Evaluation: Tests, Assignments and Seminar.


Recommended by the Board of Studies on:



Machine Learning Laboratory

S. No. Indicative List of Experiments (in the areas of)


1 Decision Tree learning
2 Implement Posteriori Probability on a laboratory data set that returns positive data set.
3 Implement Naive Bayes Classifier for medical data set with four symptoms
4 Implement Nearest Neighbor for classification of flower family
5 Use K-means and Hierarchical Clustering to Find Natural Patterns in Data
6 Maximum Likelihood Estimation of Gaussian Mixtures Using the Expectation
Maximization Algorithm
7 Using Hidden Markov Models
8 Estimate Hidden Markov Model Parameters from Emissions
9 Train a Boosted or Bagged Decision Tree
10 Train a Support Vector Machine Classifier



CSE612 Big Data Analytics L T P C
3 0 2 4

Version No. : 1.00

Objective
To go beyond the basic level of understanding that is typically offered in a Fundamentals of Big
Data Analytics course.
The course focuses on concepts, principles, and techniques applicable to any technology environment
and industry. It establishes a baseline that can be enhanced by further formal training and additional
real-world experience in Big Data Analytics.

Expected Outcome
On completion of this course the student would be able to:

1. Define learning and knowledge analytics.
2. Map the developments of technologies and practices that influence learning and knowledge
analytics as well as developments and trends peripheral to the field.
3. Evaluate prominent analytics methods and tools and determine appropriate contexts where
the methods would be most effective.
4. Describe how “big data” and data-driven decision making differ from traditional decision
making and the potential future implications of this transition.
5. Describe and evaluate developing trends in learning and knowledge analytics and develop
models for their potential impact on teaching, learning, and organizational knowledge.

Unit No. I Algorithms for Handling Big Data 10 hours


Random Forest Algorithm, Unstructured Data Analytics, Overkill Algorithm - Randomized
Matrix Algorithms in Parallel and Distributed Environments, Mahout: Probabilistic Hashing for
Efficient Search and Learning on Massive Data, Dirichlet process clustering, Latent Dirichlet
Allocation, Singular value decomposition, Parallel Frequent Pattern mining, Complementary
Naive Bayes classifier, Random forest decision tree based classifier.

Unit No. II Techniques for Handling Big Data 10 hours


Topographic Analysis, Large Scale Machine Learning for Query Document Matching in Web
Search - A Geometric Analysis of Subspace Clustering with Outliers - Scalable K-Means++,
Distributed Computing - Queues - Tools: Hazelcast Architecture, Cross platform, Google Protocol
Buffers.

Unit No. III Real Time Analytics and Search 9 hours


In-line queries - In-memory data, data on HDFS, HBase or any other structure on Hadoop
clusters. Impala with a large scale search engine like SolrCloud. Real-Time Queries in Hadoop -
Cloudera Impala: A Modern SQL Engine for Hadoop - scalable parallel database technology
available to the Hadoop community.

Unit No. IV Indexing for Text Retrieval 8 hours


Inverted Indexing for Text Retrieval - Web Crawling - Inverted Indexes - Inverted Indexing: Baseline
Implementation - Inverted Indexing: Revised Implementation - Inverted Indexing using JAQL -
Index Compression.
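
The baseline inverted index and the idea behind index compression listed in this unit can be sketched in a few lines of single-machine Python. The function names are hypothetical, tokenization is a plain whitespace split, and gap (delta) encoding is shown as just one common first step of postings compression, not as the scheme prescribed by the course texts.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Map each term to a postings list of (doc_id, term_frequency).

    docs: dict of integer doc id -> text. Documents are visited in id order,
    so every postings list comes out sorted by doc id.
    """
    index = defaultdict(list)
    for doc_id in sorted(docs):
        term_counts = Counter(docs[doc_id].lower().split())
        for term, tf in term_counts.items():
            index[term].append((doc_id, tf))
    return dict(index)

def to_gaps(doc_ids):
    """Replace sorted doc ids with differences from the previous id."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps

def from_gaps(gaps):
    """Invert to_gaps by taking a running sum."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids
```

Gaps are smaller numbers than the raw ids, which is what lets a variable-length integer code store them in fewer bits.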



Unit No. V Analytics for Big Data in Motion 8 hours
Data Stream Warehousing, InfoSphere Streams Basics - How Streams works - Streams Processing
Language - Streams Toolkits.
Apache Flume NG - Microsoft StreamInsight as tools for Complex Event Processing (CEP)
applications. Case Studies: Big Data in E-Commerce and IT Energy Consumption, Social and
Health Science.

Text / Reference Books


1. Paul C. Zikopoulos, Chris Eaton, Dirk deRoos, Thomas Deutsch, George Lapis,
“Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data”,
McGraw-Hill, 2012.
2. Jimmy Lin and Chris Dyer, “Data-Intensive Text Processing with MapReduce”, Morgan &
Claypool Synthesis, 2010.
3. Bill Franks, “Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams
with Advanced Analytics”, John Wiley & Sons, 2012.

Mode of Evaluation : Tests, Assignments and Seminar


Recommended by the Board of Studies on:

Big Data Analytics Laboratory

S. No. Indicative List of Experiments (in the areas of)


1 Implementation of Random Forest Algorithm for Handling Large Data sets
2 Implementation of Overkill Algorithm for Handling Large Data Sets
3 Implementing Topographic Analysis for handling Data Analytics
4 Large Scale Machine Learning for Query Document Matching in Web Search
5 Real Time Analytics and Search using Hadoop and SolrCloud
6 Real Time Analytics and Search using Cloudera Impala
7 Indexing for Text Retrieval
8 Analytics for Big Data in Motion using Energy Consumption data sets
9 Analytics for Big Data in Motion using Social and Health Science data sets



CSE613 Data Visualization L T P C
3 0 2 4

Version No.: 1.00

Course Prerequisites/Exposure : Data Mining Techniques

Objectives
To focus on the state-of-the-art in various areas of data visualization, such as data types, chart
types, visual variables, visualization techniques, color theory, and data patterns.

Expected Outcome
On completion of this course the student would be able to:
• Understand the principles of creating and evaluating effective data visualizations.
• Use software tools to create various data visualizations.
• Be familiar with the visualization techniques in major application areas.
• Acquire the skill to apply visualization techniques to a problem and associated data set.

Unit I Principles and theories of visualization 7 hours


Theories related to visual information processing - Color theory - Data types - Visual variables -
Chart types: statistical graphs, maps, trees and networks - Introduction to R - Analyzing and
exploring data with R - Statistics for model building and evaluation. Case studies on financial data
analysis, social media analysis, census data analysis, etc.

Unit II Aspects of Data Patterns 8 hours


Acquisition of data, Discipline-independent classification of information sources, Database
issues – In-memory database - storage and retrieval of data - Query languages - Reliability of data
– Patterns and predicting data, continuously and discontinuously variable data, plotting data and
suitability for different types of data.

Unit III Visualization techniques 11 hours


Scalar and point techniques - Vector visualization techniques - Multidimensional techniques –
glyphs, Graph-theoretic graphics - Linked Views for Visual Exploration - Multivariate
Visualization by Density Estimation, Volume Visualization – Rendering - Attribute Mapping -
Visualizing Cluster Analysis - Visualizing Contingency Tables - Matrix Visualization -
Visualization in Bayesian Data Analysis - Evaluation of data visualization.

Unit IV Applications 9 hours


Visualization for Genetic Network Reconstruction, Reconstruction, Visualization and Analysis
of Medical Images, Exploratory Graphics of a Financial Dataset, Visualization Tools for
Insurance Risk Processes, Case study: Visualization of Social Networks datasets, Visualizing
Darwin’s database.

Unit V Tools and Languages 10 hours


Programming Statistical Data Visualization in the Java Language, Web-Based Statistical Graphics
using XML Technologies, Google Map API, Google Chart, Tableau - Heat Map Generation



Text Books / Reference Books
1. Ben Fry, “Visualizing Data: Exploring and Explaining Data with Processing Environment”,
O'Reilly Media, 2008.
2. C.H. Chen, W.K. Härdle, A. Unwin, “Handbook of Data Visualization”, Springer, Ed(XIV),
2008.
3. Avril Coghlan, “A Little Book of R For Multivariate Analysis”, 2013.
4. Avril Coghlan, “A Little Book of R For Biomedical Statistics”, 2013.
5. Paul Murrell, “R Graphics”, 2nd Edition Chapman and Hall / CRC Press, R Series, 2011.
6. John Verzani, “simpleR – Using R for Introductory Statistics”, Taylor and Francis, eSeries,
2005.

Mode of Evaluation : Continuous Assessment Tests, Seminars and Assignments.


Recommended by the Board of Studies on :

Data Visualization Laboratory

S. No. Indicative List of Experiments (in the areas of)


1 Statistics for model building and evaluation
2 Acquisition of data, plotting data
3 Financial and census data analysis using R
4 Patterns and predicting data Using R
5 Volume Visualization and Rendering
6 Visualization and evaluation in Bayesian Data Analysis
7 Visualization for Genetic Network Reconstruction
8 Visualizing Darwin’s database
9 Multivariate analysis of user behavior online using Heat Map.
10 Visualization and Analysis of Medical Images using Java



CSE614 Large-Scale Data Management Techniques L T P C
2 0 0 2
Version No. : 1.00

Course Prerequisites/Exposure : Database Management System, Advanced Database Systems

Objectives
To focus on data management techniques for handling large-scale data arising from Internet and
enterprise-based applications.

Expected Outcome
On completion of this course the student would be able to develop solutions for both building
data-intensive scalable applications over the Internet/Web as well as for large-scale data
analytics.

Unit I Infrastructure as a Service (IaaS) 6 hours


Introduction to IaaS - Virtualization – Server – Storage – Network – data storage – Local Cloud
and Thin Clients - Load balancing – Improving performance through Load balancing - scalability
– Managing cloud resource – cloud capacity management – Virtual machine provisioning –
Migration service - Case study – Deploying application in IaaS engine.

Unit II Data Management Solution for Internet Applications: 6 hours


Google's Application Stack: Chubby Lock Service, Big Table Data Store, and Google File
System; Yahoo's key-value store: PNUTS; Amazon's key-value store: Dynamo; Correctness
Semantics of key-value store and its impact on application development.

Unit III Enterprise Data Analytics: 7 hours


Infrastructure Requirements: Acquire Big Data, Organize Big Data, Analyze Big Data, OLTP
Databases: Cassandra, Oracle Big Data Appliance: CDH and Cloudera Manager, Oracle Big
Data Connectors, Oracle Loader for Hadoop, Oracle Direct Connector for HDFS, Oracle Data
Integrator Application Adapter for Hadoop, Oracle R Connector for Hadoop, In-Database
Analytics Tools: In-Database Data Mining, In-Database Text Mining, In-Database Semantic
Analysis, the Data Cube Model.

Unit IV Data Analytics in the Internet Context: 6 hours


Information governance, Data definition and usage standards, Metadata management, Data
lifecycle management, Risk and cost containment, Programming paradigms: Pig Latin and Hive,
and parallel databases versus MapReduce.

Unit V Applications: Massive Data Sets 5 hours


Applications: Detecting Fraud in the Real World, Massive Datasets in Astronomy, Data
Management in Environmental Information Systems, Massive Data Sets issues in Earth
Observing, Massive Data Set Issues in Air Pollution Modelling, Mining Biomolecular Data Using
Background Knowledge and Artificial Neural Networks.

Text / Reference Books


1. Gerhard Weikum and Gottfried Vossen, “Transactional Information Systems: Theory,
Algorithms, and the Practice of Concurrency Control and Recovery”, Morgan Kaufmann Publishers,
June 2001.
2. Lars George, “HBase: The Definitive Guide”, O'Reilly Media, Inc, 1st Edition, 2011.

3. Eben Hewitt, “Cassandra: The Definitive Guide”, O’Reilly Media, Inc, 2010.
4. Alex Holmes, “Hadoop in Practice”, Manning Publications, [Kindle Edition], 2012.
5. James Abello, Panos M. Pardalos, Mauricio G.C. Resende, “Handbook of Massive Data
Sets”, Kluwer Academic Publishers, 2002.
6. Alan Gates, “Programming Pig: Dataflow Scripting with Hadoop”, O'Reilly Media, Inc, 2011.
7. Donald Miner, Adam Shook, “MapReduce Design Patterns: Building Effective Algorithms
and Analytics for Hadoop and Other Systems”, O'Reilly Media, Inc, 2012.
8. Oracle for Big Data Enterprise, Oracle, 2012.

Mode of Evaluation: Continuous Assessment Tests, Seminars and Assignments.


Recommended by the Board of Studies on:
