
GLOBAL INSTITUTE OF ENGINEERING AND TECHNOLOGY

MELVISHARAM VELLORE

Department of Information Technology


SEMESTER / YEAR: VII / IV
COURSE: IT6701

ACADEMIC YEAR: 2016-2017


SUBJECT NAME: INFORMATION MANAGEMENT
Unit I

1. Define database design.


Database design is the process of producing a detailed data model of a database. This data
model contains all the logical and physical design choices and physical storage parameters
needed to generate a design in a data definition language, which can then be used to create a
database.
2. List the steps involved in database design.
Determine the data to be stored in the database.

Determine the relationships between the different data elements.

Superimpose a logical structure upon the data on the basis of these relationships.

3. Define ER diagram.
An ER (Entity Relationship) diagram is a diagram that shows the entities of a system, their attributes and the relationships among them; it helps to design databases efficiently.
4. What are the various attributes in ER diagram?
Attributes are the properties of entities. Attributes are represented by means of ellipses. Every ellipse
represents one attribute and is directly connected to its entity (rectangle).
5. What is meant by normalization?
Normalization is a systematic way of ensuring that a database structure is suitable for general-purpose
querying and free of certain undesirable characteristics (insertion, update and deletion anomalies) that
could lead to loss of data integrity.
6. What are the various forms present in normalization?
Normalization consists of the normal forms 1NF, 2NF, 3NF, Boyce-Codd normal form (BCNF), 4NF and 5NF.
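To make the idea concrete, here is a minimal Python sketch of one normalization step, assuming a hypothetical enrolment table in which course_name depends only on course_id. Because the key is (student, course_id), that is a partial dependency, so the table is not in 2NF. Splitting it into an enrolment table and a course table stores each course name once, and rejoining the two tables checks that the decomposition is lossless.

```python
# Hypothetical unnormalized table: course_name depends only on course_id,
# so the same course name is repeated for every enrolment (update anomaly risk).
flat = [
    {"student": "S1", "course_id": "C1", "course_name": "Databases"},
    {"student": "S2", "course_id": "C1", "course_name": "Databases"},
    {"student": "S1", "course_id": "C2", "course_name": "Networks"},
]


def decompose(rows):
    """Split rows into an enrolment relation and a course relation keyed by course_id."""
    enrolment = {(r["student"], r["course_id"]) for r in rows}
    course = {r["course_id"]: r["course_name"] for r in rows}
    return enrolment, course


def rejoin(enrolment, course):
    """Natural join on course_id; equals the original rows if the split is lossless."""
    joined = [
        {"student": s, "course_id": c, "course_name": course[c]}
        for s, c in enrolment
    ]
    return sorted(joined, key=lambda r: (r["student"], r["course_id"]))


enrolment, course = decompose(flat)
print(course)  # {'C1': 'Databases', 'C2': 'Networks'}: each name stored once
assert rejoin(enrolment, course) == sorted(flat, key=lambda r: (r["student"], r["course_id"]))
```

After the split, renaming a course changes a single row in the course relation instead of every enrolment row.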
7. Define data modeling.
Data modeling in software engineering is the process of creating a data model for a system by
applying formal data modeling techniques.
8. List out the business rules.
Explicit expression
Coherent representation
Evolutionary extension
Declarative nature
9. What is meant by JDBC?

Java Database Connectivity (JDBC) is an application programming interface (API) for the
programming language Java, which defines how a client may access a database. It is part of the Java
Standard Edition platform, from Oracle Corporation.
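JDBC itself is a Java API. Since the examples in these notes use Python, the sketch below shows the analogous flow (open a connection, run a parameterized statement, read the result set) with Python's built-in sqlite3 module, which follows the Python DB-API rather than JDBC. The student table and its data are hypothetical; the comments name the JDBC counterparts only as an analogy.

```python
import sqlite3


def fetch_students(conn, min_marks):
    # Parameterized query, comparable to a JDBC PreparedStatement;
    # the "?" placeholder keeps user input out of the SQL text.
    cur = conn.execute(
        "SELECT name, marks FROM student WHERE marks >= ? ORDER BY name",
        (min_marks,),
    )
    return cur.fetchall()  # comparable to iterating over a JDBC ResultSet


conn = sqlite3.connect(":memory:")  # comparable to DriverManager.getConnection(url)
conn.execute("CREATE TABLE student (name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("Asha", 82), ("Ravi", 67), ("Meena", 91)],
)
conn.commit()

rows = fetch_students(conn, 70)
print(rows)  # [('Asha', 82), ('Meena', 91)]
conn.close()
```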
10. What is Flume?
Flume is a distributed, reliable and available service for efficiently collecting, aggregating and moving
large amounts of data as it is produced.
11. Define big data.
Big Data is data whose scale, diversity, and complexity require new architecture, techniques,
algorithms, and analytics to manage it and extract value and hidden knowledge from it.
12. What are the advantages of big data?
Scale (Volume):
Data volume is growing rapidly: a 44x increase from 2009 to 2020, from 0.8 zettabytes to 35 ZB.

Complexity (Variety):
Various formats, types and structures: text, numerical data, images, audio, video, sequences,
time series, social media data and multi-dimensional arrays.

Speed (Velocity):
Data is being generated fast and needs to be processed fast, as in online data analytics.

13. List the benefits of SQL.
Store persistent data
Application Integration
Mostly Standard
Reporting
Concurrency Control
14. What is meant by NoSQL?
NoSQL, which encompasses a wide range of technologies and architectures, seeks to solve the
scalability and big data performance issues that relational databases weren't designed to address.
NoSQL is especially useful when an enterprise needs to access and analyze massive amounts of
unstructured data or data that's stored remotely on multiple virtual servers in the cloud.
15. What are the functions of HIVE?
Hive has three main functions: data summarization, query and analysis. It supports queries expressed
in a language called HiveQL; Hive automatically translates these SQL-like queries into MapReduce jobs
executed on Hadoop. In addition, HiveQL supports plugging custom MapReduce scripts into
queries.
16. Compare the old and new approaches of big data analytics.
Old way
a. A data and analytics technology stack with different layers cross-communicating data.
b. Works on scale-up, expensive hardware.

New way
a. A data and analytics platform that does all the data processing and analytics in one layer, without moving data back and forth.
b. Works on scalable (scale-out) commodity hardware.
17. Define HBase.
HBase is an open-source, distributed, column-oriented database built on top of HDFS and modeled on
Google's Bigtable. It is a distributed data store that can scale horizontally to thousands of commodity
servers and petabytes of indexed storage.
18. Give the purpose of HBase.
HBase is designed to operate on top of the Hadoop Distributed File System (HDFS) for scalability,
fault tolerance, and high availability.
19. Define thrift client.
The Hive Thrift Client makes it easy to run Hive commands from a wide range of programming
languages. Thrift bindings for Hive are available for C++, Java, PHP, and Python.
20. What is MapReduce?
MapReduce is a processing technique and a programming model for distributed computing; Hadoop's implementation of it is written in Java.
The MapReduce algorithm contains two important tasks, namely Map and Reduce.
Map takes a set of data and converts it into another set of data, where individual elements are broken
down into tuples (key/value pairs).
Reduce takes the output from a map as an input and combines those data tuples into a smaller set of
tuples.
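The classic word-count example shows how the two tasks fit together. The sketch below runs everything in a single Python process with no Hadoop involved; the function names are hypothetical and only illustrate the dataflow of map, the framework's shuffle (grouping by key) and reduce.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) key/value pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between the phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine the grouped values for one key into a smaller result.
    return key, sum(values)


lines = ["Big data needs big tools", "data tools"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2}
```

On a real cluster, many map tasks run in parallel on different blocks of the input, and each reducer receives only the keys assigned to it.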

Part B
1. Explain the database design and modeling.
2. Describe the various business rules and relationships.
3. Explain Java Database Connectivity (JDBC).
4. Write a note on big data.
5. Describe the various elements of Hadoop.
6. Draw and explain the Hive architecture.
7. Explain the HDFS architecture with a neat sketch.

Unit II
DATA SECURITY AND PRIVACY
1. Define data security.
Data security means protecting data, such as a database, from destructive forces and from the
unwanted actions of unauthorized users.
Data security is the practice of keeping data protected from corruption and unauthorized
access. The focus behind data security is to ensure privacy while protecting personal or
corporate data.
2. Define data privacy.
Data privacy, also called information privacy, is the aspect of information technology (IT) that deals
with the ability an organization or individual has to determine what data in a computer system can be
shared with third parties.

3. What is flaw?
A flaw is a problem with a program. A security flaw is a problem that affects security in some way:
confidentiality, integrity or availability. Flaws come in two types: faults and failures.
4. What is meant by fault?
When a human makes a mistake, called an error, in performing some software activity, the error may
lead to a fault, or an incorrect step, command, process, or data definition in a computer program.
A fault is a mistake behind the scenes, such as an error in the code, data, specification or process.
A fault is a potential problem.
5. What is meant by failure?
A failure is when something actually goes wrong: a deviation from the desired behaviour (not
necessarily from the specified behaviour).
6. Mention the types of flaws in program security.

Validation error (incomplete or inconsistent)

Domain error

Serialization and aliasing

Inadequate identification and authentication

Boundary condition violation

Other exploitable logic errors

7. Describe the term malicious code.


Malicious code is the term used to describe any code in any part of a software system or script that
is intended to cause undesired effects, security breaches or damage to a system. Malicious code is an
application security threat that cannot be efficiently controlled by conventional antivirus software
alone.
8. Define virus.
A virus is a program or programming code that replicates by being copied or initiating its
copying to another program, computer boot sector or document.
A virus is a piece of code that inserts itself into a host [program], including operating
systems, to propagate. It cannot run independently. It requires that its host program be run to
activate it.
9. What is worm?
A computer worm is a standalone malware computer program that replicates itself in order to
spread to other computers. Often, it uses a computer network to spread itself, relying on
security failures on the target computer to access it. Unlike a computer virus, it does not need
to attach itself to an existing program.
A worm is a program that can run independently, will consume the resources of its host
[machine] from within in order to maintain itself and can propagate a complete working
version of itself on to other machines.
10. Describe the term firewall.
A firewall is a network security system that monitors and controls the incoming and outgoing
network traffic based on predetermined security rules.
11. Who is intruder?
An intruder is a person who attempts to gain unauthorized access to a system, to damage that
system, or to disturb data on that system. This person attempts to violate security by interfering with
system availability, data integrity or data confidentiality.
12. What is intrusion detection system?
An intrusion detection system (IDS) is a device or software application that monitors network
or system activities for malicious activities or policy violations and produces electronic reports to a
management station.
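As a small illustration of the monitoring idea, the sketch below flags source addresses that produce too many failed logins within a short time window, a simple rule of the kind an IDS might apply to authentication logs. The log format, the threshold of 3 and the 60-second window are assumptions for the example; real IDS products use much richer signatures and anomaly models, and would send the alerts to a management station rather than return them.

```python
from collections import defaultdict, deque


def detect_bruteforce(events, threshold=3, window=60):
    """Return source IPs with >= threshold failed logins within `window` seconds.

    events: iterable of (timestamp_seconds, source_ip, success) sorted by time.
    """
    recent = defaultdict(deque)  # source_ip -> timestamps of recent failures
    alerts = set()
    for ts, ip, success in events:
        if success:
            continue  # only failed logins are counted
        q = recent[ip]
        q.append(ts)
        while q and ts - q[0] > window:
            q.popleft()  # forget failures older than the window
        if len(q) >= threshold:
            alerts.add(ip)
    return alerts


log = [
    (0, "10.0.0.5", False), (10, "10.0.0.5", False), (20, "10.0.0.5", False),
    (0, "10.0.0.9", False), (100, "10.0.0.9", False), (200, "10.0.0.9", False),
    (30, "10.0.0.7", True),
]
alerts = detect_bruteforce(sorted(log))
print(alerts)  # {'10.0.0.5'}
```

Lowering the threshold catches slower attacks but also raises the false-alarm rate, which is the noise limitation described in question 15.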
13. Give some data privacy principles.
Personal data shall be obtained only for one or more specified and lawful purposes, and shall
not be further processed in any manner incompatible with that purpose or those purposes.
Personal data shall be adequate, relevant and not excessive in relation to the purpose or
purposes for which they are processed.
14. List out few data privacy security laws.
Electronic Communications Privacy Act (ECPA);
Fair Credit Reporting Act (FCRA);
Fair and Accurate Credit Transaction Act (FACTA);
Children's Online Privacy Protection Act (COPPA);
15. Give the limitations of an IDS.
Noise can severely limit an intrusion detection system's effectiveness. Bad packets generated from
software bugs, corrupt DNS data, and local packets that escaped can create a significantly high false-alarm rate.
16. What are the functions of IDS?
The functions performed by an IDS are listed below:
Monitors users and system activities.
Scrutinizes the system configuration assessing its mis-configurations and vulnerabilities.
Checks the integrity of the critical system and data files.
Discovers attack patterns in a system activity.
Corrects system configuration errors.
17. What are the various ways of virus attachments?
The various ways of virus attachment are as follows.
Appended virus
Virus that surrounds a program.
Integrated virus.
18. What is compliance?
Compliance is merely a snapshot of how well your security program meets a specific set of security
requirements at a given moment in time.
19. List out various types of firewalls.
Packet filtering gateway
Stateful inspection firewall
Application proxies
Guards
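A packet filtering gateway, the first type listed above, can be sketched as an ordered rule list that is checked first-match-wins, with a default-deny fallback. The rule fields, the sample addresses (from the documentation ranges 198.51.100.0/24 and 203.0.113.0/24) and the default-deny choice are assumptions for illustration. Stateful inspection firewalls, application proxies and guards additionally track connections or inspect application content, which this stateless sketch does not do.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class Rule:
    action: str               # "allow" or "deny"
    src: str                  # source network in CIDR form
    dst_port: Optional[int]   # None matches any destination port
    protocol: str             # "tcp", "udp" or "any"


def filter_packet(rules, src_ip, dst_port, protocol):
    """Return the action of the first matching rule; deny if none match."""
    for rule in rules:
        if (ip_address(src_ip) in ip_network(rule.src)
                and rule.dst_port in (None, dst_port)
                and rule.protocol in ("any", protocol)):
            return rule.action
    return "deny"  # default-deny policy


rules = [
    Rule("deny", "203.0.113.0/24", None, "any"),  # block a known-bad range
    Rule("allow", "0.0.0.0/0", 443, "tcp"),       # allow HTTPS from anywhere
    Rule("allow", "10.0.0.0/8", 22, "tcp"),       # allow SSH from the internal network only
]
print(filter_packet(rules, "198.51.100.7", 443, "tcp"))  # allow
print(filter_packet(rules, "198.51.100.7", 22, "tcp"))   # deny
print(filter_packet(rules, "203.0.113.9", 443, "tcp"))   # deny
```

Rule order matters: because the block rule comes first, traffic from 203.0.113.0/24 is denied even on port 443.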
20. What is a non-malicious program error?
Non-malicious program errors are mostly human mistakes that go unnoticed while coding. They are not
written with the intent to cause harm, although they can still lead to serious security failures.
A few types of non-malicious program errors are,
Buffer overflows
Incomplete mediation
Time-of-check to time-of-use errors
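Incomplete mediation, the second error listed above, occurs when a program uses input without fully checking it. A common illustration is an online order whose price and total travel in the URL and are trusted by the server, so a user can simply edit them. The sketch below shows complete mediation instead: every field is validated and the total is recomputed on the server. The catalog, shipping table, parameter names and quantity limit are hypothetical.

```python
# Hypothetical server-side tables; the client never supplies prices.
CATALOG = {"555A": 10}
SHIPPING = {"boat": 5, "air": 20}


def place_order(params):
    """Validate every client-supplied field and recompute the total on the server."""
    part = params.get("part")
    if part not in CATALOG:
        raise ValueError("unknown part")
    try:
        qty = int(params.get("qy", ""))
    except ValueError:
        raise ValueError("quantity must be an integer") from None
    if not 1 <= qty <= 100:
        raise ValueError("quantity out of range")
    ship = params.get("ship")
    if ship not in SHIPPING:
        raise ValueError("unknown shipping method")
    # Any client-sent 'price' or 'total' is ignored: it may have been edited in the URL.
    return qty * CATALOG[part] + SHIPPING[ship]


# The tampered price and total below have no effect on what is charged.
print(place_order({"part": "555A", "qy": "20", "price": "1", "ship": "boat", "total": "25"}))  # 205
```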

Part B
1. Write a brief note on program security.
2. Explain various types of data flaws.
3. Define threat. Discuss various types of data threats.
4. How does a firewall protect data? Explain in detail.
5. Illustrate network security intrusion detection systems.
6. Describe the principles and laws of data protection in detail.

Unit III
INFORMATION GOVERNANCE
1. What is Master Data Management (MDM)?
Master data management (MDM) is a comprehensive method of enabling an enterprise to link all of
its critical data to one file, called a master file that provides a common point of reference. When
properly done, MDM streamlines data sharing among personnel and departments.
2. Define data consolidation.
Data consolidation is the process of capturing master data from multiple sources and integrating it into
a single hub (operational data store) for replication to other destination systems.
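As a rough illustration of consolidation, the Python sketch below merges customer records from two hypothetical sources (crm and billing) into one hub entry per customer. Matching on a normalized email address and the "latest non-empty value wins" survivorship rule are assumptions chosen for the example; real MDM products provide configurable matching and survivorship rules.

```python
def normalize_email(email):
    return email.strip().lower()


def consolidate(*sources):
    """Merge records from several sources into one master record per customer.

    Match key: normalized email. Survivorship: for each field, keep the
    non-empty value with the latest 'updated' timestamp.
    """
    hub = {}
    for source in sources:
        for rec in source:
            key = normalize_email(rec["email"])
            master = hub.setdefault(key, {"email": key, "_ts": {}})
            for field, value in rec.items():
                if field in ("email", "updated") or not value:
                    continue  # skip the match key, the timestamp and empty values
                if rec["updated"] >= master["_ts"].get(field, -1):
                    master[field] = value
                    master["_ts"][field] = rec["updated"]
    for master in hub.values():
        del master["_ts"]  # drop the bookkeeping timestamps
    return hub


crm = [{"email": "Asha@Example.com", "name": "Asha K", "phone": "", "updated": 5}]
billing = [{"email": "asha@example.com ", "name": "Asha Kumar", "phone": "98400", "updated": 3}]
hub = consolidate(crm, billing)
print(hub)  # {'asha@example.com': {'email': 'asha@example.com', 'name': 'Asha K', 'phone': '98400'}}
```

The two source records become one master record: the newer CRM name survives, and the phone number comes from billing because the CRM value was empty.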
3. What is data propagation?
Data propagation is the process of copying master data from one system to another, typically
through point-to-point interfaces in legacy systems.
4. Why is MDM needed?

Regulatory compliance

Privacy and data protection

Safety and security

Meaningful data mining

You can't control what you don't know

You can't measure what you don't control

5. What are the two modes of privacy?


The two modes of privacy are:
Location privacy.
Users want to hide their location information and their query information.
Query privacy.
Users do not mind, or are obliged, to reveal their locations. However, users want to hide their
queries.
6. Define data governance.

Data governance (DG) refers to the overall management of the availability, usability, integrity, and
security of the data employed in an enterprise. A sound data governance program includes a
governing body or council, a defined set of procedures, and a plan to execute those procedures.
7. Differentiate location privacy and database privacy.
Database privacy | Location privacy
The goal is to keep the privacy of the stored data (e.g., medical data). | The goal is to keep the privacy of data that is not stored (e.g., received location data).
Queries are explicit (e.g., SQL queries for patient records). | Queries need to be private (e.g., location-based queries).
Applies to the current snapshot of data. | Should tolerate a high frequency of location updates.
Privacy requirements are set for the whole set of data. | Privacy requirements are personalized.
8. List out a few goals of data governance.
Increasing consistency and confidence in decision making
Decreasing the risk of regulatory fines
Improving data security, also defining and verifying the requirements for data distribution
policies
Maximizing the income generation potential of data
Designating accountability for information quality
Enable better planning by supervisory staff
Minimizing or eliminating re-work
Optimize staff effectiveness
Establish process performance baselines to enable improvement efforts
Acknowledge and hold all gains
9. Explain in short different data quality management tools.
The different data quality management tools are as follows.
Data cleansing tool
Data parsing tools
Data profiling tools
Data matching tools
Data standardization tools
Data extract, transform and load (ETL) tools
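Two of these tools can be sketched in a few lines: profiling (counting missing and distinct values per column) and standardization (bringing values into one format so they can be compared and matched). The column names, sample rows and the 10-digit phone rule are hypothetical assumptions for this Python sketch.

```python
import re


def profile(rows, columns):
    """Data profiling: count missing and distinct non-missing values per column."""
    report = {}
    for col in columns:
        values = [r.get(col) for r in rows]
        missing = sum(1 for v in values if v in (None, ""))
        distinct = len({v for v in values if v not in (None, "")})
        report[col] = {"missing": missing, "distinct": distinct}
    return report


def standardize_phone(raw):
    """Data standardization: keep digits only; return None unless exactly 10 digits remain."""
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 10 else None


rows = [
    {"name": "Asha", "phone": "98400-12345"},
    {"name": "", "phone": "(984) 001 2345"},
    {"name": "Ravi", "phone": "123"},
]
print(profile(rows, ["name", "phone"]))
print([standardize_phone(r["phone"]) for r in rows])  # ['9840012345', '9840012345', None]
```

Profiling reports three distinct phone values, but after standardization two of them turn out to be the same number, which a data matching tool could then link.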
10. Give the different stages of MDM implementation.
Identify sources of master data
Identify the producers and consumers of the master data
Collect and analyze metadata about your master data
Appoint data stewards
Implement a data governance program and data governance council.
Develop the master data model.

11. Differentiate master data management and data warehouse.

Master data management | Data warehouse
MDM ensures data consistency at the source level. | A data warehouse ensures a consistent view of data at the warehouse level.
MDM is applied only to entities and not to transactional data, and it affects only dimension tables. | A data warehouse is applied to transactional and non-transactional data, and it affects both dimension tables and fact tables.
MDM works on current data. | A data warehouse works on historical data.
In MDM, the original data source is modified to maintain a single version of accurate data. | A data warehouse is accessed directly without affecting the original data sources.
12. What are the various layers of MDM?
Layer 1: Service Abstraction Layer
Layer 2: Data Quality Layer
Layer 3: Data Rule Layer
Layer 4: Data Management Layer
Layer 5: Business Process Layer
13. List out the characteristics of integrated risk management (IRM).
Ability to mitigate financial and transactional risks associated with data.
Ability to provide consistent, accurate and verifiable information to internal and external
users of the business.
Ability to satisfy compliance requirements using secure and consistent data.
Ability to define and implement enterprise wide data governance and quality metrics.
14. Write the advantages of Sarbanes Oxley Act.
Reduction of financial statement fraud.
Strengthening corporate governance
Reliability of financial information.
Improving the liquidity.
Model for private and nonprofit compliance.
15. Describe the Office of the Comptroller of the Currency (OCC) 2001-47.
The Office of the Comptroller of the Currency (OCC) has defined rules for financial institutions
that plan to share their sensitive data with unaffiliated vendors.
The OCC expects the board of directors and management of the organization to properly
oversee and manage third-party relationships.
Part B
1. Compare and contrast master data management with data warehousing.
2. Explain data governance and data synchronization with respect to master data management.
3. Explain OCC, FFIEC and SBI acts in detail.
4. What is the significance of risk management in MDM? Also explain the key characteristics of integrated risk management (IRM).
5. Explain use pattern dimensions in brief.

Unit IV
INFORMATION ARCHITECTURE
1. What are the various principles of information architect?
The individual who organizes the patterns inherent in data, making the complex clear.
A person who creates the structure or map of information which allows others to find their
personal paths to knowledge.
The emerging 21st century professional occupation addressing the needs of the age focused
upon clarity, human understanding and the science of the organization of information.
2. List out the jobs of an information architect.
Clarifies the mission and vision for the site, balancing the needs of its sponsoring
organization and the needs of its audiences.
Determines what content and functionality the site will contain.
Specifies how users will find information in the site by defining its organization, navigation,
labeling, and searching systems.

3. Define information architecture


Information architecture is composed of design rules that make information on the web easily
understandable and findable. The main function of information architecture is to place or organize
information so that users can easily find and understand it.
4. What is the consumer's perspective?
Consumers, or users as we more commonly refer to them, want to find information quickly and
easily. Contrary to what you might conclude from observing the architectures of many large,
corporate web sites, users do not like to get lost in chaotic hyper textual webs.
5. What is Producer's Perspective?

Unless organizations are completely altruistic, they usually want to know the return on their investment
in information architecture design. Buying information architecture services is not like investing in a
mutual fund: you can't calculate hard and fast numbers to show the exact benefit of your investment
over time.
6. Define heterogeneity.
Heterogeneity refers to an object or collection of objects composed of unrelated or unlike parts.
7. Explain the dimensions of information ecology.
Information ecology has three dimensions, namely content, context and users, which address the complex
dependencies that exist in the system.

8. What are the different phases in information architecture development?


The development of information architecture has five basic phases namely
Analysis or research.
Strategic planning.
Design
Implementation.
Administration.
9. What is meant by ambiguous or subjective organization schemes?
Ambiguous or subjective organization schemes categorize information in ways that may be specified or
defined by an organization field. They are difficult to design, but are more useful
than exact organization schemes.
10. List out the types of labels.
Labels as contextual links.
Labels as headings.
Labels within navigation systems.
Iconic labels.
Labels as index terms.
11. What are personalization and customization?
Personalization
Personalization is related to contents designed for individuals. It is used for representing
pages based on behavior, needs or preferences of the individual user.
Customization
Customization is related to customizing contents for a group of users. It allows users to take
control over some combination of navigation and presentation.
12. What are visualization and social navigation?
Visualization provides various tools that enable users to navigate using more visual options.
Social navigation enables navigation of social networking sites and allows an organization to
observe the actions of other users.
13. What is the use of a labeling system?

The labeling system is used for representing thoughts and concepts on a website. It is associated with
chunks of information linked to a label on the web site.
14. What does a navigation system do?
The navigation system involves components or group of components on a website that enable access
to web pages within a site. The navigation on a website allows users to migrate from one page to
another.
15. What do navigation system tools do?
Navigation system tools provide context and flexibility to users, helping them understand
where they are and where they can go. The navigation system can also be designed to support
associative navigation by providing resources related to the context currently being displayed.
16. Draw the components of information architecture.
[Figure: "Information architecture" at the centre, connected to its components: organization systems, labeling systems, navigation systems and searching systems.]

Part B
1. Define information architecture and explain its different components in detail.
2. What is the significance of organization and navigation systems in information architecture?
3. Explain the responsibilities of an information architect, graphics designers, web designer, and
programmer with respect to information architecture.
4. Explain in detail the classification of an organization system.
5. Explain the different types of labels.
6. Explain navigation systems briefly.
7. Explain the different organization structures.
8. Explain different organization schemes used in an organization system.
9. Explain the different phases of information architecture development.
10. Explain the dimensions of information ecology.

Unit V

INFORMATION LIFECYCLE MANAGEMENT


1. Define data retention.
Data retention defines the policies of persistent data and records management for meeting legal and
business data archival requirements.
2. Why do enterprises have to retain information for long?
The enterprises retain or discover information for a number of reasons. The discovery could be
coming from three angles.
The first is for the business reason.
The second is the compliance or legal aspect.
The third requirement is with respect to storing personal information.
3. How is testing in big data different from the traditional database testing?
The traditional database applications deal with limited amount of data, which is in a structured
format.
Big data applications deal with huge amounts of data, which can be structured, semi-structured or
unstructured.
4. What is the purpose of data retention policy?
There are essentially three main objectives in developing a data retention policy. They are
To maintain important records and documents for further use or reference.
To dispose of records or documents that are no longer needed.
To organize records so that they can be searched and accessed easily at a later date.
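The retain-or-dispose decision behind these objectives can be sketched as a lookup in a retention schedule. The record categories and retention periods below are hypothetical and are not taken from any law or licence; the legal-hold override and the "review" result for unknown categories are also assumptions for this Python sketch.

```python
from datetime import date, timedelta

# Hypothetical retention schedule: record category -> retention period in days.
RETENTION_DAYS = {"invoice": 8 * 365, "call_log": 365, "marketing": 90}


def retention_decision(category, created, today, legal_hold=False):
    """Return 'retain', 'dispose' or 'review' for a single record."""
    if legal_hold:
        return "retain"  # a legal hold overrides the normal schedule
    period = RETENTION_DAYS.get(category)
    if period is None:
        return "review"  # unknown category: escalate to a records manager
    age = today - created
    return "retain" if age <= timedelta(days=period) else "dispose"


today = date(2017, 1, 1)
print(retention_decision("call_log", date(2015, 6, 1), today))                   # dispose
print(retention_decision("call_log", date(2015, 6, 1), today, legal_hold=True))  # retain
print(retention_decision("invoice", date(2015, 6, 1), today))                    # retain
```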
5. What must an internet service provider (ISP) maintain?
According to the ISP licence, each ISP must maintain records of:
Customers and services.
Outward logins/connections or telnet.
Data packets.
Subscribers.
Internet leased line customers.
Network records and purpose.
Commercial records.
Remote activities
6. What is confidential information?
Confidential information is used in a general sense to mean sensitive information whose access is
subject to restriction, and may refer to information about an individual as well as that which pertains
to a business.

7. What are the types of sensitive information?


There are three main types of sensitive information

Personal information.
Business information.
Classified information

8. Describe classified information.


Classified information pertains to a government body and is restricted according to the level of
sensitivity. Information is generally classified to protect security. Once the risk of harm has passed or
decreased, classified information may be declassified and possibly made public.
9. How is data classified for security?
Data classifications are defined by data owners, with two exceptions: SSNs and credit card data, which
are explicitly defined and protected by policy. The classification levels are:

Public

Internal

Sensitive
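The owner-defined levels and the two policy exceptions can be combined in a small classifier: text containing an SSN or a valid card number is always labelled Sensitive, otherwise the owner's label is kept. The US SSN pattern (ddd-dd-dddd), the 13 to 16 digit card pattern and the use of the Luhn checksum to confirm card numbers are assumptions made for this Python sketch.

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")     # assumed US SSN format
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")  # 13-16 digits, optional separators


def luhn_valid(number):
    """Luhn checksum used by payment card numbers."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text, owner_label="Internal"):
    """Policy overrides the owner: SSNs and card numbers are always Sensitive."""
    if SSN_RE.search(text):
        return "Sensitive"
    for match in CARD_RE.finditer(text):
        if luhn_valid(match.group()):
            return "Sensitive"
    return owner_label


print(classify("Quarterly newsletter", "Public"))                  # Public
print(classify("Card 4111 1111 1111 1111 on file", "Public"))      # Sensitive
```

The Luhn check reduces false positives from long numbers that merely look like card numbers.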

10. Explain the information technology act.


The Information Technology Act was amended to introduce the following:
a new civil provision prescribing damages for an entity that is negligent in using reasonable
security practices and procedures while handling sensitive personal data or information, resulting in
wrongful loss or wrongful gain to any person.
11. Give the salient features of the new IT act rules.
Sensitive personal information.
Privacy policy
Consent for collection
Notification
Use and retention
Right of access, correction and withdrawal
Transnational transfer
Security procedures.
12. Define lifecycle management.
Data lifecycle management is the process of handling the flow of business information throughout its
lifespan, from requirements through maintenance.
13. What are the various stages of life cycle management?
Data creation.
Backup storage against data loss.
Archiving helps contain storage costs.
Ensuring secure data destruction.

Put secure IT asset disposition to work.

14. What does data administration do?
Data administration is the method by which data is monitored, maintained and managed by a person
and/or an organization. Data administration allows an organization to check its data resources,
along with their processing and communications with different applications and business processes.
15. What are the various challenges in big data testing?
Testing big data applications involves several challenges:
Automation
Virtualization
Large dataset
Testing across platform
Monitoring and diagnostic solution
16. How can we ensure the data availability?
The administrator should ensure that the data is made available to its user in such a way that the
users are unaware of the failure. The administrator also ensures that the data remains in a consistent
state and appropriate techniques to achieve these are implemented.
17. Define data tuning.
Data needs to evolve over time as users' needs change. The administrator should modify the
structure or design of the database to incorporate these changes.
18. How does a data administrator differ from a database administrator?
In terms of functionality, a data administrator deals with designing the logical and conceptual models,
treating data at an organizational level, whereas a database administrator deals with the
implementation of the databases that are required and in use.
19. List out the objectives of an organization for managing a data.
Data veracity is critical for both analytics and regulatory compliance.
Both structured and unstructured data must be managed effectively.
Data privacy and security must be protected at all times.
20. What is Unified Access Service Licence (UASL)?
UASL was introduced by the Department of Telecommunications (DoT); under it, an access service provider can offer fixed and/or
mobile services using any technology under the same licence.
Part B
1. Give a sample data retention policy for a telecom service provider.
2. Explain different types of sensitive information.
3. Explain the data lifecycle management process.
4. Explain the need for data archiving.
5. What are the challenges faced by a data administrator?
