
Lecture 1:

Goals:
1. Explain potential advantages and use cases of BI&A (Business Intelligence and Analytics)
a. Need the ability to make clear-cut decisions
b. BI&A is about reporting for the masses, advanced BI front-ends, real-time access,
planning capabilities, closed-loop performance management
2. Describe the basic concepts of decision making and decisional guidance
a. Decision making is the process of sufficiently reducing uncertainty and doubt about
alternatives to allow a reasonable choice to be made from among them. Four steps:
i. Intelligence: search for conditions that call for decisions
ii. Design: invent, develop, and analyze possible alternative solutions
iii. Choice: select one of the solutions
iv. Implementation: adapt the selected course of action to the decision situation
3. Characterize the purpose of decision support systems and business intelligence systems
a. A Decision Support System is a computer-based information system that supports DM
activities. Pros: speedy computations, improved communication/collaboration, improved
data management, quality and agility support, overcoming cognitive limits
b. Business Intelligence and Analytics refers to techniques, technologies, systems, etc.
that analyze critical business data to help an enterprise better understand its business
and market and make timely business decisions
4. List the most important components of BI&A systems
a. Slide 42?
b. Data Warehouse Environment (Technical Staff)
c. Business Analytics Environment (Business users) with user interface
d. Performance and Strategy (Managers/executives) with user interface
e. Features provided: reporting, dashboard, office integration, search-based BI, mobile BI,
OLAP, interactive visualization, scorecards, metadata management, collaboration, etc.
5. Characterize the BI software market and list major players in it
a. Complete BI platforms:
i. IBM, SAP, SAS, Oracle, MS, MicroStrategy
b. Focus on selected BI strategies:
i. InfoZoom, Tableau, QlikView, MIS
c. The landscape grows continuously, but there are only a few visionaries
Lecture 2:
1. Understand the basic definitions and concepts of data warehousing
a. A data warehouse is a pool of data produced to support decision making. Data are
usually structured to be available in a form ready for analytical processing.
b. Characteristics: Subject oriented, integrated, time variant, nonvolatile,
relational/multidimensional, client/server, include metadata

c. Direct benefits:
i. allows end users to perform extensive analysis
ii. consolidated view of corporate data
iii. better and more timely information access
iv. enhanced system performance
d. Indirect benefits:
i. Enhance business knowledge
ii. Create competitive advantage
iii. Enhance customer satisfaction
iv. Facilitate DM
v. Optimizing business processes
2. Understand data warehousing architectures

a. Elements:
i. Data warehouse with data and associated software
ii. Data acquisition (back-end) software that extracts data from ERP systems and external
sources and summarizes it
iii. Client (front-end) software that allows access and analyzes data
b. Concepts:
i. Data Sources: contains data to be loaded into DW (ERP, Web Services)
ii. Enterprise DW is a centralized repository for the entire enterprise
iii. Data Mart is a departmental DW that stores only relevant data
c. Factors that can affect the architecture:
i. Information interdependence between units, management needs, urgency of need
for DW, constraints on resources, compatibility, technical issues, social factors
d. Architecture selection: Kimball vs Inmon
i. Kimball views data warehousing as a constituency of data marts. Data marts are
focused on delivering business objectives for departments in the organization. And
the data warehouse is a conformed dimension of the data marts. Hence a unified
view of the enterprise can be obtained from the dimension modeling on a local
departmental level.
ii. Inmon believes in creating a data warehouse on a subject-by-subject area basis.
Hence the development of the data warehouse can start with data from the online
store. Other subject areas can be added to the data warehouse as their needs
arise. Point-of-sale (POS) data can be added later if management decides it is
necessary.
iii. Comparison of the two approaches:
Overall approach: Kimball is bottom-up; Inmon is top-down.
Architecture structure: Kimball (data mart approach): the DW is a collection of data
marts; a data mart is subject-oriented (e.g., for single business processes) or
department-oriented (e.g., only for Sales); one data mart is built at a time, so the
DW is developed sequentially. Inmon (EDW approach): one central EDW provides the
consistent and comprehensive view of the enterprise; data marts are optional
supplements for specific departments or subjects; data marts are based on the EDW,
i.e., they get their data from the EDW.
Complexity: Kimball: high; Inmon: low.
Development methodology: Kimball: iterative; Inmon: step-wise.

3. Explain the multidimensional model


a. Four perspectives: bottom-up vs. top-down, multidimensional data model, relational data
models, metadata
b. A Multidimensional model is focused on data analysis, based on the following
elements:
i. Measurable business facts (revenues)
ii. Business facts can be viewed and analyzed along different dimensions (time,
region)
iii. Facts and dimensions make up data cubes

iv. Star Schema vs. Snowflake Schema
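As an illustrative sketch of the star-schema idea (all table and column names here are invented, not from the lecture), a fact table holds the measurable facts plus foreign keys into dimension tables, and analysis means joining facts against a dimension:

```python
# Hypothetical star schema: one fact table referencing two dimension tables.
dim_time = {1: {"year": 2023, "quarter": "Q1"}, 2: {"year": 2023, "quarter": "Q2"}}
dim_region = {10: {"country": "DE"}, 20: {"country": "US"}}

# Fact table: measurable business facts (revenue) keyed by dimension IDs.
fact_sales = [
    {"time_id": 1, "region_id": 10, "revenue": 100.0},
    {"time_id": 1, "region_id": 20, "revenue": 250.0},
    {"time_id": 2, "region_id": 10, "revenue": 300.0},
]

# Analyze the facts along the region dimension (a simple star join).
revenue_by_country = {}
for row in fact_sales:
    country = dim_region[row["region_id"]]["country"]
    revenue_by_country[country] = revenue_by_country.get(country, 0.0) + row["revenue"]

print(revenue_by_country)  # {'DE': 400.0, 'US': 250.0}
```

In a snowflake schema the dimension tables themselves would be further normalized (e.g., region referencing a separate country table).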

4. Explain the extraction, transformation, and load (ETL) process (data provision process)
a. Extraction (reading data from database)
i. Synchronous access / asynchronous access
ii. File based extraction / stream based extraction
iii. Full extraction / delta extraction
iv. Usage of filters / no filters
v. Standard extractors / custom extractors
b. Transformation (conversion of data into needed form)
i. Filtering (e.g. filter all deleted orders)
ii. Harmonization (e.g. resolve master data inconsistencies)
iii. Enrichment (e.g. calculate new facts from existing ones)
iv. Aggregation (e.g. by minimizing a dimension)
c. Load (place data in DW)
i. During this phase data is updated
ii. Alternatives: full vs. delta load; load frequency (daily/weekly/)
iii. Has to be customized to the chosen model
iv. Data quality mechanisms are often implemented to be triggered during the load.
d. Usually a triggered (automated) process; logs and monitoring help to find errors.
Automation is very important and can require a large portion of the project effort
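The ETL steps above can be sketched in a few lines (the source data and field names are made up for illustration):

```python
# Minimal ETL sketch: extract -> transform (filter, harmonize, convert) -> load.
source_orders = [
    {"id": 1, "amount": "120.5", "status": "open", "country": "de"},
    {"id": 2, "amount": "80.0", "status": "deleted", "country": "DE"},
    {"id": 3, "amount": "40.0", "status": "open", "country": "DE"},
]

def extract(rows):
    # Full extraction: read everything from the (simulated) source system.
    return list(rows)

def transform(rows):
    cleaned = []
    for r in rows:
        if r["status"] == "deleted":         # filtering (drop deleted orders)
            continue
        r = dict(r)
        r["country"] = r["country"].upper()  # harmonization of master data
        r["amount"] = float(r["amount"])     # conversion into the needed form
        cleaned.append(r)
    return cleaned

def load(rows, warehouse):
    # Full load into the target structure.
    warehouse.extend(rows)

dw = []
load(transform(extract(source_orders)), dw)
print(len(dw), sum(r["amount"] for r in dw))  # 2 160.5
```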
5. Metadata Management
a. Metadata is information about data (tables, columns)
b. Master data is the opposite of transactional data, one entry per legal entity
c. Transactional data is the opposite of master data, one entry per transfer

d. Meta vs. Master


e. Types of metadata:
i. Business: explain what things mean (glossary with terms and definitions)

ii. Technical: technical description of data assets (tables, data stores, physical
attributes)
iii. Operational: monitor job execution (statistics about processes)
f. Benefits:
i. Build common understanding of data
ii. Facilitate the quest for data quality
iii. Support discovery and reuse of data
iv. Analyze dependencies
v. Facilitate future changes
vi. Monitor Usage
Lecture 3,4:
1. Understand the concept of Big Data and some typical applications
a. BD is about leveraging extended capabilities to analyze information; it emphasizes
diverse data sources and formats, while volume currently plays only a secondary role.
b. Four dimensions of BD that have to be differentiated:
i. Volume: Data at large scale
ii. Variety: Data in many forms (structured, unstructured, text)
iii. Velocity: Data in motion (analysis of streaming data to enable decisions in
fractions of a second)
iv. Veracity: Data uncertainty (managing the reliability and predictability of inherently
imprecise data types)
2. Explain the technological changes which foster Big Data Analysis
a. Increasing data storage capacity from analog to digital
b. Increasing computation capacity
c. Increasing data collection
d. New, emerging data sources such as social networks
e. Digitization and connection of traditionally physical devices
f. Technological changes and changes in the use of existing technology: processing 32 →
64 bit, multicore, disk → in-memory storage, data organization: row → column
vectors, dictionary encoding (very important because each value is only stored
once and not every time a new item is created)
g. Slide 25 as example?
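The dictionary-encoding note above can be sketched as follows (the column values are invented examples):

```python
# Dictionary encoding for a column: each distinct value is stored once in a
# dictionary, and the column itself becomes a list of small integer codes.
column = ["Mannheim", "Berlin", "Mannheim", "Mannheim", "Berlin"]

dictionary = []   # distinct values, each stored only once
codes = []        # column as integer references into the dictionary
positions = {}
for value in column:
    if value not in positions:
        positions[value] = len(dictionary)
        dictionary.append(value)
    codes.append(positions[value])

print(dictionary)  # ['Mannheim', 'Berlin']
print(codes)       # [0, 1, 0, 0, 1]

# Decoding restores the original column.
assert [dictionary[c] for c in codes] == column
```

The repeated strings are replaced by compact integers, which is what makes column-oriented, in-memory storage so space-efficient.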
3. Differentiate Big Data technologies from traditional BI approaches
a. Traditional vs. Big Data approach (structured, analytical, logical vs. creative, holistic
thought, intuition)
Traditional approach: business users determine what question to ask; IT structures
the data to answer that question.
Big Data approach: business users explore what questions could be asked; IT delivers
a platform to enable creative discovery.
Traditional computing: historical fact finding; find stored information; query → data → results.
Stream computing: current fact finding; analyze data in motion before it is stored;
data → query → results.
Problems of the traditional DW: data might be outdated before users are able to analyze
it; data rates and volumes are too big for storing and subsequent analysis.
Goals of stream computing: deliver timely insights, focus attention on important data,
monitor events from a variety of data sources.
Slide 53?
4. Depict the implications of this technology and its application
a. Basics:
i. The same data can be represented using different data models → pick the one that
supports the app

ii. Data model defines a way of representing data in a db and how an app accesses
the data (relational data model is most common)
iii. Two data model designs: denormalized (embedded data models) and normalized
Denormalized data model (allows storage of related pieces of data in the same
database record):
+ Request and retrieve related data in a single database operation
+ Better performance for read operations
+ Update related data in a single write operation
- Database records may grow after creation; record growth can impact write performance
- Threat of data fragmentation
Use when you have a 1:1 ("contains") relationship or a 1:n relationship.
Normalized data model (describes relationships using references between data pieces →
minimum of redundancy):
+ More flexibility than denormalized data models
+ No redundant data → single source of truth
+ In total, usually requires less storage capacity than denormalized data models
- Accessing related data requires joining multiple pieces of data → lower performance
when accessing related data
Use when denormalized data models provide only little read-performance advantage,
when modeling M:N relationships, or for large hierarchical data sets.
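The two designs can be contrasted in a tiny sketch (record fields and IDs are invented for illustration):

```python
# Denormalized (embedded): related data lives in the same record,
# so a single read returns everything.
order_embedded = {
    "order_id": 7,
    "customer": {"name": "Alice", "city": "Mannheim"},  # embedded 1:1 data
    "items": [{"sku": "A1", "qty": 2}],                 # embedded 1:n data
}
city = order_embedded["customer"]["city"]               # one lookup, no join

# Normalized: related data is referenced, not embedded -> no redundancy,
# but reading related data requires a "join" across structures.
customers = {42: {"name": "Alice", "city": "Mannheim"}}
orders = {7: {"customer_id": 42, "items": ["A1"]}}
city_normalized = customers[orders[7]["customer_id"]]["city"]

assert city == city_normalized == "Mannheim"
```

Note how the embedded record would duplicate the customer data across every order, while the normalized form stores it once at the cost of the extra lookup.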
b. SQL (Structured Query Language) → relational data and/or normalized data models
i. Standardized, powerful, high-level programming language for querying databases;
supported by almost all relational databases
ii. Pros: standardized, many operations, many people are able to write SQL, high-level
code, ACID
iii. Cons: data structure needs to be defined upfront; always has to be translated to
low-level code; rather slow execution
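A minimal relational example using Python's built-in sqlite3 module (the table and data are invented): the structure is defined upfront, and the declarative query is translated by the DBMS into low-level execution code:

```python
import sqlite3

# In-memory relational database: schema first, then declarative queries.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("DE", 100.0), ("US", 250.0), ("DE", 300.0)],
)

# High-level, standardized query; the DBMS decides how to execute it.
rows = con.execute(
    "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('DE', 400.0), ('US', 250.0)]
con.close()
```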
c. Applications of relational data models:
i. Transactions → cashier systems
ii. Analysis of transactional data → creation of balance sheets, traditional DW
d. NoSQL → non-relational data models and/or denormalized data models
i. Key-value store: data is a mapping from keys to arbitrary values; values do not
conform to any particular structure → very simple to program and implement, can
be easily distributed across multiple machines
ii. Document-oriented store: still simple to program and implement, easily
distributed, more structured than key-value stores
iii. Graph-oriented store: flexible extension of the data model
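The difference between a key-value store and a document-oriented store can be sketched with plain Python structures (keys, fields, and values here are made-up examples, not a real database API):

```python
# A key-value store maps keys to arbitrary, opaque values with no fixed schema;
# a Python dict is the simplest possible sketch of this idea.
kv_store = {}
kv_store["user:1"] = b"\x00\x01"                 # opaque binary value
kv_store["user:2"] = {"name": "Bob", "age": 30}  # values need no common structure

# A document-oriented store adds some structure: each value is a document
# whose fields can be inspected, e.g. filtered in a query.
documents = [
    {"_id": 1, "type": "order", "total": 40.0},
    {"_id": 2, "type": "order", "total": 120.0},
    {"_id": 3, "type": "invoice", "total": 120.0},
]
big_orders = [d for d in documents if d["type"] == "order" and d["total"] > 100]
print([d["_id"] for d in big_orders])  # [2]
```

The key-value store can only fetch by key; the document store can additionally query inside the stored values, which is what "more structured" means above.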
Lecture 5:
1. Describe different report types and target groups of BI consumption
a. Executive Management: Performance Management, Dashboards, Scorecards, KPIs
b. Business Analysts: Ad-Hoc Queries, On-line Analytical Processing (OLAP)
c. Front Line Employees: Operational Standard Reports

d. Business reporting vs. analytical reporting

2. Describe Business Reporting and its importance to organizations


a. Definition: Business Reporting describes all types of BI consumption covering
efficient (visual) communication with limited interactivity and limited analytic
capabilities.
b. Table Reports: simplest form of BI reports (one-dimensional list, two-dimensional
matrix report)
i. Matrix report, periodic report, special report, exceptions
3. Understand basic considerations of data visualization
a. Three steps to visualize data:
i. Choose Visual representation (mapping of available information to a visual
format): data objects, their attributes, and the relationships among data objects
are translated into graphical elements such as points, lines, shapes, and colors.
(Dimensions: graph, data, interaction, and distortion techniques.) Interaction
helps to encourage exploration and allows more dynamic analysis (filtering,
zooming).
ii. Arrangement (placement of available visual elements within a display) of visual
elements: can make large difference in how easy it is to understand the data.
iii. Selection and (De-) Emphasis of interesting data: elimination or de-emphasis of
uninteresting information and/or emphasis of interesting information
4. Select distinct graph types for specific visualization goals
a. Bar graph: good to compare values with each other
b. Stacked graph: good to display multiple instances of a whole and its parts → focus
on the whole
c. Grouped bar graph: good to display multiple instances of a whole and its parts → focus
on the parts
d. Line graph: good to reveal shape of data, changes over time
e. Spark Lines: very space efficient representation to display changes of multiple data sets
in dashboards
f. Area graph:
g. Pie: good to display a whole and its parts but has flaws
h. Tree maps: space-constrained visualization of hierarchical structures → easy to navigate
into sub-trees
i. Radar graph: a bar graph is usually better
j. Gauges and meters are problematic because of the amount of space needed and the color
coding
k. Box plots: displays the distribution of data, visualization of core statistical parameters
l. Scatter plots: correlation structures can be recognized easily
m. Remarks: less is more → no 3D!
5. Explain best-practices of dashboard design
a. Definition: A dashboard is a visual display of the most important information
needed to achieve one or more objectives, consolidated and arranged on a
single screen so the information can be monitored at a glance. Dashboards
tell us what's happening.
b. Pros: high impact visualization of key metrics, easy to use and find information, intuitive,
monitor and manage metrics, actionable analyses, drillable metrics
c. Dashboard design: arrangement, selection, and visual layout are of utmost importance →
value of color coding, less is more, compact visualization, homogeneous usage of graph types

d. Gestalt principles (unified whole): attempt to describe how people preconsciously
organize visual elements into groups or unified wholes when certain principles are
applied → proximity, similarity, closure, enclosure, connection, continuity
e. 13 common mistakes?
Lecture 6, 7:
1. Understand the importance of OLAP reports and describe the basic concepts of online
analytical processing (OLAP) and how it can improve decision making
a. OLAP = technologies and tools that support (ad-hoc) analysis of multi-dimensionally
aggregated data → data aggregation, fast data analysis along multiple dimensions,
pre-processing of data from multiple sources
i. MOLAP: data resides in a multidimensional DBMS; a multidimensional engine provides
access
ii. ROLAP: data resides in a relational DBMS, OLAP server provides SQL queries
iii. HOLAP: detailed data resides in a relational DBMS, aggregated data resides in a
multidim. DBMS
b. + MOLAP: fast query performance (optimized storage and multidimensional indexing),
automated computation of higher-level aggregates, very compact for low-dimension data
sets
- MOLAP: the processing step can take time, traditionally difficult to query models with
high-cardinality dimensions, often creates redundancy
c. Transaction processing and data analysis are the dominant approaches for IT infrastructure,
but ad-hoc and exploratory reports are gaining importance → unclear whether OLAP will be replaced
d. Codd's 12 rules for OLAP:
i. Multidimensional conceptual view: Structure data along dimensions
ii. Transparency: (Unformatted) source of data is not visible to end user
iii. Accessibility: Tool (not user) provides data sourcing
iv. Consistent reporting performance: No significant performance impacts by
increasing dimension numbers
v. Client/server architecture: Various clients can be attached to one server
vi. Generic dimensionality: Equivalent operational capabilities for all dimensions
vii. Dynamic sparse matrix handling: Optimal handling of a sparse matrix
viii. Multi-user support: No concurrency restrictions
ix. Unrestricted cross-dimensional operations: Calculation / manipulation across
unlimited dimensions
x. Intuitive data manipulation: No need for menus but direct interaction with the
data
xi. Flexible reporting: e.g. flexible visualization
xii. Unlimited dimensions and aggregation levels
e. Typical OLAP operations:
i. Roll-up (drill-up): summarize data by climbing up a hierarchy or by dimension
reduction
ii. Drill-down (roll-down): reverse of roll-up → from higher-level summary to lower-level
summary or detailed data
iii. Slice and dice: filter using one or more dimensions
iv. Pivot (rotate): reorient the cube for visualization, e.g. 3D to a series of 2D planes
v. Other operations → drill across: involving (across) more than one fact table; drill
through: through the bottom level of the cube to its back-end relational tables
(using SQL)
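Roll-up and slicing on a small cube can be sketched as follows (cube cells and dimension values are invented for illustration):

```python
# Cube cells: (year, quarter, country) -> revenue.
cube = {
    (2023, "Q1", "DE"): 100.0,
    (2023, "Q2", "DE"): 300.0,
    (2023, "Q1", "US"): 250.0,
}

def roll_up(cube):
    # Roll-up: climb the time hierarchy (quarter -> year) by summing cells.
    out = {}
    for (year, _quarter, country), revenue in cube.items():
        key = (year, country)
        out[key] = out.get(key, 0.0) + revenue
    return out

def slice_(cube, country):
    # Slice: fix one dimension to a single value, keeping the sub-cube.
    return {k: v for k, v in cube.items() if k[2] == country}

print(roll_up(cube))       # {(2023, 'DE'): 400.0, (2023, 'US'): 250.0}
print(slice_(cube, "DE"))  # only the DE cells remain
```

Drill-down would be the inverse: starting from the yearly totals and returning to the quarterly cells.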
2. Describe the characteristics of advanced analytics techniques and how they may generate new
knowledge
a. Motivation: the amount of data is constantly growing → OLAP etc. are not sufficient
anymore → methods and tools that automatically generate knowledge from large data sets
and documents are needed to tackle challenges like finding anomalies, forecasting, key
influencers, relationships, and trends

b. Definition: Knowledge Discovery in Databases (KDD) is the non-trivial process of


identifying valid, novel, potentially useful, and ultimately understandable
patterns in data.
i. Hypothesis approach: User makes a proposition and seeks to validate it
ii. Discovery approach: Finds patterns, associations, relationships that were
previously unknown to the organization
iii. Supervised learning: predict data with unknown target attribute value with
minimal error → search for dependencies of a target attribute on the input data
iv. Unsupervised learning: create a pattern or a more compact description of the data →
no reference to a target attribute, error not measurable
v. Definition: Data Mining is a process that uses statistical, mathematical,
artificial intelligence, and machine learning techniques to extract and
identify useful information and subsequent knowledge from databases.
Data mining is used for finding mathematical patterns from usually large
sets of data. These patterns can be rules, affinities, correlations, trends,
or prediction models
vi. Association rules: describe correlations between attributes appearing together in
transactions (learn more?? S.40)
vii. Decision trees: a set of logical rules → the path that leads from the root to a leaf
(a specific class) represents a set of boundary conditions → easy to read
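Support and confidence, the two basic measures behind association rules, can be computed directly (the shopping-basket transactions are invented examples):

```python
# Support and confidence for a candidate rule {bread} -> {butter}.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]

def support(itemset):
    # Fraction of transactions that contain the whole itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # Of the transactions containing the antecedent, how many also
    # contain the consequent?
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "butter"}))       # 0.5 (2 of 4 transactions)
print(confidence({"bread"}, {"butter"}))  # 2/3 (2 of 3 bread transactions)
```

A rule is reported only if both measures exceed chosen thresholds; that is what "describing correlations between attributes appearing together" amounts to in practice.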
c. Types of Analytics (the practitioner's view):
i. Descriptive analytics: summarize what happened → traditional analytics and OLAP
ii. Predictive analytics: make predictions about the future → variety of techniques;
the process of discovering meaningful new correlations by sifting through large
amounts of data using pattern recognition, statistical, and mathematical
techniques

iii. Prescriptive analytics: recommend one or more courses of action and show the most
likely outcome of each action → actionable data and feedback required in order to
learn continuously
d. Definition: Text mining is the application of data mining to non-structured or
less structured text files. It entails the generation of meaningful numerical
indices from the unstructured text and then processing these indices using
various data mining algorithms on large databases.
i. Example: spam recognition, help desks, analysis of related scientific publications
e. Definition: Web mining is the discovery and analysis of interesting and useful
information from the Web, about the Web, and usually through Web-based
tools.

i. Example: click-stream analysis, product recommendations, intelligent browsing


assistance
f. Web usage mining: discovery of meaningful patterns from data generated by Web clients,
e.g. user profiles/ratings, meta-data, shopping cart changes, product click-throughs
3. Explain how recent BI&A technologies may contribute to Business Performance Management
and Process Management
a. Definition: Business Performance Management enables an organization to
effectively monitor, control and manage the implementation of strategic
initiatives
b. Business planning is creating alternatives for action and deciding on the most promising
path → mostly done with office tools (spreadsheets)
i. + small businesses, extremely individual requirements, short-term need
ii. − process control, access protection, performance, complexity, errors,
consolidation, etc.
iii. Spreadsheets good for individuals with need for high flexibility
iv. ERP systems are good for very detailed planning
v. Special Planning software allows rapid deployment of pre-configured planning
models
vi. OLAP is the first choice for company-wide, centrally-modeled planning
c. A balanced scorecard is a comprehensive set of performance measures defined
from four different measurement perspectives (financial, customer, internal,
and learning and growth) that provides a framework for translating the
business strategy into operational terms

d. Business Intelligence
i. BI in classical form is high-level oriented
ii. Process Intelligence focuses on operational performance to be transparent at all
times
iii. BPI comprises a large range of application areas spanning from process
monitoring and analysis to process discovery, conformance checking,
prediction and optimization.

iv. Challenge: multiple actors need real-time information about business processes
that is tailored to their needs → approach: integrating all data in an in-memory
database
Lecture 8, 9:
1. Explain different elements of a BI strategy
a. Organizational Adoption (process) involves all actions of individuals in an organization
that deal with creating awareness, selecting, evaluating, initiating and deciding for the
implementation of new ES technology.
b. Conversion (process) involves all actions of individuals (in an organization or across
organizations) that deal with developing and implementing a new ES technology.
c. Use (process) involves all actions of individuals in an organization that deal with using
and changing ES technology or the respective work system to realize intended business
value.

2. Characterize different types of key performance indicators and understand the importance of
alignment
a. KPI (Key Performance Indicator): quarter/year horizon; example: motivation index
b. OPI (Operational Performance Indicator): weeks/months; example: health rate
c. PPI (Process Performance Indicator): hours/days; example: accidents
d. Pros of BI strategy: help align with business partners, formalize business needs, create
prioritized roadmaps with strategic business goals to deliver measurable results
e. Alignment is a continuous process
f. Dispositive data architecture (dispositive = the data contained in data warehouses):
high-level reference model for all BI projects of an enterprise → starting from this
high-level model, individual projects develop a concrete dispositive data model → gap
detection through mapping to existing data sources
g. BI governance: data monitoring → adapt the entire BI technology stack (software,
hardware, networks) to technological change and increasing data volumes
h. Governance: Development Model
i. REALLY BAD s. 16 - 29
3. Understand the basics of BI implementation and the importance of BI Strategy for BI
Implementation
a. Disparate business data must be integrated → need for information consolidation →
alignment across organizations regarding master data and KPIs; new technology changes
a lot
i. Organizational Issues
ii. Project Issues
iii. Technical Issues
iv. User participation in the development is crucial
b. Common failure factors in BI projects
i. Unclear business / information objectives

ii. Low levels of data summarization: Getting lost in detail


iii. Lack of (Top) Management Support
iv. Lack of clear BI Strategy
v. Cultural issues being ignored
vi. Inappropriate architecture
4. Know a state-of-the-art project implementation methodology in principle
a. Best Practices for BI Implementation
i. Project fit to corporate strategy
ii. Complete buy-in of managers
iii. Manage user expectations
iv. DW should be built incrementally
v. Build in adaptability
vi. How? IT/Business manage together, training, political awareness, organizational
support, etc
b. Good scalability means that queries and other data-access functions will grow
linearly with the size of the system
c. Security focus on policies and procedures, logical security and restrictions,
limiting physical access, internal control with emphasis on security and privacy
5. Almost every kind of engineering project goes through six stages between inception and
implementation. Engineering projects are oftentimes iterative. Once deployed, products are
refined and improved. Each iteration produces a new release.
a. Always have a core team (100% available resources, permanent, diverse) and an
extended team + some executive representatives
b. Tracks run in parallel after the project requirements have been defined
i. ETL Track = back-end
ii. Application Track = front-end
iii. Metadata Track = bridge / navigation
c. Justification is about showing the balance between the costs involved and the benefits
gained (4 components)
i. Business analysis issues
ii. Risk assessment
iii. Cost-benefit analysis
iv. Business drivers (map to strategic business goals)

6. Characterize the individual stages and associated steps of the methodology in more detail
(slide 52)

Does he really want to hear 16 damn phases of a BI integration? The 6 stages are enough
for me... everything else would rather fit into IS540!
Lecture 10:
1. Explain the importance of post-adoption BI system use for the return on BI technology
investments
a. Returns on IS investments are mainly gained within the post-adoption stages!

2. Differentiate between two common post-adoption BI system use stages


a. Routinization: State in which IS use is integrated as a normal part of the
employees' work processes
i. Repetitious work
ii. Perceived as a normal part of employees' work activities
iii. Standardized work
iv. Incorporated into employees' work processes
v. Employees develop familiarity with the implemented IS
b. Infusion: Embedding IS deeply and comprehensively in work processes
i. Realization of hidden value of an IS
ii. Extension of the IS (e.g., developing additional features)
iii. Infusion and Routinization do not necessarily occur in sequence but rather
occur in parallel
1. Employees can display either behavior at a precise point in time; but
they can also display both behaviors within a period of time
2. Both behaviors are expected to vary across employees


Guest Lecture:
Really?
