
Summary of Coursera Process Mining. Week 1 Reading 5: Aalst (2011) Process Mining Chapter 5: Process Discovery: An introduction
Created by Ariadna73 using a beta version of EASY+

(Please click, and share the video. Thank you, Ari!)

Problem Statement
Process discovery is one of the most challenging Process Mining tasks
This chapter shows how to use the α-algorithm
Process Discovery
Focus: Discovery perspective
A process discovery algorithm is a function that maps an event log L onto a process model that represents it
The challenge is to find such an algorithm
Quality criteria for "being representative"
- Fitness: the discovered model should reflect the log
- Generalization: The discovered model should generalize the log
- Simplicity: The model should be as simple as possible
Focus: Control flow Perspective

A simple algorithm for Process Discovery


Basic idea
Input: Event log (L)
The algorithm scans L looking for patterns
Algorithm
Consists of eight steps
1. Check which activities appear in the log
2. Identify the start activities
3. Identify the end activities
4. and 5. Are the core of the algorithm: Identify the places of the net and their connections
6. Identify a unique source place and a unique sink (End) place
7. Generate the arcs in the net
8. Not explained
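
The eight steps can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the function name `alpha_sketch`, the place names "source" and "sink", and the representation of a log as a list of tuples are all assumptions made for illustration. Step 8 is taken here to be "return the resulting net", which is an assumption, since these notes do not explain it. The subset enumeration in step 4 is exponential, so the sketch is only suitable for tiny logs.

```python
from itertools import chain, combinations

def alpha_sketch(log):
    """Return (places, transitions, arcs) for a log given as traces of activity names."""
    traces = {tuple(t) for t in log if t}
    # Step 1: activities that appear in the log
    T = {a for t in traces for a in t}
    # Steps 2 and 3: start and end activities
    TI = {t[0] for t in traces}
    TO = {t[-1] for t in traces}
    # Patterns scanned from the log: x is directly followed by y
    follows = {(x, y) for t in traces for x, y in zip(t, t[1:])}
    # Causality: x is followed by y, but y is never followed by x
    causal = {(x, y) for (x, y) in follows if (y, x) not in follows}

    def unrelated(group):
        # No activity in the group ever directly follows another (or itself)
        return all((x, y) not in follows for x in group for y in group)

    def nonempty_subsets(items):
        items = sorted(items)
        return chain.from_iterable(
            combinations(items, n) for n in range(1, len(items) + 1))

    # Step 4: pairs (A, B) where every a in A causes every b in B
    candidates = [frozenset(s) for s in nonempty_subsets(T) if unrelated(s)]
    X = {(A, B) for A in candidates for B in candidates
         if all((a, b) in causal for a in A for b in B)}
    # Step 5: keep only the maximal pairs
    Y = {(A, B) for (A, B) in X
         if not any((A, B) != (C, D) and A <= C and B <= D for (C, D) in X)}
    # Step 6: one place per maximal pair, plus a unique source and sink
    places = set(Y) | {"source", "sink"}
    # Step 7: arcs between transitions and places
    arcs = {(a, (A, B)) for (A, B) in Y for a in A}
    arcs |= {((A, B), b) for (A, B) in Y for b in B}
    arcs |= {("source", t) for t in TI}
    arcs |= {(t, "sink") for t in TO}
    # Step 8 (assumption): the result is the net itself
    return places, T, arcs

# Hypothetical example log: three variants of a small process
log = [("a", "b", "c", "d")] * 3 + [("a", "c", "b", "d")] * 2 + [("a", "e", "d")]
places, transitions, arcs = alpha_sketch(log)
print(len(places), "places,", len(arcs), "arcs")
```

Note that the sketch turns the log into a set of traces before doing anything else, so how often a trace occurs has no influence on the result.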
Limitations of the α-algorithm
Assumes that the log is complete (this is often not the case)
It has problems with short loops (loops of length one or two)
Frequencies are ignored, so the algorithm is very sensitive to noise and incompleteness
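
The sensitivity to noise can be illustrated with a small Python sketch. The helper names and the `min_count` threshold are assumptions made for illustration; the threshold is not part of the α-algorithm, it only shows what taking frequencies into account would change.

```python
from collections import Counter

def directly_follows_counts(log):
    """Count how often activity x is directly followed by activity y."""
    counts = Counter()
    for trace in log:
        for x, y in zip(trace, trace[1:]):
            counts[(x, y)] += 1
    return counts

def causal_pairs(counts, min_count=1):
    """Pairs x -> y seen at least min_count times whose reverse was not."""
    kept = {pair for pair, n in counts.items() if n >= min_count}
    return {(x, y) for (x, y) in kept if (y, x) not in kept}

clean = [("a", "b", "c")] * 99
noisy = clean + [("a", "c", "b")]          # one deviating trace in a hundred

clean_causal = causal_pairs(directly_follows_counts(clean))
noisy_causal = causal_pairs(directly_follows_counts(noisy))
filtered_causal = causal_pairs(directly_follows_counts(noisy), min_count=5)

print(sorted(clean_causal))     # b -> c is causal
print(sorted(noisy_causal))     # b -> c is lost because c -> b was seen once
print(sorted(filtered_causal))  # b -> c is back once rare pairs are dropped
```

With `min_count=1`, which corresponds to ignoring frequencies, a single deviating trace is enough to change the relations that the model is built from.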
Taking the Transactional Life Cycle into account
The log can be partitioned into smaller logs related to a specific activity
Information about the general transactional life cycle, or about an activity-specific transactional life
cycle, can be exploited
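
A minimal sketch of such a partitioning, assuming each event is a `(case_id, activity, transition)` tuple listed in time order. The event layout, the function name, and the transition names "start" and "complete" are assumptions made for illustration, not taken from the chapter.

```python
from collections import defaultdict

def partition_by_activity(events):
    """Split one log into a smaller log of life-cycle transitions per activity."""
    sublogs = defaultdict(lambda: defaultdict(list))
    for case_id, activity, transition in events:
        # Each smaller log maps a case to the transitions seen for this activity
        sublogs[activity][case_id].append(transition)
    return {activity: dict(cases) for activity, cases in sublogs.items()}

# Hypothetical events: activities "register" and "check" in two cases
events = [
    (1, "register", "start"),
    (1, "register", "complete"),
    (2, "register", "start"),
    (1, "check", "start"),
    (2, "register", "complete"),
    (1, "check", "complete"),
]
sublogs = partition_by_activity(events)
print(sublogs["register"])
print(sublogs["check"])
```

Each smaller log can then be inspected on its own, for example to check whether every "start" of an activity is followed by a "complete".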

Rediscovering process models


We do not really know the real models; we have to discover them
In an experimental setting, we can generate a log from a known model and check whether the algorithm rediscovers it

Challenges
Representational bias
Any discovery technique has its own bias
Process discovery is, by definition, restricted by the expressive power of the target language
Noise and incompleteness
Noise: Rare and infrequent behavior contained in the log
Incompleteness: Too few events in the log
Strategy to deal with this: Cross-validation = split the log into one part for developing (training) the
model and another part for testing it
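
One way to set up such a split is k-fold cross-validation over the traces of the log. The sketch below is an illustration under assumptions: the function name, the choice of `k`, and the fixed random seed are not from the chapter.

```python
import random

def k_fold_splits(log, k=3, seed=0):
    """Yield (training_log, test_log) pairs; each trace is tested exactly once."""
    traces = list(log)
    random.Random(seed).shuffle(traces)      # shuffle a copy, reproducibly
    folds = [traces[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        yield train, test

# Hypothetical log with six traces
log = [("a", "b", "c")] * 4 + [("a", "c", "b")] * 2
splits = list(k_fold_splits(log, k=3))
for train, test in splits:
    print(len(train), "traces for developing,", len(test), "for testing")
```

A model discovered from the developing part can then be checked against the testing part, which it has not seen.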
Four competing quality criteria
Fitness: When the model can replay all the instances in the log
Simplicity: Occam's Razor: Create the simplest possible model
Precision: Does not allow for "too much" behavior
Generalization: Does not restrict behavior to the examples seen in the log
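
The tension between fitness and precision can be made concrete with a deliberately crude Python sketch. It assumes a model can be written down as the finite set of traces it allows, which is a strong simplification: real models may allow infinitely many traces, and practical measures are more refined. The function names and both formulas are assumptions made for illustration.

```python
def trace_fitness(log, model_traces):
    """Fraction of log traces (counting repeats) that the model can replay."""
    log = [tuple(t) for t in log]
    allowed = {tuple(t) for t in model_traces}
    return sum(t in allowed for t in log) / len(log)

def crude_precision(log, model_traces):
    """Fraction of the traces allowed by the model that were seen in the log."""
    seen = {tuple(t) for t in log}
    allowed = {tuple(t) for t in model_traces}
    return len(allowed & seen) / len(allowed)

log = [("a", "b", "c")] * 3 + [("a", "c", "b")]

# A simple model that only allows the most frequent trace
tight = [("a", "b", "c")]
# A permissive model that also allows behavior never seen in the log
loose = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c"), ("c", "b", "a")]

print(trace_fitness(log, tight), crude_precision(log, tight))
print(trace_fitness(log, loose), crude_precision(log, loose))
```

The tight model is precise but cannot replay every trace; the loose model replays everything but allows "too much" behavior. Generalization is not captured by this sketch, since it concerns behavior that is not in the log at all.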
Taking the right 2-D slice of a 3-D reality
Specific problem: Process mining can't generate "negative examples" (by definition, a log shows
what happened, not what did not happen)
Problem: The log typically contains only a fraction of all possible behaviors (in other words,
anything can happen)
Problem: There is no clear relation between the size of a model and its behavior
