Introduction
Why we are interested in data science
- Solve problems and answer questions
- Gain useful insights through modeling to predict outcomes or discover underlying patterns
What is a methodology?
- General strategy that guides the processes and activities within a given domain
- Does not depend on particular technologies or tools
- Not a set of techniques or recipes
- Provides the data scientist with a framework for how to proceed to obtain answers
Methodology diagram
[Figure: the methodology as an iterative cycle of stages: Business Understanding, Analytic Approach, Data Requirements, Data Collection, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment, Feedback]
Business understanding
Analytic approach
Data compilation
The chosen analytic approach determines the data requirements.
- Content, formats, and representations
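The methodology does not prescribe a format for data requirements. One way to make them concrete, assumed here purely for illustration, is to write them down as a schema that collected records can be checked against. The field names (`customer_id`, `age`, `signup_date`, `notes`) and the ISO date representation are hypothetical.

```python
from datetime import datetime

# Hypothetical data requirements: field name -> (expected type, required?)
REQUIREMENTS = {
    "customer_id": (str, True),
    "age": (int, True),
    "signup_date": (str, True),  # required representation: ISO date "YYYY-MM-DD"
    "notes": (str, False),
}


def check_record(record: dict) -> list[str]:
    """Return the problems found when checking one record against REQUIREMENTS."""
    problems = []
    for field, (expected_type, required) in REQUIREMENTS.items():
        # Content: required fields must be present and non-empty.
        if field not in record or record[field] is None:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        # Format: the value must have the expected Python type.
        value = record[field]
        if not isinstance(value, expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
    # Representation: the date string must parse in the agreed format.
    date = record.get("signup_date")
    if isinstance(date, str):
        try:
            datetime.strptime(date, "%Y-%m-%d")
        except ValueError:
            problems.append(f"signup_date: not in YYYY-MM-DD format: {date!r}")
    return problems


if __name__ == "__main__":
    print(check_record({"customer_id": "C001", "age": 42, "signup_date": "2023-05-01"}))
    print(check_record({"customer_id": "C002", "age": "42", "signup_date": "01/05/2023"}))
```

Writing requirements as data rather than prose means the same check can be rerun each time new data is collected.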
Stages covered: Data Requirements, Data Collection, Data Understanding
Data preparation
Data preparation encompasses all the activities used to construct the data set.
- Data cleaning
  - Handling missing or invalid values
  - Eliminating duplicate rows
  - Proper formatting
- Combining multiple data sources
- Transforming data
- Feature engineering
- Text analysis
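The preparation activities listed above can be sketched with the Python standard library on a tiny, made-up data set. The two sources, the field names, the validity rule for age, min-max scaling as the transformation, and a word count as the "text analysis" are all illustrative assumptions; in practice a library such as pandas would usually do this work.

```python
# Hypothetical raw inputs: a customer table plus two other sources keyed by id.
customers = [
    {"id": "C1", "name": " Alice ", "age": "34"},
    {"id": "C2", "name": "bob", "age": "n/a"},     # invalid age
    {"id": "C1", "name": " Alice ", "age": "34"},  # exact duplicate row
    {"id": "C3", "name": "Carol", "age": "29"},
]
orders = {"C1": [120.0, 80.0], "C3": [15.5]}       # order amounts per customer
reviews = {"C1": "Great service, fast delivery", "C3": "slow"}


def parse_age(value):
    """Return the age as an int, or None if it is missing or invalid (assumed rule)."""
    try:
        age = int(value)
    except (TypeError, ValueError):
        return None
    return age if 0 < age < 120 else None


def prepare(customers, orders, reviews):
    seen = set()
    rows = []
    for raw in customers:
        # Cleaning: eliminate duplicate rows.
        key = tuple(sorted(raw.items()))
        if key in seen:
            continue
        seen.add(key)
        # Cleaning: drop rows with missing or invalid values.
        age = parse_age(raw.get("age"))
        if age is None:
            continue
        # Combining sources: look up orders and review text by id.
        amounts = orders.get(raw["id"], [])
        review = reviews.get(raw["id"], "")
        rows.append({
            "id": raw["id"],
            "name": raw["name"].strip().title(),  # cleaning: consistent formatting
            "age": age,
            "order_count": len(amounts),          # feature engineering
            "total_spent": sum(amounts),          # feature engineering
            "review_words": len(review.split()),  # crude text analysis
        })
    # Transforming: min-max scale total_spent to the range [0, 1].
    totals = [r["total_spent"] for r in rows]
    lo, hi = min(totals, default=0.0), max(totals, default=0.0)
    for r in rows:
        r["spent_scaled"] = (r["total_spent"] - lo) / (hi - lo) if hi > lo else 0.0
    return rows


if __name__ == "__main__":
    for row in prepare(customers, orders, reviews):
        print(row)
```

Here the duplicate C1 row and the C2 row with an unparseable age are dropped, leaving one prepared row each for C1 and C3.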
Modeling
Modeling focuses on developing models.
- Predictive or descriptive models
- Built according to the previously defined analytic approach
- A training set is used for predictive modeling
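For predictive modeling, the available data is typically split so that the model is fitted on a training set while a separate testing set is kept aside for evaluation. The standard-library sketch below shows that split and fits a model on the training portion only. Ordinary least-squares linear regression on a single hypothetical feature is an assumed stand-in for whatever technique the analytic approach actually calls for.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (training set, testing set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_linear(train):
    """Fit y = a + b * x by ordinary least squares on (x, y) training pairs."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) model to a new input x."""
    intercept, slope = model
    return intercept + slope * x


if __name__ == "__main__":
    # Hypothetical data: one numeric feature x and a numeric outcome y.
    data = [(x, 2.0 * x + 1.0) for x in range(20)]
    train, test = train_test_split(data)
    model = fit_linear(train)  # only the training set is used for fitting
    print(len(train), len(test), model)
```

Keeping the split separate from the fitting function makes it easy to swap in a different modeling technique without touching how the testing set is reserved.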
Model evaluation
Model evaluation is performed during model development and before model deployment.
- Assess the model's quality
- Ensure that it properly addresses the business problem
Diagnostic measures
- Chosen to suit the modeling technique used
- Computed on a testing set held out from training
- Used to refine the model as needed
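Which diagnostic measures are suitable depends on the modeling technique. As an illustration (the specific measures are an assumption, not something the methodology prescribes), the standard-library sketch below computes mean absolute error and root mean squared error for numeric predictions, and accuracy, precision, and recall for class labels, given true values and predictions from a testing set.

```python
import math


def regression_measures(y_true, y_pred):
    """Mean absolute error and root mean squared error for numeric predictions."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


def classification_measures(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for predicted class labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical testing-set labels and predictions.
    print(classification_measures([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
    print(regression_measures([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))
```

If these measures fall short of what the business problem requires, the model is refined and evaluated again on the same held-out testing set.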
Deployment
- Marketing
- Application developers
- IT administration