
INDIAN STATISTICAL INSTITUTE

SQC&OR UNIT, PUNE


www.sqcpune.org
DATA ANALYTIC CERTIFICATION PROGRAM
54 DAYS OF CLASSROOM EVENING SESSIONS, FROM 6 JULY-15 TO 28 OCT-15

The Indian Statistical Institute is a unique institution devoted to the research, teaching and
application of not only statistics and allied sciences, but also the natural sciences, social
sciences and their interface with statistics. Founded by Prof. P. C. Mahalanobis, a physicist
turned statistician, the Institute has been accorded the status of an INSTITUTE OF NATIONAL
IMPORTANCE by an Act of Parliament, 1959.

Past trends suggest that data analysis has mostly been used to improve operational processes such
as manufacturing, service provision and quality control. The literature on data analysis is likewise
oriented towards operational excellence, focusing on the internal processes of a business. However,
every businessperson knows how important external influences on these processes are, and the
available analytical ability has hardly succeeded in addressing these external concerns. As a
result, decisions to tackle external influences are often taken on intuition and perception, and
many such decisions have proved counterproductive.
Current practice in developing business analytics to gather business intelligence focuses more on
tabular analysis and data visualization, and less on even basic data analysis and scientific data
mining, which require a good depth of univariate and multivariate statistical techniques. In many
routinely produced business analytics reports, already known conclusions and wishful conclusions
are presented in graphs and charts using quite costly 'customized' software! Discovering unknown
information from the available data, predicting the future with accuracy, and managing knowledge of
the external environment impacting your business are a few key deliverables of business analytics.
ISI Pune's Data Analysts and Miners are here to support you in reaching these deliverables.
Who can enrol?
Managers and above who need to extract information from the data they routinely encounter in areas
such as Retail, Banking, BPO, IT, Sales & Marketing, Supply Chain, Strategic Management etc. We
strongly recommend that participants come from a mathematical and statistical background, with some
acquaintance with data analysis software such as Minitab, JMP, R, SAS etc.
Certification Requirements:
A Data Analyst Certificate and ID Card will be issued after successful completion of a Content Test
and a Test of Presentation Skill, with at least 70% marks in the test conducted on the last day of
the session.

Topic: Overview
Introduction to data mining and business analytics: current status and examples
Types of problems encountered with examples; concepts of data generation process
Three pillars: statistical and machine learning; big data and data warehousing technology; managerial usages, applications, and competing with analytics
Concepts of modeling and problem formulation; formulating the right questions
Scope and objectives of this course
Quiz, tests and exercises on problem identification and classification

Topic: Basic Statistical Concepts
Concepts of experiments, outcomes, sample space, events and probability models
Concepts of conditional probability, independence and Bayes' theorem
Concepts of lift, support, sensitivity and specificity and their illustrations in Business Analytics
Formulating real life problems using probability models: illustrations and exercises
Concepts of random variables, data generation process, sample and population, parameters and inferential problems
Distribution function, probability function and their usage
Some important distributions
Expectation, variance and other moments of a distribution; interpretation and usage
Bivariate, multivariate, conditional and marginal distributions with illustrations
Conditional expectations and variances and their usages with illustration
Concepts of correlation and regression from the perspective of bivariate distribution
Functions of random variables
Introduction to Markov chains and their usage in describing a system
Concepts of simulation
System modeling using concepts of random variables and simulation
Exercises: describing and modeling systems in terms of random variables

Topic: Exploratory Data Analyses and Visualization Techniques
Meaning of a descriptive approach to a quantitative problem
Concepts of exploratory analyses: description, summarization and hypotheses formulation; concepts of data generation processes
Frequency distributions, histograms, bar diagrams and their variants, pie diagrams and their variants, spider charts, illustrated maps, dot plots, box plots, run charts, tables, ogives and comparison of distributions
Univariate and bivariate summary measures: averages, percentiles, measures of variation like standard deviation / range / interquartile range; correlation and covariance; measures of skewness and kurtosis and their meanings
Exploring relationship between two or more variables: scatter plots, mean functions, contingency tables; multi-way tables; concepts and preliminary measures of interactions and their graphical presentations; summary measures of relationship of ordinal variables, measures based upon concordance and discordance; concepts of odds ratios, relative risks and effect modification
Interpretation exercises and case examples
Exercises on formulating problems from a quantitative perspective and providing descriptive solutions

Topic: Estimation and Hypotheses Testing
Estimation as data summarization and inferential tool; difference between descriptive measures and model based estimation; usage in business analytics with illustrations
Estimation methods: concepts of likelihood; properties of estimators (unbiased estimator, efficiency and consistency); concepts of sampling distribution and confidence intervals; common sampling distributions and discussions on how they arise and their properties
Estimation of sample means, proportions, rates and variances: different techniques of estimation and their relative merits / demerits
Concepts of prior and posterior distributions; conjugate prior distributions; Bayes estimators; concepts of credible intervals
Real life examples and exercises on estimation; understanding where and how to use estimation
Meaning and formulation of hypotheses in real life business scenario
Examples and comparison with estimation, general problems of description and problems of formulation and testing of hypotheses
Concepts of test statistics and reference distributions
Concepts of type I and type II errors; concepts of confidence and power
Testing equality of means, variances, rates and proportions: one and two sample tests
Concepts of non-parametric tests including permutation tests
Contingency tables and their usages for testing hypotheses of independence; Simpson's paradox and effect modification
Tests of goodness of fit; goodness of fit for composite hypotheses; fitting distributions: chi-square tests and graphical methods; Kolmogorov-Smirnov tests
Non-parametric tests for means, proportions and rates including sign and rank tests
Applications including A / B tests, champion / challenger tests and paired t tests
Testing equality of several means, variances and proportions; concepts of multiple comparisons
Exercises on describing business problems using estimates and summary measures
Exercises on formulating and testing hypotheses
Case examples and quiz

Topic: Concepts of Linear Models
Introduction to linear models and their usages; introduction to the concepts of least squares and their usages; introduction to ANOVA and ANCOVA: one-way and two-way models
Exercises combining concepts of linear models, ANOVA / ANCOVA, multiple comparisons, hypotheses formulation and descriptive statistics / concepts of estimation

Topic: Multiple Linear Regression
Problems of value estimation
Introduction to multiple linear regression and its usages
Fitting, testing, validating and interpreting multiple linear regression models: basic steps of model building; fitting models with categorical variables and interaction terms
Understanding assumptions, limitations and pitfalls; selecting models using mean functions and other techniques
Testing model significance and adequacy; residual analyses to test model adequacy and accuracy; re-sampling methods for model validation
Interpreting model coefficients; proposing general linear hypotheses to enhance understanding
Model improvement: usage of stepwise regression; best subset selection; using non-linear terms in the model
Taking care of multicollinearity using Ridge regression; concepts of lasso and tuning parameter
Real life examples and exercises; exercises combining descriptive / exploratory techniques, estimation / testing of hypotheses and fitting models

Topic: Categorical Data Analyses
Pervasive nature of categorical data analyses for classification and study of dependence in business, economics and technology
Review of contingency tables and summary measures like odds ratio and relative risks; other measures of relationship; usage of log odds in business analytics
Introduction to various types of sample design: follow up studies, prospective and retrospective studies, case control studies; observational vs. experimental studies: clinical trials; types of samples: Poisson, multinomial, hypergeometric; relative merits and demerits; limitations of different sample designs
Comparing rates and proportions in different situations; taking care of ordinal variables in contingency tables; analyzing multiple contingency tables
Introduction to log linear analysis
Case examples

Topic: Introduction to Logistic Regression for Classification and Risk Analysis
Problems of classification in business, economics and technology
Logistic regression: concepts of risk and sigmoid function; logit transformation; usage of log odds; fitting and testing logistic models; interpretation of coefficients; testing model adequacy: concepts of deviance and other tests of goodness of fit; understanding model usage including when classification is permissible; checking classification accuracy: the pros and cons of checks; residual analyses; using logistic regression in real life situations
Special cases of logistic regression: multinomial and ordinal logistic regression

Topic: Regression Analysis Special Topics
Introduction to count regression and its variants including zero inflated models
Introduction to censored data
Concepts of survival analysis and introduction to estimation of survival function using Kaplan-Meier method: usages in business (insurance, health care etc.)
Introduction to proportional hazard model (Cox regression): model fitting and validation
Introduction to GLM with special reference to dependent variable following gamma distribution

Topic: Classification using Linear and Quadratic Discriminant Analysis
Introduction to linear and quadratic discriminant analysis
Bayes optimality criteria; using Bayes theorem for classification; linear discriminant analyses; quadratic discriminant analysis

Topic: Classification using Support Vector Machines
Support Vector Machines: maximum margin classifier; support vector machine classifiers; classification with non-linear boundaries; SVM with more than two classes: one-versus-one and one-versus-all classification; assessment of SVM: ROC curves; relationship with logistic regression

Topic: Tree Based Methods for Classification and Value Estimation
Introduction to tree based methods for classification and value estimation
Mean squared errors and their usage in value estimation using tree based methods
Measures of entropy / information and their usage in the context of tree based classification
Construction and pruning of trees: concepts of cost complexity pruning
Comparison of tree based methods with other statistical models; advantages and disadvantages of trees
Usage of bagging, random forests and boosting to improve performance of trees
Case examples
Exercises and quiz

Topic: Artificial Neural Networks
Introduction
Projection pursuit regression
Fitting neural networks
Some issues including starting values, over fitting, scaling of inputs, number of hidden units and layers, multiple minima
Case examples, exercises and quiz

Topic: Clustering Problems
Unsupervised learning and usage of clustering problems
Hierarchical and nonhierarchical clustering
Selecting clustering variables
Measuring similarity: various measures of distances
Detecting outliers
Concepts of linkage methods and their advantages and disadvantages
K-means clustering
EM and latent variable clustering
Assessing goodness of cluster solutions
Case examples and exercises

Topic: Principal Component Analysis and Factor Analysis
Introduction to principal component analysis (PCA) and its usages: dimension reduction using PCA, development of indices using PCA
Introduction to factor analysis: concepts of latent variables and constructs, factor models, concepts of scale development, introduction to EFA and CFA
Computation of correlations for categorical variables and preliminary assessment for fitting factor models
Fitting and validating factor models
Examples of Factor Analysis and its usages
Exercises and quiz

Topic: Introduction to Forecasting using Time Series
Concepts of time series: concepts of trends, seasonality, cycles and random variation
Concepts of moving average and their usage in forecasting
Introduction to exponential and double exponential smoothing procedures; usage of these models for forecasting
Other forecasting models: Winters model, Holt-Winters model
Forecasting using univariate Box-Jenkins (ARIMA) models
  o Concepts of stationary series, testing whether a series is stationary, converting a non-stationary series to stationary series
  o Identification and taking care of seasonality
  o Concepts of autocorrelation (ACF) and partial autocorrelations (PACF)
  o Model selection using sample ACF and PACF
  o ARIMA models and notations
  o Fitting and validating models
  o Using ARIMA models for forecasting

Topic: Special Topics
Usage of analytics techniques for specific purposes like customer analytics, HR analytics, fraud analytics, text mining and sentiment analysis and other important areas

Schedule
54 Days Program:
Month    Date                                         Timing
Jul-15   6,7,8                                        18:00 to 20:00
Jul-15   13,14,15,20,21,22,27,28,29                   18:00 to 21:00
Aug-15   3,4,5,10,11,12,17,18,19,26,27,28             18:00 to 21:00
Sep-15   1,2,3,7,8,9,14,15,16,21,22,23,28,29,30       18:00 to 21:00
Oct-15   5,6,7,12,13,14,15,19,20,21,26,27,28          18:00 to 21:00

Location
The Central Park Hotel, Near Inox Multiplex, Bund Garden Road, Pune, Maharashtra 411001
Phone: 020 4010 4000
Course Fee
INR 1,50,000/- per participant, plus service tax @ 14% (i.e., INR 21,000), totalling INR 1,71,200/-.
Participants must pay by cheque favouring INDIAN STATISTICAL INSTITUTE, payable at Pune.
Note :
1. The above fee is also inclusive of Course Material, Breakfast, Lunch and Refreshments.
2. For registration procedure please refer to the registration form available at
http://www.sqcpune.org/training-programs/training-calendar or email us for the same
Contact Details
Website: www.sqcpune.org
Email: srath02@yahoo.com
Mobile: 09371058816 (Prof. Subrata Rath)/ 09960956118 (Mr. Sharad S. Shende).
