Workshop for Ecologists Program schedule March 7-13, 2018

Day 1 (March 7, Wednesday)


DAY – I (Session – A), 09:30 AM – 11:00 AM
Introduction to R and RStudio
• Installation and different options in RStudio, customizing the environment
• Installing packages (install.packages), loading packages (require, library), projects, workspace
• Looking into help files
• Getting familiar with options (setwd, getwd, sessionInfo)
• Document generation options
• Resolving problems on individual laptops so that the sessions of the next six days run smoothly
• Data import and export (read.table, read.csv, write.table, write.csv)
11:00 AM – 11:30 AM Tea Break
DAY – I (Session – B), 11:30 AM – 01:00 PM
Handling data in R
• Basic data types: numeric, integer, logical, character
• Data storage facilities in R (c(), vector, matrix, data.frame, list, rep, numeric, seq)
• Difference between matrix and data.frame
• Subsetting and modifying data (select, subset, filter, which, reshape)
• Combining data (rbind, cbind, merge)
• Some useful functions (is.na, dim, complete.cases, summary, aggregate, class)
01:00 PM – 02:30 PM Lunch Break
DAY – I (Session – C), 02:30 PM – 04:00 PM
Introduction to basic R programming
• The basic syntax for programming in R (function)
• Create basic functions (Check tutorial examples)
• Utility of writing functions (Toy examples for ecologists)
• Loops and control flow statements and basic examples (for, if)
04:00 PM – 04:30 PM Tea Break
DAY – I (Session – D), 04:30 PM – 06:00 PM
Tutorial session
• Data manipulation exercises (Tutorial – IB)
• Writing own functions (Tutorial – IC)

Expected Outcome (Day – I):


• Participants should be confident using the RStudio framework
• Participants should be able to import data and perform basic data manipulation
• Participants should be able to write basic functions
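
Illustrative sketch (Day I): the kind of data manipulation and function writing targeted above might look roughly as follows in base R. The data frame `birds` and the function `shannon` are hypothetical examples invented for illustration; they are not taken from the workshop tutorials.

```r
# Hypothetical toy data: counts of two species at three sites (not workshop data)
birds <- data.frame(
  site    = c("A", "A", "B", "B", "C", "C"),
  species = c("sp1", "sp2", "sp1", "sp2", "sp1", "sp2"),
  count   = c(10, 10, 5, 15, 20, 0)
)

# Subsetting: keep only rows where the species was actually observed
present <- subset(birds, count > 0)

# Aggregating: total count per site
totals <- aggregate(count ~ site, data = birds, FUN = sum)

# A basic user-defined function: Shannon diversity index from a count vector
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)  # proportions of the species present
  -sum(p * log(p))
}

# Loop over sites and apply the function to each one
for (s in unique(birds$site)) {
  h <- shannon(birds$count[birds$site == s])
  cat("Site", s, "Shannon index:", round(h, 3), "\n")
}
```

Wrapping the index in a function keeps the loop body short and lets the same calculation be reused on any count vector.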


Day 2 (March 8, Thursday)


DAY – II (Session – 0), 09:00 AM – 09:30 AM
Doubt session: Doubts from day-I will be discussed
DAY – II (Session – A), 09:30 AM – 11:00 AM
Statistical distributions (using R)
• Visualizing data distributions: histogram, boxplot (hist, boxplot, plot)
• Probability mass functions and probability density functions
• Area under the curve (integrate)
• Binomial distribution, uniform distribution, normal distribution, t-distribution (rbinom, runif, rnorm, dnorm, qnorm, pnorm)
• Skewed distributions, descriptive measures (mean, median, mode, quantile, range, sd, kurtosi, skew, describe, min, max, var)
• Strong emphasis on visualizing distribution functions
• Important packages: Hmisc, psych
11:00 AM – 11:30 AM Tea Break
DAY – II (Session – B), 11:30 AM – 01:00 PM
Testing of hypothesis (using R)
• Concepts: type-I error, type-II error, p-value, test statistic, confidence intervals
• Tests: z-test, t-test, F-test, paired t-test (lower tail, upper tail, two tailed), test for population proportions
• ANOVA, non-normal data, checking for normality, permutation test (aov, multcomp, qqnorm, qqline)
• Case studies from ecology
01:00 PM – 02:30 PM Lunch Break
DAY – II (Session – C), 02:30 PM – 04:00 PM
Statistical Methods in Ecology: General approaches and recent trends
(Lecture by an expert covering different areas of ecology, the current focus, any shifts in general approaches, and pitfalls)
04:00 PM – 04:30 PM Tea Break
DAY – II (Session – D), 04:30 PM – 06:00 PM
Model building in R – I (Simple Linear Regression)
• Train and test partition of data (sample, sample.split)
• Simple linear regression (concepts, assumptions, understanding R outputs) (lm, summary)
• Correlation (cor, corr.test)
• Checking model assumptions and associated plots (boxcox)
• Regression diagnostics and associated plots (influential points, high-leverage points, non-constant variance of errors, correlation of errors, possible nonlinearity)
• Prediction using the model for new data
• Measures of model accuracy (R², adjusted R², AIC, BIC, Mallows' Cp)
• Important packages: MASS, car
• Important functions: outlierTest, qqPlot, leveragePlots, influencePlot, ncvTest

Expected Outcome (Day – II):


• Participants should be able to build linear regression models for a given data set
• Participants should be able to identify the important variables and select the best model
• Participants should be able to make predictions on new data
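
Illustrative sketch (Day II): a train/test partition, a simple linear regression and prediction on held-out data could be put together roughly as below, using base R only. The simulated variables `temp` and `abundance` are hypothetical and stand in for a real ecological data set.

```r
set.seed(1)  # reproducible simulated data (hypothetical, not workshop data)
n <- 100
dat <- data.frame(temp = runif(n, 10, 30))
dat$abundance <- 5 + 2 * dat$temp + rnorm(n, sd = 3)

# Train/test partition with sample(): 70% of the rows go to the training set
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit a simple linear regression on the training set
fit <- lm(abundance ~ temp, data = train)
print(summary(fit))                   # coefficients, R-squared, adjusted R-squared
print(c(AIC = AIC(fit), BIC = BIC(fit)))

# Predict on the held-out test set and compute the root mean squared error
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$abundance - pred)^2))
print(rmse)
```

Evaluating the error on rows that were not used for fitting gives a more honest picture of predictive accuracy than the training R-squared alone.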


Day 3 (March 9, Friday)


DAY – III (Session – 0), 09:00 AM – 09:30 AM
Doubt session: Doubts from day-II will be discussed
DAY – III (Session – A), 09:30 AM – 11:00 AM
Tutorial session
• Statistical Distribution (Tutorial – IIA)
• Testing of hypothesis (Tutorial – IIC)
11:00 AM – 11:30 AM Tea Break
DAY – III (Session – B), 11:30 AM – 01:00 PM
Model building in R – II (Multiple Linear Regression)
• Model building with more than one explanatory variable
• Variable selection, exploring correlations (corrplot, pairs)
• Comparing models (anova)
• Forward, backward and mixed selection (regsubsets), selection criteria
• Prediction for new data using the best model (predict, predict.lm), confidence interval and prediction interval
• Regression diagnostics and checking model assumptions (multicollinearity, variance inflation factor, vif) (connects to Session IID)
• Important packages: leaps, caret, corrplot, ISLR, MASS
• Useful functions: createDataPartition, stepAIC
01:00 PM – 02:30 PM Lunch Break
DAY – III (Session – C), 02:30 PM – 04:00 PM
Tutorial session (In groups)
• Model building in R (Tutorial – IID)
• Testing of hypothesis (Tutorial – IIIB)
04:00 PM – 04:30 PM Tea Break
DAY – III (Session – D), 04:30 PM – 06:00 PM
Model building in R – III (Logistic Regression)
• Building a model for a categorical response in R (glm)
• Confusion matrix, sensitivity, specificity, ROC curve, AUC (table, confusionMatrix)
• Variable selection in logistic regression models
• Prediction on new data (predict)
• Model diagnostics
• Logistic regression for more than two classes
• Important packages: pROC, ROCR

Expected Outcome (Day – III):


• Participants should be able to build logistic regression models for a given data set
• Participants should be able to identify the important variables and select the best model
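
Illustrative sketch (Day III): a logistic regression with a confusion matrix, sensitivity and specificity might be coded roughly as follows in base R. The simulated predictor `canopy`, the response `present` and the 0.5 cut-off are assumptions made for this example only.

```r
set.seed(2)  # reproducible simulated data (hypothetical, not workshop data)
n <- 200
dat <- data.frame(canopy = runif(n, 0, 1))
p_true <- plogis(-2 + 4 * dat$canopy)           # true presence probability
dat$present <- rbinom(n, size = 1, prob = p_true)

# Logistic regression for a binary response
fit <- glm(present ~ canopy, data = dat, family = binomial)
print(summary(fit))

# Predicted probabilities and classes (0.5 cut-off is an arbitrary choice)
prob <- predict(fit, type = "response")
pred_class <- ifelse(prob > 0.5, 1, 0)

# Confusion matrix with fixed levels so that it is always 2 x 2
conf_mat <- table(
  Predicted = factor(pred_class, levels = c(0, 1)),
  Observed  = factor(dat$present, levels = c(0, 1))
)
print(conf_mat)

# Sensitivity: true positives / observed positives
sensitivity <- conf_mat["1", "1"] / sum(conf_mat[, "1"])
# Specificity: true negatives / observed negatives
specificity <- conf_mat["0", "0"] / sum(conf_mat[, "0"])
print(c(sensitivity = sensitivity, specificity = specificity))
```

Here the model is evaluated on the data it was fitted to, which keeps the sketch short; in practice the confusion matrix would usually be computed on a separate test set.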


Day 4 (March 10, Saturday)


DAY – IV (Session – 0), 09:00 AM – 09:30 AM
Doubt session: Doubts from day-III will be discussed
DAY – IV (Session – A), 09:30 AM – 11:00 AM
Model building in R – IV
• Polynomial regression (ISLR, poly, I(x^2), anova)
• Statistical regularization (ridge, lasso)
• Important packages: ISLR
• Useful functions: cv.glmnet
11:00 AM – 11:30 AM Tea Break
DAY – IV (Session – B), 11:30 AM – 01:00 PM
Tutorial Session
• Logistic Regression, Tutorial – IIID
• Statistical Regularization, Tutorial – IVA
01:00 PM – 02:30 PM Lunch Break
DAY – IV (Session – C), 02:30 PM – 04:00 PM
Moving beyond linearity (Nonlinear Regression)
• Building nonlinear models in R (nls)
• Making predictions using a nonlinear model
• Model diagnostics (mean structure, variance homogeneity, independence of errors)
• Remedies for model violations
• Delta method for estimating derived parameters (delta.method)
• Important packages: MASS, nls2, nlstools, msm
04:00 PM – 04:30 PM Tea Break
DAY – IV (Session – D), 04:30 PM – 06:00 PM
Resampling methods (Model selection)
• Validation set approach
• Cross-validation (LOOCV, k-fold)
• Bootstrap, jackknife (boot)
• Bootstrapping regression models (alr3, alr4)
• Model complexity versus model interpretation

Expected Outcome (Day – IV):


• Participants should be able to understand the utility of different resampling methods
• Participants should be able to apply the nonlinear least squares method to choose a nonlinear mathematical model
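
Illustrative sketch (Day IV): a nonlinear least squares fit followed by a simple nonparametric bootstrap of one parameter could look roughly like this in base R. The saturating curve, the simulated data, the starting values and the number of bootstrap replicates are all assumptions chosen for illustration.

```r
set.seed(3)  # reproducible simulated data (hypothetical, not workshop data)
x <- seq(1, 50, length.out = 40)
y <- 20 * x / (8 + x) + rnorm(length(x), sd = 0.5)  # saturating response
dat <- data.frame(x = x, y = y)

# Nonlinear least squares fit; start values are rough guesses
fit <- nls(y ~ a * x / (b + x), data = dat, start = list(a = 15, b = 5))
print(coef(fit))

# Prediction for new values of x
new_pred <- predict(fit, newdata = data.frame(x = c(10, 60)))
print(new_pred)

# Nonparametric bootstrap of parameter a: resample rows, refit, store estimate.
# Refits that fail to converge are recorded as NA instead of stopping the loop.
boot_a <- replicate(200, {
  idx <- sample(nrow(dat), replace = TRUE)
  b_fit <- tryCatch(
    nls(y ~ a * x / (b + x), data = dat[idx, ], start = as.list(coef(fit))),
    error = function(e) NULL
  )
  if (is.null(b_fit)) NA_real_ else coef(b_fit)[["a"]]
})

# Percentile bootstrap interval for a
print(quantile(boot_a, c(0.025, 0.975), na.rm = TRUE))
```

Starting each bootstrap refit from the original estimates usually helps convergence, although it does not guarantee it.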


Day 5 (March 11, Sunday)


DAY – V (Session – 0), 09:00 AM – 09:30 AM
Doubt session: Doubts from day-IV will be discussed
DAY – V (Session – A), 09:30 AM – 11:00 AM
Tutorial Session
• Nonlinear Regression, Tutorial – IVC
• Use of resampling methods, Tutorial – IVD
11:00 AM – 11:30 AM Tea Break
DAY – V (Session – B), 11:30 AM – 01:00 PM
Some advanced graphics options
• Some packages on visualization (ggplot2, lattice)
• Tutorial problems of visualization
• A revision of previous four days
01:00 PM – 02:30 PM Lunch Break
DAY – V (Session – C), 02:30 PM – 04:00 PM
Workshop Tour
04:00 PM – 04:30 PM Tea Break
DAY – V (Session – D), 04:30 PM – 06:00 PM
Workshop Tour

Expected Outcome (Day – V):


• Participants should be able to use generalized additive models for predictions
• Participants should be able to understand the basic structure of multivariate data


Day 6 (March 12, Monday)


DAY – VI (Session – 0), 09:00 AM – 09:30 AM
Doubt session: Doubts from day-V will be discussed
DAY – VI (Session – A), 09:30 AM – 11:00 AM
Tutorial Session
• GAM for regression problems
• GAM for classification problems
11:00 AM – 11:30 AM Tea Break
DAY – VI (Session – B), 11:30 AM – 01:00 PM
Tutorial Session
• Generalized additive model, Tutorial – VIA
01:00 PM – 02:30 PM Lunch Break
DAY – VI (Session – C), 02:30 PM – 04:00 PM
Numerical Ecology with R
04:00 PM – 04:30 PM Tea Break
DAY – VI (Session – D), 04:30 PM – 06:00 PM
Multivariate Data Analysis using R (Introduction)
• Multivariate data visualization, matrix scatter plot (scatterplotMatrix)
• Mean vectors, covariance matrix
• Exploratory data analysis
• Species data, environmental data, transformation
• Distance measures, dissimilarity matrix
• Important package: car

Expected Outcome (Day – VI):

• Participants should be able to understand the structure of multivariate data and choose appropriate visualization techniques for graphical demonstration
• Participants should be able to use R to perform principal component analysis on the appropriate data
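
Illustrative sketch (Day VI): a principal component analysis and a hierarchical clustering on a dissimilarity matrix might be run roughly as follows in base R. The built-in `iris` measurements are used here only as a stand-in for an environmental data matrix, and the choice of average linkage and three groups is an assumption for the example.

```r
# Built-in iris measurements as a stand-in for an environmental data matrix
env <- iris[, 1:4]

# Mean vector and covariance matrix of the variables
print(colMeans(env))
print(cov(env))

# Principal component analysis on standardized variables
pca <- prcomp(env, scale. = TRUE)
print(summary(pca))

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Euclidean distance matrix on standardized data, then hierarchical clustering
d  <- dist(scale(env))
hc <- hclust(d, method = "average")

# Cut the dendrogram into three groups and compare with the known species
groups <- cutree(hc, k = 3)
print(table(Cluster = groups, Species = iris$Species))
```

Standardizing before both steps keeps variables measured on larger scales from dominating the components and the distances.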


Day 7 (March 13, Tuesday)


DAY – VII (Session – 0), 09:00 AM – 09:30 AM
Doubt session: Doubts from day-VI will be discussed
DAY – VII (Session – A), 09:30 AM – 11:00 AM
Multivariate Methods
• Principal Component Analysis (prcomp, predict, eig, dudi.pca)
• Unconstrained ordination, visualization
• Interpretation of results
11:00 AM – 11:30 AM Tea Break
DAY – VII (Session – B), 11:30 AM – 01:00 PM
Cluster Analysis
• Hierarchical clustering based on links (hclust)
• Cluster dendrogram
• Interpreting and comparing cluster analysis results
• Optimal number of clusters (silhouette(), cluster)
• Heatmap and community table
01:00 PM – 02:30 PM Lunch Break
DAY – VII (Session – C), 02:30 PM – 04:00 PM
• Factor analysis (loadings, factor rotation, communality) (ellipse, FactoMineR, MASS, factanal)
04:00 PM – 04:30 PM Tea Break
DAY – VII (Session – D), 04:30 PM – 05:15 PM
Summary of the workshop
• Session-wise discussion
• Topics that could not be covered!
DAY – VII (Session – E), 05:15 PM – 06:00 PM
Valedictory session
• Feedback from the participants (oral and feedback form collection)
• Certificate distribution
• Message from the coordinator

References:
1. Numerical Ecology with R, Borcard, D., Gillet, F. and Legendre, P. Springer, 2011.
2. A Primer of Ecology with R, Stevens, M. H. H. Springer, 2009.
3. An Introduction to Statistical Learning with Applications in R, James, G., Witten, D., Hastie, T. and Tibshirani, R. Springer, 2013.
4. Ecological Models and Data in R, Ben Bolker, Princeton University Press, Oxford, 2007.
5. The R Student Companion, Brian Dennis, CRC Press, Taylor and Francis Group, 2013.
6. The New Statistics with R: An Introduction for Biologists, Andy Hector, Oxford University Press, 2015.
