
How do I learn statistics for data science?
What statistics book do you recommend to a wannabe data scientist who is familiar with
basic statistics and mathematics?

23 Answers

William Chen, studied Statistics at Harvard University (2014)
Updated Jan 16, 2015. Upvoted by Yassine Alouini, I hold a masters in statistics (formally
part III) from Cambridge.

For any aspiring data scientist, I would highly recommend learning statistics with a
heavy focus on coding up examples, preferably in Python or R.

My favorite series is the Statistical Learning series. It's a great primer on statistical
modeling / machine learning with applications in R.
The Elements of Statistical Learning
An Introduction to Statistical Learning

There are official pdf versions generously available for FREE at
data mining, inference, and prediction. 2nd Edition.
Page on usc.edu

If you want something with a Python focus, I would check out Think Stats
http://greenteapress.com/thinkstats2/index.html


Greg Ryslik, Led data science teams at Bay Area companies
Written Jan 27

I wouldn't focus so much on learning statistics for data science, but more on just
learning statistics. Data Science itself is a combination of two fields,
statistics/mathematics and computer science. There were data scientists who sat at
the intersection of those two fields long before the term was coined.

Many of the answers above (which are great!) are targeted specifically at machine
learning. In getting a broader perspective you gain the ability to not only implement
the models but also understand how they connect and relate to the deeper
mathematics behind them. As such, this post is aimed more at the general field.

In terms of statistics that are immediately useful to data science, they typically fall
into one of two categories, either 1) inference or 2) model fitting.

1) In regards to inference, that typically covers topics such as:

1) Parameter Estimation
2) Hypothesis testing
3) Bayesian Analysis
4) Identifying the best estimator
5) Other Statistical Theory
Some classic books on these topics include:
(more introductory): Statistical Inference: George Casella: 9788131503942:
Amazon.com: Books
(more advanced): Theory of Point Estimation (2nd English Edition): E.L. Lehmann,
George Casella: 9783698745156: Amazon.com: Books

2) In regards to model fitting there are a multitude of topics:

1) Linear Regression
2) Non-linear Regression
3) Categorical Data Analysis
4) Time Series & Longitudinal Analysis
5) Machine Learning

Some famous intro books include:

Linear Models: Applied Linear Statistical Models w/Student CD-ROM: Michael H.
Kutner, John Neter, Christopher J. Nachtsheim, William Li: 9780071122214:
Amazon.com: Books

Categorical Data: Amazon.com: An Introduction to Categorical Data Analysis
(9780471226185): Alan Agresti: Books

3) Finally, there are also a variety of topics that are very helpful with things like
A/B testing, missing data, etc.

These include things like:

1) Design of Experiments (very helpful in A/B testing)
2) Bootstrapping (helpful when the parameter of interest is hard to calculate)
3) Sample Size calculations (useful when trying to understand how many samples you
need)
4) Multiple comparisons (what happens if you run many tests)
5) A ton of others.

Many of these you will encounter as you work through 1) and 2) above.
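
As a rough illustration of items 2) through 4) in the list above, here is a minimal Python sketch using only the standard library. The function names (`bootstrap_ci`, `sample_size_per_group`, `familywise_error_rate`), the default settings, and the example numbers are assumptions made for illustration, not anything taken from a particular course or library. It shows a percentile bootstrap interval for a median, a normal-approximation sample size for comparing two means, and how the chance of at least one false positive grows when many independent tests are run.

```python
import math
import random
from statistics import NormalDist, median


def bootstrap_ci(data, stat=median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lo_idx = max(0, round(alpha / 2 * n_resamples))
    hi_idx = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return estimates[lo_idx], estimates[hi_idx]


def sample_size_per_group(effect, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided
    comparison of two means with a common standard deviation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (sd * (z_alpha + z_power) / effect) ** 2)


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    # Made-up data, purely for illustration.
    page_load_times = [1.2, 0.8, 3.5, 1.1, 0.9, 2.4, 1.7, 1.0, 5.2, 1.3]
    print(bootstrap_ci(page_load_times))
    print(sample_size_per_group(effect=0.5, sd=1.0))
    print(familywise_error_rate(0.05, 20))
```

Under this approximation, detecting a difference of half a standard deviation takes 63 samples per group, and 20 independent tests at the 0.05 level give roughly a 64% chance of at least one false positive, which is why multiple comparisons deserve attention.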

If you're interested in a potential introductory syllabus, I'll be teaching a bootcamp
shortly. The course and syllabus can be found here:

Statistical Foundations - Metis

Hope this helps!


Ferris Jumah, Data and Products
Updated Jan 19, 2013. Upvoted by Lili Jiang, Data Scientist at Quora

Working list, please suggest edits, need classifications

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second
Edition [1]
-Hastie, Tibshirani, Friedman

Statistical Inference [2]
-Casella, Berger
--Excellent starting text for moving on to more advanced material

Bayesian Data Analysis [3]
-Gelman, Carlin, Stern, Rubin

Mining of Massive Datasets [4]
-Rajaraman, Ullman, Leskovec

All of Statistics [5]
-Wasserman

Also, for a very comprehensive list, see
What are some good resources for learning about statistical analysis?

[1] data mining, inference, and prediction. 2nd Edition. (download/buy)
[2] Statistical Inference: George Casella, Roger L. Berger: 9780534243128:
Amazon.com: Books
[3] Home page for the book, "Bayesian Data Analysis"
[4] Mining of Massive Datasets - The Stanford University InfoLab
[5] All of Statistics: A Concise Course in Statistical Inference (Springer Texts in
Statistics): Larry Wasserman: 9780387402727: Amazon.com: Books

Brian Feeny, Harvard Grad Student
Written Dec 16, 2012. Upvoted by William Chen, studied Statistics at Harvard University
(2014) and Justin Rising, PhD in statistics

There are many books that focus on statistics as it applies to data science;
however, I do believe you should approach statistics holistically, and not just in the
frame of reference of Data Science. For that, I recommend the following book:

Statistics, 4th Edition (9780393929720): David Freedman, Robert Pisani, Roger
Purves

This is the same book (loosely) followed by Andrew Conway in his Coursera course
Statistics One. I would try to find the International version, as it is identical to
the US version, but can be had for around $30.

The first chapter or two are rather confusing, but I find the rest of the book very well
laid out. Andrew Conway is very knowledgeable in Statistics, and no doubt he has
recommended this book for good reason.

That said, I recommend using no single resource. Statistics is far too important to
Data Science. You must master it, and like most things, that is a constant work in
progress. I am addicted to Statistics, and I think this book is partially to blame.


Carl Shan, reads a lot, has written a few
Written Jul 29, 2015

To brush up on some basic statistics without dropping a load of cash on a
textbook/degree, I'd suggest starting off by reading over a series of short primers
(10-12 page PDFs per topic) meant for the novice statistician and social science
researcher, written by MIT EECS PhD student Ramesh Sridharan.

He taught a one-month course at MIT for researchers brushing up on basic or intermediate
statistics, and uploaded all of his PDFs. (You can check out the website here: Statistics
for Research Projects)

I stumbled across his notes while looking up some details regarding the
Kolmogorov-Smirnov test, a non-parametric test for differences between two
distributions (a non-parametric test is one that doesn't assume the data come from
any particular family of probability distributions, and is thus "parameter"-free),
and found his notes to be incredibly lucid and clear.
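
To make the idea concrete, here is a minimal standard-library Python sketch of the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical cumulative distribution functions. The p-value here comes from a permutation test, a swapped-in choice for illustration rather than the classical Kolmogorov distribution. The function names, the seed, and the number of permutations are assumptions, not something taken from the notes mentioned above.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The gap can only change at observed values, so check each one.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_permutation_pvalue(a, b, n_permutations=2000, seed=0):
    """Share of random relabelings whose statistic is at least as large
    as the observed one (hypothetical helper, not the classical formula)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

For example, `ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])` returns 0.5, because half of the first sample lies below every value in the second. Nothing in the code assumes a normal or any other named distribution, which is the sense in which the test is non-parametric.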

If you have some mathematical or technical maturity, you may find his notes
similarly helpful in getting up to speed. If not, I still think his notes are a great initial
entry point for quickly getting a lay of the land.

The link to his 6-7 notes, totaling ~70 pages, is here: Statistics for Research
Projects

Note that he doesn't have any notes on predictive modeling, which is a key part of
machine learning. I emailed him asking why, and he told me that he didn't have the
chance to write anything detailed for the topic. I'm considering drafting a short
primer myself...
Shailesh Upadhyay, former Associate at Indian School of Business (2010-2011)
Updated Dec 23. Upvoted by Ujala Shanker, Taught Statistics to undergrad students at
UC Berkeley.
Originally Answered: How do I learn statistics and probability for data science?

To become a good data scientist, you need to build a strong foundation in the
following:

Fundamental statistics (topics like descriptive & inferential statistics,
parametric & non-parametric tests, simple & multiple regression, etc.)

Proficiency with at least one statistical computing language like R, SAS,
STATA, etc. Python programmers who have done data analysis also have an
edge.

Good knowledge/experience with advanced modeling techniques, such as
time series analysis, matrix factorization, mixed-effect models, and machine
learning techniques such as boosting and random forests.

Algorithmic thinking: the ability to think about and solve problems at a level of
abstraction that is beyond any specific programming language goes a long
way.

An understanding of how relational databases work. SQL experience helps.

Experience with large data sets & distributed computing using Hadoop/Hive
is an added advantage if you want to continue excelling as a data scientist.

A few online resources and MOOCs that can help you get started are:

1. Data Analyst (a good place to get a feel for data and practice)
2. Managing Big Data with MySQL - Coursera (learn to use a relational DB in
business analysis)
3. Practical Machine Learning - Coursera (a primer to start machine learning
intuitively)

Hope this helps.
