How do I learn statistics for data science?
23 Answers
William Chen, studied Statistics at Harvard University (2014)
Updated Jan 16, 2015. Upvoted by Yassine Alouini, I hold a masters in statistics (formally part III) from Cambridge.
For any aspiring data scientist, I would highly recommend learning statistics with a
heavy focus on coding up examples, preferably in Python or R.
My favorite series is the Statistical Learning series. It's a great primer on statistical
modeling / machine learning with applications in R.
The Elements of Statistical Learning
If you want something with a Python focus, I would check out Think Stats.
There are official PDF versions generously available for FREE at:
The Elements of Statistical Learning: data mining, inference, and prediction. 2nd Edition.
http://greenteapress.com/thinkstats2/index.html
43.8k Views. Answer requested by Fadli Hidayat and Minhaz Mishu
I wouldn't focus so much on learning statistics for data science, but more on just
learning statistics. Data Science itself is a combination of two fields,
statistics/mathematics and computer science. There were data scientists who sat at
the intersection of those two fields long before the term was coined.
Many of the answers above (which are great!) are targeted specifically to machine
learning. In getting a broader perspective you gain the ability to not only implement
the models but understand how they connect and are related to the deeper
mathematics behind them. As such, this post is geared more towards the general field.
In terms of statistics that are immediately useful to data science, they typically fall
into one of two categories, either 1) inference or 2) model fitting.
1) Parameter Estimation
2) Hypothesis testing
3) Bayesian Analysis
4) Identifying the best estimator
5) Other Statistical Theory
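To make the first two items a bit more concrete, here is a minimal Python sketch of parameter estimation and hypothesis testing. It is only an illustration under stated assumptions: the function names `mean_with_ci` and `permutation_p_value` are made up for this example, the interval uses a normal approximation (z = 1.96), and the test is a simple permutation test on the difference in means rather than any particular method from the books listed here.

```python
import random
import statistics


def mean_with_ci(sample, z=1.96):
    """Estimate the mean and a normal-approximation 95% interval."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    estimate = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_err = statistics.stdev(sample) / n ** 0.5
    return estimate, (estimate - z * std_err, estimate + z * std_err)


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the group labels are exchangeable.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)
```

Calling `mean_with_ci` on a sample gives a point estimate plus an interval, and `permutation_p_value` gives a rough sense of how surprising the observed gap between two groups would be if the labels did not matter.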
Some classic books on these topics include:
(more introductory): Statistical Inference, George Casella (ISBN 9788131503942)
(more advanced): Theory of Point Estimation (2nd English Edition), E.L. Lehmann and George Casella (ISBN 9783698745156)
2) In regards to model fitting, there are a multitude of topics:
1) Linear Regression
2) Non-linear Regression
3) Categorical Data Analysis
4) Time Series & Longitudinal Analysis
5) Machine Learning
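As a small illustration of the first topic, linear regression, here is a sketch of an ordinary least squares fit for a single predictor. The function name `fit_line` is made up for this example and only plain Python is assumed; in practice you would more likely reach for a statistics package in R or Python.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical, slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

The slope is the covariance of x and y divided by the variance of x, and the intercept forces the line through the point of means, which is the closed-form solution for the one-predictor case.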
Some famous intro books include:
Linear Models: Applied Linear Statistical Models w/Student CD-ROM, Michael H. Kutner, John Neter, Christopher J. Nachtsheim, William Li (ISBN 9780071122214)
Categorical Data: An Introduction to Categorical Data Analysis, Alan Agresti (ISBN 9780471226185)
3) Finally, there are also a variety of other topics that are very helpful, such as
A/B testing and missing data.
You will encounter many of these as you work through 1) and 2) above.
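For the A/B testing mentioned above, one common textbook approach is a two-proportion z-test on conversion rates. The sketch below is only an illustration: the function name `two_proportion_z_test` and its arguments are made up for this example, and it assumes samples large enough for the normal approximation to be reasonable.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("pooled rate is 0 or 1, the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A positive z means variant B converted at a higher rate than A; the p-value says how often a gap at least this large would show up if the two variants were truly identical.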
1.3k Views
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second
Edition [1]
- Hastie, Tibshirani, Friedman
26.3k Views
Brian Feeny, Harvard Grad Student
Written Dec 16, 2012. Upvoted by William Chen, studied Statistics at Harvard University (2014), and Justin Rising, PhD in statistics
There are many books that focus on statistics as it applies to data science;
however, I do believe you should approach statistics holistically, and not just in the
frame of reference of Data Science. For that, I recommend the following book:
Statistics, 4th Edition (9780393929720): David Freedman, Robert Pisani, Roger Purves
This is the same book (loosely) followed by Andrew Conway in his Coursera course
Statistics One. I would try to find the International version, as it is identical to
the US version but can be had for around $30.
The first chapter or two are rather confusing, but I find the rest of the book very well
laid out. Andrew Conway is very knowledgeable in Statistics, and no doubt he has
recommended this book for good reason.
That said, I recommend using no single resource. Statistics is far too important to
Data Science. You must master it, and like most things, that is a constant work in
progress. I am addicted to Statistics, and I think this book is partially to blame.
15.7k Views
I stumbled across his notes while looking up some details regarding the
Kolmogorov-Smirnov test, a non-parametric test for differences in two distributions
(a non-parametric test is a test that doesn't assume the data follow any particular
family of probability distributions, and is thus "parameter"-free), and found his
notes to be incredibly lucid and clear.
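As a rough illustration of the idea behind the two-sample Kolmogorov-Smirnov test, the statistic is just the largest vertical gap between the two empirical distribution functions. The sketch below computes only that statistic, not a p-value; the function name `ks_statistic` is made up for this example and it is not taken from the notes. A library routine such as SciPy's `ks_2samp` would be the usual choice in practice.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Both empirical CDFs are step functions that only jump at observed
    # values, so the largest gap is attained at one of the data points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

Nothing here assumes a normal or any other parametric distribution: the statistic depends only on the ordering of the observations, which is what makes the test non-parametric.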
If you have some mathematical or technical maturity, you may find his notes
similarly helpful in getting up to speed. If not, I still think his notes are a great initial
entry point into quickly getting a lay of the land.
The link to his 6-7 notes, totaling ~70 pages, is here: Statistics for Research
Projects
Note that he doesn't have any notes on predictive modeling, which is a key part of
machine learning. I emailed him asking why, and he told me that he didn't have the
chance to write anything detailed for the topic. I'm considering drafting a short
primer myself...
13.6k Views
Shailesh Upadhyay, former Associate at Indian School of Business (2010-2011)
Updated Dec 23. Upvoted by Ujala Shanker, Taught Statistics to undergrad students at UC Berkeley.
Originally Answered: How do I learn statistics and probability for data science?
To become a good data scientist, you need to build a strong foundation in the
following:
Fundamental statistics (topics like descriptive & inferential statistics, parametric & non-parametric tests, simple & multiple regression, etc.)
Proficiency with at least one statistical computing language like R, SAS, STATA, etc. Python programmers who have done data analysis also have an edge.
Algorithmic thinking: the ability to think about and solve problems at a level of abstraction that is beyond any specific programming language goes a long way.
Experience with large data sets & distributed computing using Hadoop/Hive is an added advantage if you want to continue excelling as a data scientist.
A few online resources and MOOCs that can help you get started are:
1. Data Analyst (a good place to get a feel for data and practice)
1.5k Views