
Harvard University | Sociology E-161 | Spring 2016

Big Data: What is it?


with Burak Eskici
Tuesdays 7:40-9:30 pm
1 Story Street Room 306
Online Option is available

A tremendous amount of data is now being collected through websites, mobile phone
applications, credit cards, and many other everyday tools we use extensively. What is currently
being done with this precious resource, and what more can we do with it? This big data course
looks under the hood. It explores the logic behind the complex methods used in the field (not
the methods themselves). We then explore how big data research is designed, using real-life
examples of cutting-edge research and business applications. By the end of the class, students
will be competent in the field and able to design a research project using big data.

W1. (Jan 26) Introduction and Sociological Roots


W2. (Feb 2) Social Network Analysis I
W3. (Feb 9) Social Network Analysis II
W4. (Feb 16) Social Network Data and Visualization
W5. (Feb 23) Random Networks and Scale-free Networks
W6. (Mar 1) Big Data: Paradigm Shift?
W7. (Mar 8) Fitting a Model to Data
W8. (Mar 22) Machine Learning
W9. (Mar 29) Midterm Review
W10. (Apr 5) Similarity, Neighbors, and Clusters
W11. (Apr 12) Representing and Mining Text
W12. (Apr 19) Data Visualization
W13. (Apr 26) Big Data Applications I
W14. (May 3) Big Data Applications II
W15. (May 10) Ethics and Information Security

[Course Requirements]
Weekly Readings
Assignments (4) 36%
Midterm 24%
Final Paper 40%

[Office Hours]
Mon 3:30-5 pm, WJH 650
eskici@fas.harvard.edu


[Course Objectives]
Advances in storage capacity and computational power now make it possible to work with
huge amounts of data. Innovations in social network analysis techniques and machine learning
(big data) have opened up a new area of methodological research. This course reviews these
techniques at a conceptual level, because they offer social scientists new ways to design
research. In other words, this course explores the intersection of social network analysis,
machine learning, and the social sciences. By the end of this course we aim to answer these
questions:
1. How do these techniques work? What is the logic behind them?
2. How can we use them to design our research/business and to find answers to our
questions?
3. With the help of these techniques, what kind of new research questions / business
opportunities can we seek?
This course is a hybrid of a methods class and a seminar. Some weeks we will be reading and
discussing, and some weeks we will be learning the logic behind those techniques.
What this course does not offer: This course is at the conceptual level. Our goal is to
develop an understanding of the advantages that social network analysis and machine learning
techniques bring to social science research. We will not, however, learn how to code or
execute these techniques, which are covered by many courses offered by the Computer
Science and Statistics Departments.
Target Audience: This course is designed for both graduates and undergraduates, mostly
social science majors, who are interested in social network analysis, machine learning, and
social science research. There is no prerequisite, but familiarity with statistics (at the
Soc 156 or Stat 100, 101, or 104 level) will help you better grasp the advantages of these
techniques compared to regression analysis.

[Course Requirements]
Assignments (36%): There are four assignments. The first two will be more technical,
consisting of short-answer questions and very short calculations. The last two
assignments will be short papers that prepare students for the final paper.
Detailed prompts with questions will be provided. Papers must be 1200-1500
words long, with proper citations.

Midterm Exam (24%): This will be a take-home, time-restricted midterm exam,
planned for the end of March. We will have a detailed review before the exam.

Final Paper (40%): There are three options for the final paper. The first is an
argumentative essay, in which I expect you to develop an argument related to the
course subject and support it with secondary source data. The second option is a
research proposal, in which I expect you to propose a research question, write a short
literature review, and explain what kinds of methods and data would be appropriate to
answer the question you proposed. The third option is an extended literature review,
in which I expect you to choose a topic related to the course material and prepare a survey
of the research conducted on that topic. The final paper is due on Saturday, May 14th, and
is expected to be 2500-3000 words long, with proper citations.

[Text and Readings]
The following books are required for the course.
Mayer-Schonberger, V. and Cukier, K. (2014), Big Data, Mariner Books, Boston, MA.
Provost, F. and Fawcett, T. (2013), Data Science for Business, O'Reilly, Sebastopol,
CA.
I will provide the relevant chapters, but I strongly recommend having these books in your
personal library.
Additionally, we will be reading some chapters from the following books. If you are interested
in the subject, feel free to get them, because they are quite good. These chapters will be
provided on the course website.
Christakis N.A. and Fowler, J.H., (2009), Connected: The Surprising Power of Our
Social Networks and How They Shape Our Lives, Little, Brown, and Company, New
York, NY.
Watts, Duncan J. (2004), Six Degrees: The Science of a Connected Age, Norton, New
York, NY.
For a more technical reference book (we will not cover technical details), see:
The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Second
Edition) Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2009) (PDF at
http://web.stanford.edu/~hastie/local.ftp/Springer/ESLII_print10.pdf)

[Contact]
My office hours are from 3:30 pm to 5 pm on Mondays, in William James Hall, room 650. I
prefer to meet with you during my office hours, but if you cannot make it, please send me an
email to set up an appointment. I will be more than happy to meet with you. My e-mail is
eskici@fas.harvard.edu

[Logistics and Some Notes]
Extension Policy: You must not allow yourself to get behind in this course, because the material
develops cumulatively. Topics covered early are foundations for those covered later on.
Assignments must be turned in on time if we are to provide prompt feedback. Credit therefore
will be deducted for late assignments. You will lose one-third of a letter grade (e.g., A- to B+)
for each day an assignment is late, unless an extension has been granted.

Collaboration: Discussion and the exchange of ideas are essential to academic work. For
assignments in this course (with the exception of exams), you are encouraged to consult with
your classmates on the choice of paper topics and to share sources. You may find it useful to
discuss your chosen topic with your peers, particularly if you are working on the same topic as a
classmate. However, you should ensure that any written work you submit for evaluation is the
result of your own research and writing and that it reflects your own approach to the topic. You
must also adhere to standard citation practices in this discipline and properly cite any books,
articles, websites, lectures, etc. that have helped you with your work. If you received any help
with your writing (feedback on drafts, etc.) you must also acknowledge this assistance.

Plagiarism: Evidence of plagiarism or other forms of academic dishonesty will be dealt with
severely. All students are expected to read and comply with Harvard's plagiarism policy
(http://isites.harvard.edu/icb/icb.do?keyword=k70847&pageid=icb.page355322).

Disability accommodations: Students needing academic adjustments or accommodations
because of a documented disability must present their Faculty Letter from the Accessible
Education Office (AEO) and speak with me by the end of the second week of the term,
February 7. Failure to do so may result in my inability to respond in a timely manner. All
discussions will remain confidential, although Faculty are invited to contact AEO to discuss
appropriate implementation.

Weekly Plan
[Some of the readings and weekly plans are subject to change depending on the pace we have in class]


W1. (Jan 26) Introduction and Sociological Roots
No readings are required.

W2. (Feb 2) Social Network Analysis I
Christakis, Nicholas A. (2010), The Hidden Influence of Social Networks, TED Talk -
http://bit.ly/1O7dnJe
Christakis, Nicholas A. and James H. Fowler (2009), Connected: The Surprising Power of
Our Social Networks, and How They Shape Our Lives, Chapter 1: In the Thick of It
Wasserman, Stanley and Katherine Faust (1994), Social Network Analysis: Methods and
Applications, Part 1, Chapter 1: Social Network Analysis in the Social and Behavioral
Sciences, pp. 1-28.

W3. (Feb 9) Social Network Analysis II
Borgatti, Stephen P. et al. (2009), Network Analysis in the Social Sciences, Science Vol.
323, pp. 892-895.
Butts, Carter T. (2009), Revisiting the Foundations of Network Analysis, Science Vol.
325, pp. 414-416.
Wasserman, Stanley and Katherine Faust (1994), Social Network Analysis: Methods and
Applications, Part 1, Chapter 2: Social Network Data, pp. 28-56.
Wasserman, Stanley and Katherine Faust (1994), Social Network Analysis: Methods and
Applications, Part 3, Chapter 5: Centrality and Prestige, pp. 167-192.
Watts, Duncan J. (2004), Six Degrees: The Science of a Connected Age, Chapter 1:
Connected Age, pp. 19-37.


W4. (Feb 16) Social Network Data and Visualization
Recommended (not required)
o Historical Background
Freeman, Linton C. (2000), Visualizing Social Networks, Journal of
Social Structure, Vol. 1.
o More Technical
Correa, CD and Kwan-Liu Ma (2011), Visualizing Social Networks, in the
book Social Network Data Analysis, edited by Charu C. Aggarwal,
Springer, NY. Chapter 11, pp. 307-326.

W5. (Feb 23) Random Networks and Scale-free Networks
Watts, Duncan J. (2004), Six Degrees: The Science of a Connected Age, Chapter 2: The
Origins of a New Science, pp. 43-68.
Watts, Duncan J. (2004), Six Degrees: The Science of a Connected Age, Chapter 3: Small
Worlds, pp. 69-100.
Barabasi, Albert-Laszlo and Eric Bonabeau (2003), Scale-Free Networks, Scientific
American, May 2003, pp. 50-59
Watts, Duncan J. (2004), Six Degrees: The Science of a Connected Age, Chapter 6:
Epidemics and Failures, pp. 162-194.


W6. (Mar 1) Big Data: Paradigm Shift?
Mayer-Schonberger, V. and Cukier, K. (2014), Big Data, Chapter 1: Now, pp. 1-18.
McAfee, Andrew and Erik Brynjolfsson (2012), Big Data: The Management Revolution,
Harvard Business Review, October 2012, pp. 60-68.
Davenport, Thomas and D.J. Patil (2012), Data Scientist: The Sexiest Job of the 21st
Century, Harvard Business Review, October 2012, pp. 70-76.
Barton, Dominic and David Court (2012), Making Advanced Analytics Work for You,
Harvard Business Review, October 2012, pp. 78-83.
Provost, Foster and Tom Fawcett (2013), Data Science for Business, O'Reilly
Publications, CA, USA. Chapter 1: Introduction: Data Analytic Thinking, pp. 1-17.
Provost, Foster and Tom Fawcett (2013), Data Science for Business, O'Reilly
Publications, CA, USA. Chapter 2: Business Problems and Data Science Solutions, pp.
19-42.
Recommended
o Mayer-Schonberger, V. and Cukier, K. (2014), Big Data, Chapter 2: More, pp.
19-31.
o Mayer-Schonberger, V. and Cukier, K. (2014), Big Data, Chapter 3: Messy, pp.
32-49.
o Mayer-Schonberger, V. and Cukier, K. (2014), Big Data, Chapter 4: Correlation,
pp. 50-72.


W7. (Mar 8) Fitting a Model to Data
Provost, Foster and Tom Fawcett (2013), Data Science for Business, O'Reilly
Publications, CA, USA. Chapter 4: Fitting a Model to Data, pp. 81-110.



W8. (Mar 22) Machine Learning
Provost, Foster and Tom Fawcett (2013), Data Science for Business, O'Reilly
Publications, CA, USA. Chapter 3: From Correlation to Segmentation, pp. 43-79.


W9. (Mar 29) Midterm Review
No readings are required.
Midterm on [April 2-3 Weekend]

W10. (Apr 5) Similarity, Neighbors, and Clusters
Provost, Foster and Tom Fawcett (2013), Data Science for Business, O'Reilly
Publications, CA, USA. Chapter 6: Similarity, Neighbors, and Clusters, pp. 141-186.

W11. (Apr 12) Representing and Mining Text
Provost, Foster and Tom Fawcett (2013), Data Science for Business, O'Reilly
Publications, CA, USA. Chapter 10: Representing and Mining Text, pp. 251-277.

W12. (Apr 19) Data Visualization
No readings are required.


W13. (Apr 26) Big Data Applications I
Finance
Aldridge, Irene (2015), Trends: All Finance Will Soon Be Big Data Finance, Huffington
Post. http://www.huffingtonpost.com/irene-aldridge/trends-all-finance-will-s_b_6613138.html
Curme et al. (2014), Quantifying the semantics of search behavior before stock market
moves, PNAS, vol. 111, no. 32, pp. 11600-05.
http://www.pnas.org/content/111/32/11600.full.pdf
Nichols, Wes (2014), How Big Data Brings Marketing and Finance Together, Harvard
Business Review. https://hbr.org/2014/07/how-big-data-brings-marketing-and-finance-together/
Marketing
Geraghty, Kevin (2012), The CMO's Guide to Big Data, 360i Report.
http://www.360i.com/reports/big-data/
Satell, Greg (2014), The Future of Marketing Combines Big Data With Human
Intuition, Forbes.
http://www.forbes.com/sites/gregsatell/2014/10/12/the-future-of-marketing-combines-big-data-with-human-intuition/
CSC Success Story, How Avis Budget uses Big Data in Marketing, Summary.
http://www.csc.com/big_data/insights/97741-how_avis_budget_uses_big_data_in_marketing#summary



W14. (May 3) Big Data Applications II
Health Care
Shah, Nilay D., J. Pathak (2014), Why Health Care may finally be Ready for Big Data,
Harvard Business Review.
https://hbr.org/2014/12/why-health-care-may-finally-be-ready-for-big-data
Kayyali, B. et al. (2013), The Big-Data Revolution in US Health Care: Accelerating value
and innovation, McKinsey & Company Insights-Publications.
http://www.mckinsey.com/insights/health_systems_and_services/the_big-data_revolution_in_us_health_care
Salzberg, Steven (2014), Why Google Flu is a Failure,
http://www.forbes.com/sites/stevensalzberg/2014/03/23/why-google-flu-is-a-failure/
Others
Sedghi, Ami (2013), Which countries are the most forward thinking?, The Guardian.
http://www.theguardian.com/news/datablog/2013/feb/08/countries-most-forward-thinking-visualised
Marr, Bernard (2015), Big Data: The Winning Formula in Sports.
http://www.forbes.com/sites/bernardmarr/2015/03/25/big-data-the-winning-formula-in-sports/
Craig, Roger (2014), Big Data and the Future of Sports, USA TODAY.
http://www.usatoday.com/story/tech/columnist/2014/07/18/big-data-and-sports-roger-craig-san-francisco-49ers-commentary/12599477/
Malik, Tarik (2015), Big Data in Governments Service to Citizen and State,
http://www.forbes.com/sites/teradata/2015/01/30/big-data-in-governments-service-to-citizens-and-state/
Kim, Gang-Hoon (2014), Big Data Applications in the Government Sector,
Communications of the ACM, Vol. 57, No. 3, pp. 78-85.


W15. (May 10) Ethics and Information Security
Mayer-Schonberger, V. and Cukier, K. (2014), Big Data, Chapter 8: Risks.
Mayer-Schonberger, V. and Cukier, K. (2014), Big Data, Chapter 9: Control.
Boyd, D. and Kate Crawford (2012), "Critical Questions for Big Data", Information,
Communication & Society, Vol. 15, No. 5, pp. 662-679.
Crawford, K. et al. (2014), "Critiquing Big Data: Politics, Ethics, and Epistemology",
International Journal of Communication, Vol. 8, pp. 1663-1672.
