http://computational-communication.com/globe/ashoka.html
6/10/15
On coding
http://tech.co/popular-languages-github-infographic-2015-04
http://githut.info/
http://tech.co/popular-languages-github-infographic-2015-04
C++ C Pascal
Java/C#
Python
Perl
VB
PHP, Lisp
http://www.memecenter.com/fun/32267/if-programming-languages-were-tools
C M1
C++
Perl
Java
JavaScript
Python v2/v3
Ruby
PHP
http://bjorn.tipling.com/if-programming-languages-were-weapons
http://favo.s3.amazonaws.com/if-programming-languages-were-essays.jpg
Pull-down menus
Programming-based
Open Source
Commercial
OpenOffice
Google Docs
Spreadsheet
SPSS
Excel
R
Python
Stata
SAS
Matlab
R packages
Spatial Analysis
Temporal Analysis
Text Mining
Machine Learning
http://cran.r-project.org/web/views/MachineLearning.html
Demo 1. Software Installation
https://www.rstudio.com/ide/
More information
http://www.rstudio.com/training/online.html
http://joe11051105.gitbooks.io/r_basic/content
Demo 1. R basics
R
https://gist.github.com/chengjun/01b61eb2ec1091c4dfae
File names
Object names
Organisation
Commenting guidelines
Syntax
Spacing
Curly braces
Line length
Indentation
Assignment
http://adv-r.had.co.nz/Style.html
R script
http://chengjun.github.io/web_data_analysis/demo2_simulate_networks/
install.packages("igraph")
library(igraph)
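size = 50  # number of nodes
g = graph.tree(size, children = 2); plot(g)  # binary tree
g = graph.star(size); plot(g)  # star
g = graph.full(size); plot(g)  # complete graph
g = graph.ring(size); plot(g)  # ring
g = connect.neighborhood(graph.ring(size), 2); plot(g)  # ring lattice: each node linked to neighbours within 2 steps
# random (Erdos-Renyi) network
g = erdos.renyi.game(size, 0.1); plot(g)
# small-world network: rewire a small fraction of the ring lattice's edges
g = rewire.edges(connect.neighborhood(graph.ring(size), 2), prob = 0.1); plot(g)
# scale-free network
g = barabasi.game(size); plot(g)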
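A small-world network is usually built by rewiring a ring lattice (the Watts-Strogatz construction). To make the two steps concrete without igraph, here is a plain-Python sketch, not igraph's implementation: `ring_lattice` reproduces the edge set that `connect.neighborhood(graph.ring(n), k)` yields, and `rewire` moves one endpoint of each edge with probability p. The function names are hypothetical.

```python
import random

def ring_lattice(n, k):
    """Ring of n nodes, each linked to every neighbour within k steps."""
    edges = set()
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            edges.add((min(i, j), max(i, j)))  # store undirected edges as sorted pairs
    return edges

def rewire(edges, n, p, seed=None):
    """With probability p, move one end of each edge to a random node,
    avoiding self-loops and duplicate edges (Watts-Strogatz style)."""
    rng = random.Random(seed)
    result = set(edges)
    for (u, v) in sorted(edges):
        if rng.random() < p:
            candidates = [w for w in range(n)
                          if w != u and (min(u, w), max(u, w)) not in result]
            if candidates:
                w = rng.choice(candidates)
                result.remove((u, v))
                result.add((min(u, w), max(u, w)))
    return result

lattice = ring_lattice(50, 2)
small_world = rewire(lattice, 50, 0.1, seed=1)
print(len(lattice), len(small_world))  # 100 100: rewiring keeps the edge count
```

Keeping p small preserves most of the lattice's local clustering while the few rewired edges act as shortcuts that shorten path lengths.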
R script
http://chengjun.github.io/web_data_analysis/demo3_describe_the_network/
Graph Statistics
Centrality Measures
Algorithms of graphs
Shortest path
Connected component algorithms
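Two of the graph algorithms listed, shortest path and connected components, both reduce to breadth-first search on unweighted graphs. A minimal plain-Python sketch over an adjacency dictionary; igraph's own implementations differ, and the toy graph and function names are hypothetical.

```python
from collections import deque

def shortest_path_length(adj, source, target):
    """Breadth-first search: number of edges on a shortest path, or None."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return None  # target unreachable

def connected_components(adj):
    """Group nodes into components by running BFS from each unvisited node."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        components.append(comp)
    return components

# undirected toy graph: a path a-b-c-d plus a separate edge e-f
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(shortest_path_length(adj, "a", "d"))  # 3
print(shortest_path_length(adj, "a", "f"))  # None
print(len(connected_components(adj)))       # 2
```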
R
Getting started
Introduction
Package structure
Package components
Code (R/)
Package metadata (DESCRIPTION)
Object documentation (man/)
Vignettes (vignettes/)
Testing (tests/)
Namespaces (NAMESPACE)
Data (data/)
Compiled code (src/)
Installed files (inst/)
Other components
Best practices
http://r-pkgs.had.co.nz/
networkdiffusion
https://github.com/chengjun/networkdiffusion
Python
Python
Python
Python /pan/
TIOBE 2010
Python
R MATLAB Python
Python
Python
list tuple
dictionary
(Beginning Python, Hetland, 2005)
Python
Python
Python
OpenCV
Python
NumPy SciPy matplotlib
igraph, networkx, graph-tool, Snap.py
Python IDE
WinPython
WinPython: http://sourceforge.net/projects/winpython/
easy_install / pip install
Spyder
easy_install beautifulsoup4
Spyder on Mac
Installing on Mac OS X
http://continuum.io/downloads.html
Need to install python first
https://bitbucket.org/spyder-ide/spyderlib/downloads
http://lingfeiw.gitbooks.io/data-mining-in-social-science/content/python_for_data_analysis/README.html
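Variable Type
str(3)        # int -> str: '3'
int('5')      # str -> int: 5
float('7.1')  # str -> float: 7.1
Data Structure
dir: list the attributes and methods of an object
dir(str)
dir(list)
dir(tuple)
dir(dict)
import numpy as np
l = [1, 2, 3, 3]              # list
t = (1, 2, 3, 3)              # tuple
s = set([1, 2, 3, 3])         # set: duplicates removed, {1, 2, 3}
d = {'a': 1, 'b': 2, 'c': 3}  # dictionary
a = np.array(l)               # NumPy array built from the list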
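The listing creates each structure but does not show how they behave differently. A short sketch, reusing the same values, of the differences that matter in practice: lists are mutable, tuples are not, sets drop duplicates, and dictionaries look values up by key.

```python
print(str(3), int('5'), float('7.1'))  # 3 5 7.1

l = [1, 2, 3, 3]              # list: ordered, mutable, keeps duplicates
t = (1, 2, 3, 3)              # tuple: ordered, immutable
s = set([1, 2, 3, 3])         # set: unordered, duplicates removed
d = {'a': 1, 'b': 2, 'c': 3}  # dict: key -> value lookup

l.append(4)                   # lists grow in place
print(l)                      # [1, 2, 3, 3, 4]
print(t.count(3))             # 2
print(sorted(s))              # [1, 2, 3]
print(d['b'], sorted(d))      # 2 ['a', 'b', 'c']
print('append' in dir(list), 'append' in dir(tuple))  # True False

try:
    t[0] = 10                 # tuples reject item assignment
except TypeError:
    print("tuples are immutable")
```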
definition
def dividePlus(m, n):
    # divide m by n, then add 1
    return m / n + 1
dividePlus(4, 2)
try except
for i in [2, 0, 5]:
    try:
        print(dividePlus(4, i))
    except Exception as e:  # i = 0 raises ZeroDivisionError
        print(e)
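Catching the broad `Exception` hides unrelated bugs. A minimal Python 3 sketch that catches only the error we expect from a zero divisor and records `None` in its place; `divide_plus_safe` is a hypothetical helper.

```python
def divide_plus_safe(m, n):
    """Return m / n + 1, or None when n is zero."""
    try:
        return m / n + 1
    except ZeroDivisionError:   # catch only the expected error
        return None

results = [divide_plus_safe(4, i) for i in [2, 0, 5]]
print(results)  # [3.0, None, 1.8]
```

Any other error, such as passing a string, still surfaces as a normal traceback.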
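If elif else
x = 5
if x < 5:
    y = -1
    z = 5
elif x > 5:
    y = 1
    z = 11
else:
    y = 0
    z = 10
print(x, y, z)
# read a comma-separated file into a list of rows
data = []
with open('.../xxx.csv', 'r') as f:
    for line in f:
        line = line.strip().split(',')
        data.append(line)
# the with block closes f automatically, so no f.close() is needed
# write the rows out as tab-separated text ("w", not "wb", since we write str)
f = open(".../xxx.txt", "w")
for i in data:
    f.write('\t'.join(map(str, i)) + '\n')
f.close()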
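Splitting on `','` breaks as soon as a field contains a quoted comma. A sketch of the same read-CSV, write-TSV round trip using the standard `csv` module; an in-memory `io.StringIO` stands in for the elided file paths, and its contents are hypothetical.

```python
import csv
import io

# hypothetical in-memory CSV standing in for the real input file
raw = io.StringIO('name,city\n"Li, Wei",Beijing\nAnna,Berlin\n')

rows = list(csv.reader(raw))  # handles the quoted comma correctly
print(rows[1])                # ['Li, Wei', 'Beijing']

out = io.StringIO()
writer = csv.writer(out, delimiter='\t', lineterminator='\n')
writer.writerows(rows)        # tab-separated output
print(out.getvalue())
```

A naive `line.strip().split(',')` on the second line would instead yield three fragments, `'"Li'`, `' Wei"'` and `'Beijing'`.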
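Python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy.stats import pearsonr
x = np.random.randn(50)
y = np.random.randn(50) + 3*x
pearsonr(x, y)  # returns (Pearson's r, two-sided p-value)
def OLSRegressPlot(x, y, col, xlab, ylab):
    xx = sm.add_constant(x, prepend=True)  # add an intercept column
    res = sm.OLS(y, xx).fit()
    constant, beta = res.params
    r2 = res.rsquared
    lab = r'$slope = %.2f, \,R^2 = %.2f$' % (beta, r2)
    plt.scatter(x, y, s=60, facecolors='none', edgecolors=col)
    plt.plot(x, constant + x*beta, "red", label=lab)  # fitted regression line
    plt.legend(loc='upper left', fontsize=16)
    plt.xlabel(xlab, size=16)
    plt.ylabel(ylab, size=16)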
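For a single predictor, the intercept and slope that `sm.OLS` reports have a closed form: the slope is cov(x, y) / var(x), the intercept is mean(y) - slope * mean(x), and R^2 equals the squared Pearson correlation. A NumPy-only sketch of that arithmetic, not statsmodels' implementation; `ols_fit` is a hypothetical name.

```python
import numpy as np

def ols_fit(x, y):
    """Least-squares fit of y = constant + beta * x, plus R^2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    constant = y.mean() - beta * x.mean()
    residuals = y - (constant + beta * x)
    r2 = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return constant, beta, r2

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
y = rng.standard_normal(50) + 3 * x  # same setup as the slide
constant, beta, r2 = ols_fit(x, y)
print(round(beta, 2), round(r2, 2))
```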
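Python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
fig = plt.figure(figsize=(7, 7), facecolor='white')
data = norm.rvs(10.0, 2.5, size=5000)  # 5000 draws from a normal with mean 10, sd 2.5
mu, std = norm.fit(data)                # fitted mean and standard deviation
plt.hist(data, bins=25, density=True, alpha=0.6, color='g')  # newer matplotlib uses density=True instead of normed=True
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mu, std)
plt.plot(x, p, 'r', linewidth=2)        # fitted normal density
title = r"$\mu = %.2f, \, \sigma = %.2f$" % (mu, std)
plt.title(title, size=16)
plt.show()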
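Fitting a normal distribution by maximum likelihood has a closed form: the sample mean, and a standard deviation that divides by n rather than n - 1. To our understanding `norm.fit` returns these same estimates; the NumPy sketch below just spells out the arithmetic (`normal_mle` is a hypothetical name).

```python
import numpy as np

def normal_mle(data):
    """Maximum-likelihood mean and standard deviation of a normal sample.
    The MLE of sigma divides by n (ddof=0), not n - 1."""
    data = np.asarray(data, float)
    mu = data.mean()
    sigma = np.sqrt(np.mean((data - mu) ** 2))
    return mu, sigma

sample = np.random.default_rng(42).normal(10.0, 2.5, size=5000)
mu, sigma = normal_mle(sample)
print(round(mu, 2), round(sigma, 2))  # close to 10 and 2.5
```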
Python
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, WeekdayLocator, DayLocator, MONDAY
# matplotlib.finance only exists in old matplotlib releases (it was removed in 2.2),
# and the Yahoo historical-quotes service it calls has since been shut down
from matplotlib.finance import quotes_historical_yahoo, candlestick
date1 = (2014, 2, 1)
date2 = (2014, 5, 1)
quotes = quotes_historical_yahoo('INTC', date1, date2)  # Intel daily quotes
fig = plt.figure(figsize=(7, 7))
ax = fig.add_subplot(1, 1, 1)
candlestick(ax, quotes, width=0.8, colorup='green', colordown='r', alpha=0.8)
mondays = WeekdayLocator(MONDAY)        # major ticks on the Mondays
alldays = DayLocator()                  # minor ticks on the days
weekFormatter = DateFormatter('%b %d')  # e.g., Jan 12
ax.xaxis.set_major_locator(mondays)
ax.xaxis.set_minor_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)
ax.autoscale_view()
plt.setp(plt.gca().get_xticklabels(), rotation=45, horizontalalignment='right')
plt.title(r'$Intel \,Corporation \,Stock \,Price$', size=16)
fig.subplots_adjust(bottom=0.2)
plt.show()
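Python
import urllib2  # Python 2; in Python 3 this module is urllib.request
url = "http://www.baidu.com/s?wd=cloga"  # search-results URL for the keyword "cloga"
html = urllib2.urlopen(url).read()  # download the page source
print(html)  # print the HTML
Javascript
API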
urllib2 beautifulsoup
urllib2 beautifulsoup
url
url
http://bbs.tianya.cn/list.jsp?item=free&nextid=0&order=8&k=
http://bbs.tianya.cn/list.jsp?item=free&nextid=1&order=8&k=
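The two list URLs differ only in the `nextid` parameter. Assuming, purely from these two examples, that `nextid` counts pages (the real site may use something else), the crawl list can be generated instead of typed by hand; `page_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "http://bbs.tianya.cn/list.jsp"

def page_urls(n_pages):
    """Build list-page URLs by varying nextid (assumed to be a page counter)."""
    urls = []
    for nextid in range(n_pages):
        query = urlencode([("item", "free"), ("nextid", nextid),
                           ("order", 8), ("k", "")])
        urls.append(BASE + "?" + query)
    return urls

for url in page_urls(3):
    print(url)
```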
urllib2 beautifulsoup
selenium
javascript
10
1000
selenium javascript
html
selenium html
find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector
find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector
Selenium
from bs4 import BeautifulSoup
from selenium import webdriver
import selenium.webdriver.support.ui as ui
import os
# set work directory
os.chdir('/Users/chengjun/ /Computational Communication/Data/')
# open the browser
browser = webdriver.Firefox()  # launch Firefox
#wait = ui.WebDriverWait(browser, 10)  # optional: wait up to 10 seconds for elements
browser.get("http://xwb100.cn/search.php")  # search page
browser.get("http://xwb100.cn/login/login.php")  # login page
browser.get("http://xwb100.cn/weixin3/search1.php")  # WeChat article search page
import io
def crawler(page_num, file_name):
    try:
        # click the javascript button that loads result page page_num
        page_location = "//a[@href='javascript:nextpage_dosubmit(%d)']" % page_num
        browser.find_element_by_xpath(page_location).click()
        # parse the html
        soup = BeautifulSoup(browser.page_source, "html.parser")
        articles = soup.find_all('tr')[1:]  # skip the table header row
        # write down info; append mode, so run only once or records are duplicated
        with io.open(file_name, 'a', encoding='utf-8') as p:  # utf-8 avoids encoding errors
            for i in articles:
                td = i.find_all('td')
                title = td[1].text
                link = td[1].a['href']
                record = title + u'\t' + link
                p.write(record + u"\n")
    except Exception as e:
        print(e)  # report the failure instead of silently skipping the page
# query function
def search_engine(query_word):
    query = browser.find_element_by_xpath("//input[@name='keyword']")  # the search box
    query.clear()
    query.send_keys(query_word)  # type the keyword
    browser.find_element_by_link_text(u' ').click()  # click the search button
# crawl ranks for a keyword
search_engine(u' ')  # Windows users must prefix the string with u!
for page_num in range(1, 11):  # result pages 1 to 10
    print(page_num)
    crawler(page_num, 'xwb100_tiger.txt')
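The parsing half of `crawler` (skip the header row, then take the title text and link from the second cell of each row) can be tried without a browser. This sketch swaps in the standard library's `html.parser` for BeautifulSoup so it runs with no extra packages; `RowParser`, `extract_records` and the sample table are hypothetical.

```python
from html.parser import HTMLParser

class RowParser(HTMLParser):
    """Collect the text and link of every <td>, grouped by <tr>."""
    def __init__(self):
        super().__init__()
        self.rows = []       # one list of cells per <tr>
        self.in_td = False
    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append({"text": "", "href": None})
            self.in_td = True
        elif tag == "a" and self.in_td:
            self.rows[-1][-1]["href"] = dict(attrs).get("href")
    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False
    def handle_data(self, data):
        if self.in_td:
            self.rows[-1][-1]["text"] += data

def extract_records(html):
    parser = RowParser()
    parser.feed(html)
    records = []
    for cells in parser.rows[1:]:  # skip the header row
        if len(cells) > 1 and cells[1]["href"]:
            records.append(cells[1]["text"].strip() + "\t" + cells[1]["href"])
    return records

# hypothetical fragment shaped like a search-result table
SAMPLE = """<table>
<tr><th>No.</th><th>Title</th></tr>
<tr><td>1</td><td><a href="http://example.com/a">First post</a></td></tr>
<tr><td>2</td><td><a href="http://example.com/b">Second post</a></td></tr>
</table>"""
print(extract_records(SAMPLE))
```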
URL
import urllib2
url = "http://mp.weixin.qq.com/s?__biz=MzA3MjQ5MTE3OA==&mid=206241627&idx=1&sn=471e59c6cf7c8dae452245dbea22c8f3&3rd=MzA3MDU4NTYzMw==&scene=6#rd"
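The WeChat article URL packs its identifiers into query parameters (`__biz`, `mid`, `idx`, `sn`, ...). The slides do not say what each one means, so this Python 3 sketch only shows how to split them out with `urllib.parse`, which helps when comparing many article links.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://mp.weixin.qq.com/s?__biz=MzA3MjQ5MTE3OA==&mid=206241627&idx=1"
       "&sn=471e59c6cf7c8dae452245dbea22c8f3&3rd=MzA3MDU4NTYzMw==&scene=6#rd")

parts = urlsplit(url)
params = parse_qs(parts.query)   # each value is a list of strings
print(parts.netloc, parts.path)  # mp.weixin.qq.com /s
print(params["__biz"][0], params["mid"][0], params["idx"][0])
print(parts.fragment)            # rd
```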
API
APP API
API
SDK (Software Development Kit)
SDK
http://open.weibo.com/wiki/SDK
SDK
Python Python SDK
sinaweibopy sinaweibopy Python
sinaweibopy
easy_install sinaweibopy
https://pypi.python.org/pypi/sinaweibopy/1.1.3
app
APP_KEY APP_SECRET
OAuth API
1.
2.
3. ACCESS TOKEN
4.
Python
OAuth
OAuth 2.0
Facebook, Twitter, Sina Weibo
http://www.rfcreader.com/#rfc6749
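The authorization-code flow that Weibo-style APIs follow (RFC 6749, section 4.1) has two HTTP legs: send the user to an authorize URL, then exchange the returned `code` for an access token. This sketch only builds the request data with `urllib.parse`; the endpoint URL, app key, secret and callback are hypothetical placeholders, not Weibo's real values.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and credentials, for illustration only
AUTHORIZE_URL = "https://provider.example.com/oauth2/authorize"
APP_KEY = "your_app_key"
APP_SECRET = "your_app_secret"
CALLBACK_URL = "https://yourapp.example.com/callback"

def authorization_url(state):
    """Leg 1: the user opens this URL and approves the app (RFC 6749, 4.1.1)."""
    query = urlencode({"response_type": "code", "client_id": APP_KEY,
                       "redirect_uri": CALLBACK_URL, "state": state})
    return AUTHORIZE_URL + "?" + query

def token_request_body(code):
    """Leg 2: form body POSTed to the token endpoint to swap the code
    for an access token (RFC 6749, 4.1.3)."""
    return urlencode({"grant_type": "authorization_code", "code": code,
                      "redirect_uri": CALLBACK_URL, "client_id": APP_KEY,
                      "client_secret": APP_SECRET})

print(authorization_url("xyz"))
print(token_request_body("abc"))
```

The provider redirects back to `CALLBACK_URL` with `?code=...&state=xyz`; checking that `state` matches guards against forged callbacks.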
def weiboClient()
# # 615 [ ]
http://computational-communication.com/post/bian-cheng-gong-ju/2015-04-27-weibo-api-python
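from weibo import APIClient  # sinaweibopy installs a module named weibo
# APP_KEY, APP_SECRET and CALLBACK_URL come from your registered app
# get the code from the callback URL (placeholder for your web framework):
code = your.web.framework.request.get('code')
client = APIClient(app_key=APP_KEY, app_secret=APP_SECRET,
                   redirect_uri=CALLBACK_URL)
r = client.request_access_token(code)
access_token = r.access_token  # the access token, e.g. abc123xyz456
expires_in = r.expires_in  # token expiry time as a UNIX timestamp
# TODO: save the access token
client.set_access_token(access_token, expires_in)
print(client.statuses.user_timeline.get())
print(client.statuses.update.post(status=u' OAuth 2.0 '))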
HTML
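Element Locators
id=id: an id locator matches the element whose HTML id attribute equals id
name=name: a name locator matches the first element whose HTML name attribute equals name
identifier=id: an identifier locator first tries the HTML id attribute, then falls back to the name attribute
xpath=xpathExpression: an xpath locator finds an element in the HTML with an XPath expression; such expressions usually start with "//"
link=textPattern: a link locator matches a link by its text, e.g. link=The link text
Without a prefix, a locator starting with "document." is treated as a dom locator, one starting with "//" as an xpath locator, and anything else as an identifier locator.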
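The prefix rules for locators amount to a small dispatch: an explicit `strategy=` prefix wins, otherwise the default is chosen from how the string begins. A sketch of that rule as a hypothetical helper (`locator_strategy` is not part of Selenium).

```python
def locator_strategy(locator):
    """Return (strategy, value) for a Selenium IDE-style locator string."""
    for prefix in ("id", "name", "identifier", "xpath", "link", "dom"):
        if locator.startswith(prefix + "="):
            return prefix, locator[len(prefix) + 1:]  # explicit prefix
    if locator.startswith("document."):
        return "dom", locator        # default for DOM expressions
    if locator.startswith("//"):
        return "xpath", locator      # default for XPath expressions
    return "identifier", locator     # everything else: id, then name

print(locator_strategy("link=The link text"))          # ('link', 'The link text')
print(locator_strategy("//input[@name='keyword']"))
print(locator_strategy("keyword"))                     # ('identifier', 'keyword')
```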
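Element Locators-xpath
XPath is a syntax for defining parts of an XML document
XPath uses path expressions to navigate in XML documents
XPath contains a library of standard functions
XPath is a major element in XSLT
XPath is a W3C recommendation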
Element Locators-xpath
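a) nodename: selects all child nodes of the named node
b) /: selects from the root node
c) //: selects matching nodes anywhere in the document, starting from the current node
d) .: selects the current node
e) ..: selects the parent of the current node
f) @: selects attributes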
Element Locators-xpath
<?xml version="1.0" encoding="ISO-8859-1"?>
<tools>
  <tool name="RFT">
    <use name="function test">
      <free>no!</free>
    </use>
    <free>no</free>
  </tool>
  <tool name="loadrunner">
    <use name="performance test">
      <free>no!</free>
    </use>
    <free>no</free>
  </tool>
  <tool name="selenium">
    <use name="function tester">
      <free>yes!</free>
    </use>
    <free>yes</free>
  </tool>
  <tool id="jmeter">
    <use name="performance test"></use>
    <free>yes</free>
  </tool>
</tools>
Element Locators-xpath
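/tools/*: all child elements of tools
//*: all elements in the document
//free: all free elements, whether they sit under tool or under use
//tools/tool/free: free elements that are direct children of tool
//tools/tool[1]: the first tool element under tools
//tools/tool[last()]: the last tool element under tools
//tools/tool[free='no']: tool elements that have a free child whose text is "no"
//tool[@name]: tool elements that have a name attribute
//tool[@name='selenium']: the tool element whose name attribute is "selenium"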
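These expressions can be checked against the tools document with Python's built-in `xml.etree.ElementTree`, which supports a subset of XPath. ElementTree paths are relative to the element you call them on, so `./tool` from the root stands in for `//tools/tool`. Note that string literals must be quoted (`'no'`, `'selenium'`); unquoted, XPath would compare against a child element of that name. `label` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# the tools document, with attribute values quoted
TOOLS_XML = """<tools>
  <tool name="RFT">
    <use name="function test"><free>no!</free></use>
    <free>no</free>
  </tool>
  <tool name="loadrunner">
    <use name="performance test"><free>no!</free></use>
    <free>no</free>
  </tool>
  <tool name="selenium">
    <use name="function tester"><free>yes!</free></use>
    <free>yes</free>
  </tool>
  <tool id="jmeter">
    <use name="performance test"></use>
    <free>yes</free>
  </tool>
</tools>"""

root = ET.fromstring(TOOLS_XML)  # root is the <tools> element

def label(tool):
    # tools are identified by name, except jmeter which only has an id
    return tool.get("name") or tool.get("id")

all_free = root.findall(".//free")               # like //free
tool_free = root.findall("./tool/free")          # like //tools/tool/free
first_tool = root.find("./tool[1]")              # like //tools/tool[1]
last_tool = root.find("./tool[last()]")          # like //tools/tool[last()]
not_free = root.findall("./tool[free='no']")     # like //tools/tool[free='no']
named = root.findall("./tool[@name]")            # like //tool[@name]
selenium = root.find("./tool[@name='selenium']") # like //tool[@name='selenium']

print(len(all_free), len(tool_free))             # 7 4
print(label(first_tool), label(last_tool))       # RFT jmeter
print([label(t) for t in not_free])              # ['RFT', 'loadrunner']
print([label(t) for t in named], label(selenium))
```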
Firebug+xpath checker
Firefox
firebug
firebug Firefox
xpath checker
firebug
F12
firebug
View Xpath
firebug
xpath
xpath checker
xpath
GitHub
Stack Overflow
Kaggle