You are on page 1of 31

Infranet: Circumventing Web Censorship

and Surveillance

Nick Feamster, Magdalena Balazinska,


Greg Harfst, Hari Balakrishnan, David Karger
M.I.T. Laboratory for Computer Science

http://nms.lcs.mit.edu/infranet/
The Big Picture

htt
p://
ww Internet
w.c
nn.
com
CENSOR
http://nms
.lcs.mit.ed
u

nms.lcs.mit.edu cnn.com
The Big Picture

htt
p://
ww Internet
w.c
nn.
com
CENSOR
http://nms
.lcs.mit.ed
u
s . m i t . e d u"
p : / / n m s .lc
"ht t m ) nms.lcs.mit.edu cnn.com
. c o
w w . cnn
h t t p : //w
(
How Infranet Works

http://nms.lcs.mit.edu/ http://
http://nms.lcs.mit.edu/papers/ http://www.
http://nms.lcs.mit.edu/papers/usenix.html http://www.cnn.com
CENSOR

nms.lcs.mit.edu cnn.com
<html><title>cnn.com

</title> ...

Use Infranet requester proxy (on localhost)


Upstream request in sequence of HTTP requests
Downstream response in images
Censor

Restrictive government, corporate firewall, etc.

Discovery Attacks: Notice unusual-looking Web traffic.


monitors Web access for "inappropriate use"
watch Web traffic for inappropriate access attempts
watch for suspicious looking Web access patterns
watch for use of circumvention software
watch for suspicious inter-request times

Disruptive Attacks: Keep the endpoints from talking.


blocks access to certain Web sites
attempts to block access to circumvention software (e.g., blocking
SSL, disrupting communication, etc.)
Design Goals

Deniability for clients


Can’t confirm that a client is intentionally retrieving censored data

Statistical deniability for clients


Web traffic doesn’t look unusual

Covertness for servers


Can’t discover a server that is serving censored content
Defense against blocking

Communication Robustness
Should be difficult to disrupt request/transfer of censored content

Reasonable Performance
Related Systems: Triangle Boy, Peekabooty, etc.

Deniability for clients


Existing systems rely on SSL, vulnerable to fingerprinting

Statistical deniability for clients


SSL traffic looks suspicious
No attempt to conceal suspicious traffic patterns

Covertness for servers


Servers make no attempt to conceal their existence
Suspicious traffic patterns may result in discovery and blocking

Communication Robustness
SSL can be blocked (e.g., unsigned server certificates)
Downstream Communication ("Downloading")

Requester Censor Responder


http://nm
s.lcs.mit.e
du/people
.html

http://nms.lc
s.mit.edu/nic
k.jpg Embed
<html><title>cnn.

http://nms.lc
s.mit.edu/m
agda.jpg
Embed
.com</title>...

Embed data in images, recover by shared secret


Steganography is not ideal: can’t reuse cover image
Web cams are wonderful.
Upstream Communication ("Requesting")

Requester Censor Responder


http://n
ms.lcs.m
it.edu/
Decode
http:// http://
nms.lc
s.mit.e
du/pap
ers.htm
l
Decode
http:// http:// www.
nms.lc
s.mit.e
du/use
nix.htm
l Decode
http://www. cnn.com

Hidden message => sequence of HTTP requests


Mapping function: secret, critical to deniability
Simple Schemes: Covertness/Bandwidth

Odd/Even Links
Covertness: Requester may ask for any one of half of the links at
any given time
Bandwidth: 1-bit per visible HTTP request

Links modulo k
Covertness: Requester asks for any of N/k links
Bandwidth: lg(k) bits per visible HTTP request

Static Mapping
Covertness: potentially quite bad...
Bandwidth: M bits per request
Range-Mapping: Web Surfing, 20 Questions-Style
Assume: Some set of censored URLs are commonly requested
Responder tells requester
the boundaries (split-strings) for ranges in this set, and
the mapping between visible HTTP requests and split-strings
Requester tells responder
a visible HTTP request
Visible Requests Split−strings Visible Requests Split−strings
0% 25%

http://nms.lcs.mit.edu/projects.html http://www.amazon.com/ http://nms.lcs.mit.edu/nick.html http://www.cnn.com/

http://nms.lcs.mit.edu/people.html http://www.microsoft.com http://nms.lcs.mit.edu/magda.html http://www.ebay.com

http://nms.lcs.mit.edu/publications.html http://www.yahoo.com/ http://nms.lcs.mit.edu/greg.html http://www.java.com/

http://nms.lcs.mit.edu/history.html http://nms.lcs.mit.edu/hari.html http://www.microsoft.com/


100%
50%

but...not all requests are equally likely!


Getting Statistical Deniability

Divide the corpus according to more likely visible HTTP


requests.
Arithmetic coding says that our expected number of
requests is the same!

Visible Requests Split−strings


0%
http://nms.lcs.mit.edu/projects.html http://www.abc.com/

http://nms.lcs.mit.edu/people.html http://www.microsoft.com

http://nms.lcs.mit.edu/publications.html http://www.zdnet.com/
http://nms.lcs.mit.edu/history.html
100%
Range-Mapping

Search through set of frequently-requested censored


URLs achieves good upstream bandwidth.

Division of ranges according to conditional request


probabilities achieves deniability and covertness.

Idea can be applied over the space of all strings.


Statistical Deniability is Free

1
0.9
Fraction of all hidden URLs

0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1 Without Deniability (mean: 5.9)
With Deniability (mean: 6.4)
0
0 1 2 3 4 5 6 7 8 9 10
Number of requests to transmit hidden URL
Server Covertness is Not Free

1
No Infranet
Infranet
Fraction of Requests

0.8

0.6

0.4

0.2

0
10 100 1000 10000
Bandwidth (kB/s)
Security Analysis

Discovery Attacks
Responders: discover and block access
Requesters: discover and...
Channel: reveal channel and its contents

Disruptive Attacks
Block access to responders
Break tunnel protocol
Remove or alter:
hidden content
hidden requests
Discovery Attacks

Intent: reveal requester or responder

Passive attacks to detect unusual


browsing patterns (including request timing)
HTTP requests/responses
Defense: Innocuous browsing patterns, no modification to HTTP
headers themselves

Impersonation attacks
censor establishes an Infranet responder and nails requesters
Defense: separate forwarding and decoding of symbols
Disruptive Attacks

Intent: disrupt communication between requester and


responder

Filtering (IP, DNS, port, etc.)


Defense: server covertness--never return the different content for
the same URL
Request Tampering
reorder/remove fields from HTTP headers
alter downstream content
Defense: don’t hide messages in HTTP headers, no cookies, use
hidden watermarks, etc.
Session Tampering
insert/remove/reorder HTTP requests and responses
Defense: error correction, headers to "echo" message to
responder, etc.
Conclusion

Infranet hides censored requests and responses in


innocuous-looking HTTP request/response streams
client deniability
server covertness
reasonable robustness

Future work
robustness
software distribution
server discovery

http://nms.lcs.mit.edu/infranet/
Communication Protocol Overview

Message exchange
Requests and responses are hidden in visible HTTP traffic
Requires a hiding function
Messages exchanged as a stream of symbols

Symbol construction
How symbols map to message fragments
A group of symbols is called an alphabet
Upstream: set of URLs on a responder’s Web site
Downstream: high-frequency components in images

Modulation
Specifies a particular function for symbol construction
Maps a sequence of visible HTTP requests to a hidden message
Hiding

Requires:
a message to hide,
a cover medium, and
a secret that reveals the hidden message
Without knowledge of secret, can’t reveal message (or
even prove that a message is being hidden)
Secret exchange...
Tunnel Protocol
Joint Infranet Infranet
Censor
FSM Requester Responder
HTTP Reque
st("/")

index.html)
HTTP Response(
Set User
ID

Tunnel Setup
Stream, ike y)
Initialize H down ( U init,HTTP Resp
Modulation
Function

Exchange Hup(EPKRes (skey)


p , HTTP R eq Stream, U )
Key init

TTP Resp Stre am, skey)


Update H down ( U tunnel, H
Modulation
Function

Transmit Hup(hidden req. ,


HTTP Req Stream
, Utunne )

Steady state
Request l

ey)
Resp Stream, sk
H down (content , HTTP
Transmit
Response

Requester learns upstream modulation function.


Responder learns secret key for downstream hiding.
Upstream Modulation

Infranet Infranet
Censor
Requester Responder

Up(msg frag 1) = url 1 HTTP Reque


st( url )
1

U -1p(url1) = msg frag 1


se
HTTP Respon
.
.
.
Up(msg frag n) = url n HTTP Reque
st( url )
n

U -1p(urln) = msg frag n


se
HTTP Respon

Hidden message => sequence of HTTP requests


Tunnel modulation function is secret
Choice of modulation function is important!
Range-Mapping

C
1
vmax

...

δ v2

δ * P(r2|rcurrent)

v1
δ * P(r1|rcurrent)
vmin

0
Lexicographically
ordered strings
stringmin s1 s2 ... stringmax
More Visible Links => Greater Upstream Bandwidth

20
90th percentile
50th percentile

15
Number of requests

10

0
1 2 4 8 16 32 64 128
Number of possible next requests at each page
Downstream Performance

60

55

50
Number of requests to retrieve data

45

40

35

30

25

20

15

10

5
0 10000 20000 30000 40000 50000 60000 70000 80000
Size of requested data (bytes)
Arithmetic Coding

F(x)
0 1
F’(x1)

0 1 0 1

x
{
x1
F(x)

Encode: Send F’(x)


Decode: Use decision tree (is F’(x) > P(0)?)
With a larger range in F(x), F’(x) requires fewer bits to
express the range.
Arithmetic Coding

F(x)
Codeword Length

First Request
F’(x1)
yahoo
Second Request

amazon cnn
{
x1
F(x)

More popular requests require fewer codewords


On average, number of requests will be the binary
entropy divided by the number of bits per request
"Skewing" how we divide the distribution: selecting
codewords in the distribution.
Steganography

Two stage process


Determine "redundant" bits
Select which of these bits to embed the encrypted message
(determined by PRG, initialized with seed)

Set state based on initial seed, embed a secondary seed


that selects a PRG that makes minimal modifications

Defend against statistical attacks that look at distribution


of DCT coefficients, etc.
RSA Public Key Cryptosystem

Sele t p, q prime.

n = pq
(n) = (p 
1) (q 1)

Sele t an (e; d) su h that ed  1mod n. Then:

C  M e mod n
M  C d mod n
 M ed mod n
 M 1+k(p 1)(q 1) mod n
 M  1k(q 1) mod p (Euler)
 M  1k(p 1) mod q
 M mod n (CRT)

You might also like