Monitoring on Smartphones
[...] included in our trusted computing base (see Section 2). Android contains two types of native methods: internal VM methods and JNI methods. The internal VM methods access interpreter-specific structures and APIs. JNI methods conform to Java native interface standards specifications [32], which requires Dalvik to separate Java arguments into variables using a JNI call bridge. Conversely, internal VM methods must manually parse arguments from the interpreter's byte array of arguments.

Binder IPC: All Android IPC occurs through binder. Binder is a component-based processing and IPC framework designed for BeOS, extended by Palm Inc., and customized for Android by Google. Fundamental to binder are parcels, which serialize both active and standard data objects. The former includes references to binder objects, which allows the framework to manage shared data objects between processes. A binder kernel module passes parcel messages between processes.

4 TaintDroid

TaintDroid is a realization of our multiple granularity taint tracking approach within Android. TaintDroid uses variable-level tracking within the VM interpreter. Multiple taint markings are stored as one taint tag. When applications execute native methods, variable taint tags are patched on return. Finally, taint tags are assigned to parcels and propagated through binder. Note that the Technical Report [17] version of this paper contains more implementation details.

At a high level, the TaintDroid architecture enables system-wide tracking by combining execution taint tracking, IPC taint tracking, native interface taint tracking, and secondary storage taint tracking.

Variable-level taint tracking: While previous approaches such as Panorama [panorama] and TaintBochs [taintbochs] provide high-accuracy taint tracking via instruction-level taint propagation, performance is sacrificed. On the other end of the spectrum, approaches such as PRECIP [precip] consider only high-level system calls into the kernel, trading off accuracy for performance; thus, they provide only nominal advantage over OS permissions (e.g., those implemented in Android). In TaintDroid, we choose a middle ground, variable-level taint tracking. TaintDroid is designed to taint primitive type variables (e.g., int, float, etc.). Our taint source and sink libraries (Section VI) provide an easy interface to set and check the taint markings on primitive types. However, there are cases when object references must become tainted to ensure taint propagation operates correctly. Applications are compiled into the Dalvik EXecutable (DEX) byte-code [...]

Figure 2 depicts TaintDroid's architecture. Information is tainted (1) in a trusted application with sufficient context (e.g., the location provider). The taint interface invokes a native method (2) that interfaces with the Dalvik VM interpreter, storing specified taint markings in the virtual taint map. The Dalvik VM propagates taint tags (3) according to data flow rules as the trusted application uses the tainted information. Every interpreter instance simultaneously propagates taint tags. When the trusted application uses the tainted information in an IPC transaction, the modified binder library (4) ensures the parcel message carries a taint tag reflecting the combined taint markings of all contained data. The parcel is passed transparently through the kernel (5) and received by the remote untrusted application. Note that only the interpreted code is untrusted. The modified binder library retrieves the taint tag from the parcel and assigns it to all values read from the parcel (6). The remote Dalvik VM instance propagates taint tags (7) identically for the untrusted application. When the untrusted application invokes a library specified as a taint sink (8), e.g., network send, the library retrieves the taint tag for the data in question (9) and reports the event.

[Figure 2: TaintDroid architecture within Android. Recoverable labels: Taint Source, Taint Sink, Trusted Library, Interpreted, VM, JNI Hook, Binder Hook, Binder Kernel Module, Userspace, Kernel.]

Implementing this architecture requires addressing several system challenges, including: a) taint tag storage, b) interpreted code taint propagation, c) native code taint propagation, d) IPC taint propagation, and e) secondary storage taint propagation. The remainder of this section describes our design.

4.1 Taint Tag Storage

The choice of how to store taint tags influences performance and memory overhead. Dynamic taint tracking systems commonly store tags for every data byte or word [57, 7]. Tracked memory is unstructured and without content semantics. Frequently taint tags are stored in non-adjacent shadow memory [57] and tag maps [61]. TaintDroid uses variable semantics within the Dalvik interpreter. We store taint tags adjacent to variables in memory, providing spatial locality.

Dalvik has five variable types that require taint storage: method local variables, method arguments, class static fields, class instance fields, and arrays. In all cases, we store a 32-bit bitvector with each variable to encode the taint tag, allowing 32 different taint markings.

Dalvik stores method local variables and arguments on an internal stack. When an application invokes a method, a new stack frame is allocated for all local variables. Method arguments are also passed via the internal stack. Before calling a method, the caller places the arguments on the top of the stack such that they become high numbered registers in the callee's stack frame. We allocate taint tag storage by doubling the size of the stack frame allocation. Taint tags are interleaved between values such that register $v_i$, originally accessed via $fp[i]$, is accessed as $fp[2 \cdot i]$ after modification. Note that Dalvik stores 64-bit variables as two adjacent 32-bit registers on the internal stack. While the byte-code interprets these adjacent registers as a single 64-bit value, the interpreter manages these registers as separate values. Therefore, our modified stack transparently stores and retrieves 64-bit values to and from separate 32-bit registers (at $fp[2 \cdot i]$ and $fp[2 \cdot i + 2]$). Finally, native method targets require a slightly different stack frame organization for reasons discussed in Section 4.3.
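The interleaved layout can be made concrete with a small model. The following is a minimal sketch, not the Dalvik implementation: the class `TaintedFrame` and its method names are hypothetical, and placing the low 32 bits of a wide value in the lower-numbered register is an assumption made for illustration.

```java
/** Hypothetical model of one interpreted-method frame with interleaved taint tags. */
final class TaintedFrame {
    // Slot 2*i holds register v_i; slot 2*i + 1 holds its 32-bit taint tag.
    private final int[] slots;

    TaintedFrame(int registerCount) {
        // Doubling the allocation leaves room for one tag per register.
        this.slots = new int[2 * registerCount];
    }

    int getRegister(int i)         { return slots[2 * i]; }
    int getTaint(int i)            { return slots[2 * i + 1]; }
    void setRegister(int i, int v) { slots[2 * i] = v; }
    void setTaint(int i, int tag)  { slots[2 * i + 1] = tag; }

    /** A 64-bit value spans registers i and i+1, i.e. slots 2*i and 2*i + 2. */
    void setWide(int i, long value) {
        slots[2 * i]     = (int) value;           // low 32 bits (assumed order)
        slots[2 * i + 2] = (int) (value >>> 32);  // high 32 bits
    }

    long getWide(int i) {
        long low  = slots[2 * i] & 0xFFFFFFFFL;
        long high = slots[2 * i + 2] & 0xFFFFFFFFL;
        return (high << 32) | low;
    }
}
```

In this model the two halves of a wide value are no longer adjacent in memory, which is why wide accesses have to be rewritten to skip over the tag slot between them.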
The modified stack format is shown in Figure 3.

[Figure 3: Modified Stack Format. Taint tags are interleaved between registers for interpreted method targets and appended for native methods. Dark grayed boxes represent taint tags. The diagram runs from low addresses (0x00000000) to high addresses (0xffffffff) and marks the stack pointer (top), the current and previous frame pointers, interpreted-target registers (e.g., v0 == local0, v1 == in0, v2 == in1), each followed by its taint tag, and native-target arguments (arg0, arg1) followed by the return taint and the argument taint tags.]

Taint tags are stored adjacent to class fields and arrays inside the VM interpreter's internal data structures. TaintDroid stores only one taint tag per array to minimize storage overhead. Per-value taint tag storage is severely inefficient for Java String objects, as all characters have the same tag. Unfortunately, storing one taint tag per array may result in false positives during taint propagation. For example, if untainted variable u is stored into array A at index 0 (A[0]) and tainted variable t is stored into A[1], then array A is tainted. Later, if variable v is assigned the value of A[0], v will be tainted, even though u was untainted. Fortunately, Java frequently uses objects, and object references are infrequently tainted (see Section 4.2); therefore, this coding practice leads to fewer false positives.

4.2 Interpreted Code Taint Propagation

Taint tracking granularity and flow semantics influence performance and accuracy. TaintDroid implements variable-level taint tracking within the Dalvik VM interpreter. Variables provide valuable semantics for taint propagation, distinguishing data pointers from scalar values. TaintDroid primarily tracks primitive type variables (e.g., int, float, etc.); however, there are cases when object references must become tainted to ensure taint propagation operates correctly; this section addresses why these cases exist. However, first we present taint tracking in the Dalvik machine language as a formal logic.

4.2.1 Taint Propagation Logic

The Dalvik VM operates on the unique DEX machine language instruction set; therefore, we must design an appropriate propagation logic. We use a data flow logic, as tracking implicit flows requires static analysis and causes significant performance overhead and overestimation in tracking [29] (see Section 8). We begin by defining taint markings, taint tags, variables, and taint propagation. We then present our logic rules for DEX.

Let $L$ be the universe of taint markings for a particular system. A taint tag $t$ is a set of taint markings, $t \subseteq L$. Each variable has an associated taint tag. A variable is an instance of one of the five types described in Section 4.1. We use a different representation for each type. The local and argument variables correspond to virtual registers, denoted $v_x$. Class field variables are denoted as $f_x$ to indicate a field variable with class index $x$. Instance fields require an instance object and are denoted $v_y(f_x)$, where $v_y$ is the instance object reference (note that both the object reference and the dereferenced value are variables). Static fields are denoted as $f_x$ alone, which is shorthand for $S(f_x)$, where $S()$ is the static scope. Finally, $v_x[\cdot]$ denotes an array, where $v_x$ is an array object reference variable.

Our virtual taint map function is $\tau(\cdot)$. $\tau(v)$ returns the taint tag $t$ for variable $v$. $\tau(v)$ is also used to assign a taint tag to a variable. Retrieval and assignment are distinguished by the position of $\tau(\cdot)$ w.r.t. the $\leftarrow$ symbol. When $\tau(v)$ appears on the right hand side of $\leftarrow$, $\tau(v)$ retrieves the taint tag for $v$. When $\tau(v)$ appears on the left hand side, $\tau(v)$ assigns the taint tag for $v$. For example, $\tau(v_1) \leftarrow \tau(v_2)$ copies the taint tag from $v_2$ to $v_1$.

Table 1 captures our propagation logic. The table enumerates abstracted versions of the byte-code instructions specified in the DEX documentation. Register variables and class fields are referenced by $v_X$ and $f_X$, respectively. $R$ and $E$ are the return and exception variables maintained within the interpreter, respectively. $A$, $B$, and $C$ are constants in the byte-code. The table does not list instructions that clear the taint tag of the destination register. For example, we do not consider the array-length instruction to return a tainted value even if the array is tainted. Note that the array length is sometimes used to aid direct control flow propagation (e.g., Vogt et al. [53]).

4.2.2 Tainting Object References

The propagation rules in Table 1 are straightforward with two exceptions. First, taint propagation logics commonly include the taint tag of an array index during lookup to handle translation tables (e.g., ASCII/UNICODE or character case conversion). For example, consider a translation table from lowercase to upper case characters: if a tainted value "a" is used as an array index, the resulting "A" value should be tainted even though the "A" value in the array is not. Hence, the taint logic for aget-op uses both the array and array index taint. Second, when the array contains object references (e.g., an Integer array), the index taint tag is propagated to the object reference and not the object value. Therefore, we include the object reference taint tag in the instance get (iget-op) rule.
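The translation-table case can be illustrated with a short sketch. It assumes taint tags are plain 32-bit integers whose bits are markings; the class `AgetRule`, its `Result` holder, and the helper `upperCaseTable()` are hypothetical names introduced only for this example.

```java
/** Hypothetical illustration of the aget-op rule: result taint = array taint OR index taint. */
final class AgetRule {
    /** Lookup result paired with the taint tag the rule assigns to it. */
    static final class Result {
        final char value;
        final int tag;
        Result(char value, int tag) { this.value = value; this.tag = tag; }
    }

    /** vA <- vB[vC]: the result tag is the union of the array tag and the index tag. */
    static Result aget(char[] array, int arrayTag, char index, int indexTag) {
        return new Result(array[index], arrayTag | indexTag);
    }

    /** Builds an untainted lowercase-to-uppercase table for 7-bit ASCII. */
    static char[] upperCaseTable() {
        char[] table = new char[128];
        for (char c = 0; c < 128; c++) {
            table[c] = Character.toUpperCase(c);
        }
        return table;
    }
}
```

Looking up a tainted 'a' in the untainted table yields 'A' carrying the index's tag, so the conversion does not launder the taint.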
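Table 1: DEX Taint Propagation Logic. Register variables and class fields are referenced by $v_X$ and $f_X$, respectively. $R$ and $E$ are the return and exception variables maintained within the interpreter. $A$, $B$, and $C$ are byte-code constants.

| Op Format | Op Semantics | Taint Propagation | Description |
|---|---|---|---|
| const-op $v_A$ $C$ | $v_A \leftarrow C$ | $\tau(v_A) \leftarrow \emptyset$ | Clear $v_A$ taint |
| move-op $v_A$ $v_B$ | $v_A \leftarrow v_B$ | $\tau(v_A) \leftarrow \tau(v_B)$ | Set $v_A$ taint to $v_B$ taint |
| move-op-R $v_A$ | $v_A \leftarrow R$ | $\tau(v_A) \leftarrow \tau(R)$ | Set $v_A$ taint to return taint |
| return-op $v_A$ | $R \leftarrow v_A$ | $\tau(R) \leftarrow \tau(v_A)$ | Set return taint ($\emptyset$ if void) |
| move-op-E $v_A$ | $v_A \leftarrow E$ | $\tau(v_A) \leftarrow \tau(E)$ | Set $v_A$ taint to exception taint |
| throw-op $v_A$ | $E \leftarrow v_A$ | $\tau(E) \leftarrow \tau(v_A)$ | Set exception taint |
| unary-op $v_A$ $v_B$ | $v_A \leftarrow \otimes v_B$ | $\tau(v_A) \leftarrow \tau(v_B)$ | Set $v_A$ taint to $v_B$ taint |
| binary-op $v_A$ $v_B$ $v_C$ | $v_A \leftarrow v_B \otimes v_C$ | $\tau(v_A) \leftarrow \tau(v_B) \cup \tau(v_C)$ | Set $v_A$ taint to $v_B$ taint $\cup$ $v_C$ taint |
| binary-op $v_A$ $v_B$ | $v_A \leftarrow v_A \otimes v_B$ | $\tau(v_A) \leftarrow \tau(v_A) \cup \tau(v_B)$ | Update $v_A$ taint with $v_B$ taint |
| binary-op $v_A$ $v_B$ $C$ | $v_A \leftarrow v_B \otimes C$ | $\tau(v_A) \leftarrow \tau(v_B)$ | Set $v_A$ taint to $v_B$ taint |
| aput-op $v_A$ $v_B$ $v_C$ | $v_B[v_C] \leftarrow v_A$ | $\tau(v_B[\cdot]) \leftarrow \tau(v_B[\cdot]) \cup \tau(v_A)$ | Update array $v_B$ taint with $v_A$ taint |
| aget-op $v_A$ $v_B$ $v_C$ | $v_A \leftarrow v_B[v_C]$ | $\tau(v_A) \leftarrow \tau(v_B[\cdot]) \cup \tau(v_C)$ | Set $v_A$ taint to array and index taint |
| sput-op $v_A$ $f_B$ | $f_B \leftarrow v_A$ | $\tau(f_B) \leftarrow \tau(v_A)$ | Set field $f_B$ taint to $v_A$ taint |
| sget-op $v_A$ $f_B$ | $v_A \leftarrow f_B$ | $\tau(v_A) \leftarrow \tau(f_B)$ | Set $v_A$ taint to field $f_B$ taint |
| iput-op $v_A$ $v_B$ $f_C$ | $v_B(f_C) \leftarrow v_A$ | $\tau(v_B(f_C)) \leftarrow \tau(v_A)$ | Set field $f_C$ taint to $v_A$ taint |
| iget-op $v_A$ $v_B$ $f_C$ | $v_A \leftarrow v_B(f_C)$ | $\tau(v_A) \leftarrow \tau(v_B(f_C)) \cup \tau(v_B)$ | Set $v_A$ taint to field $f_C$ and object reference taint |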
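A few of the rules in Table 1 can be sketched as operations on a taint map. This is an illustrative model only: it tracks tags and ignores values, represents a tag as a 32-bit integer so that set union becomes bitwise OR, and uses string names for registers and arrays. The class `TaintLogic` and all of its methods are hypothetical.

```java
/** Hypothetical taint map covering a subset of the Table 1 rules. */
final class TaintLogic {
    // Maps a variable name to its tag; an absent entry means the empty tag (0).
    private final java.util.Map<String, Integer> taint = new java.util.HashMap<>();

    /** tau(v): retrieve the tag of a variable. */
    int tau(String v) { return taint.getOrDefault(v, 0); }

    /** Adds a marking to a variable, as a taint source would. */
    void addTaint(String v, int marking) { taint.put(v, tau(v) | marking); }

    /** const-op: tau(vA) <- empty. */
    void constOp(String vA) { taint.put(vA, 0); }

    /** move-op: tau(vA) <- tau(vB). */
    void moveOp(String vA, String vB) { taint.put(vA, tau(vB)); }

    /** binary-op: tau(vA) <- tau(vB) U tau(vC). */
    void binaryOp(String vA, String vB, String vC) { taint.put(vA, tau(vB) | tau(vC)); }

    /** aput-op: tau(vB[.]) <- tau(vB[.]) U tau(vA). */
    void aputOp(String vA, String arrayB) { taint.put(arrayB, tau(arrayB) | tau(vA)); }

    /** aget-op: tau(vA) <- tau(vB[.]) U tau(vC). */
    void agetOp(String vA, String arrayB, String vC) { taint.put(vA, tau(arrayB) | tau(vC)); }
}
```

The two-address form of binary-op falls out of the same method by passing the destination as the first operand, e.g. `binaryOp("v0", "v0", "v1")`.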
    public static Integer valueOf(int i) {
        if (i < -128 || i > 127) {
            return new Integer(i);
        }
        return valueOfCache.CACHE[i + 128];
    }

    static class valueOfCache {
        static final Integer[] CACHE = new Integer[256];
        static {
            for (int i = -128; i <= 127; i++) {
                CACHE[i + 128] = new Integer(i);
            }
        }
    }

Figure 4: Excerpt from Android's Integer class illustrating the need for object reference taint propagation.

The code listed in Figure 4 demonstrates a real instance of where object reference tainting is needed. Here, valueOf() returns an Integer object for a passed int. If the int argument is between -128 and 127, valueOf() returns a reference to a statically defined Integer object. valueOf() is implicitly called for conversion to an object. Consider the following definition and use of a method intProxy().

    Object intProxy(int val) { return val; }
    int out = (Integer) intProxy(tVal);

Consider the case where tVal is an int with value 1 and taint tag TAG. When intProxy() is passed tVal, TAG is propagated to val. When intProxy() returns val, it calls Integer.valueOf() to obtain an Integer instance corresponding to the scalar variable val. In this case, Integer.valueOf() returns a reference to the static Integer object with value 1. The value field (of the Integer class) in the object has a taint tag of $\emptyset$; however, since the aget-op propagation rule includes the taint of the index register, the object reference has a taint tag of TAG. Therefore, only by including the object reference taint tag when the value field is read from the Integer (i.e., the iget-op propagation rule) will the correct taint tag of TAG be assigned to out.

4.3 Native Code Taint Propagation

Native code is unmonitored in TaintDroid. Ideally, we achieve the same propagation semantics as the interpreted counterpart. Hence, we define two necessary postconditions for accurate taint tracking in the Java-like environment: 1) all accessed external variables (i.e., class fields referenced by other methods) are assigned taint tags according to data flow rules; and 2) the return value is assigned a taint tag according to data flow rules. TaintDroid achieves these postconditions through an assortment of manual instrumentation, heuristics, and method profiles, depending on situational requirements.

Internal VM Methods: Internal VM methods are called directly by interpreted code, passing a pointer to an array of 32-bit register arguments and a pointer to a return value. The stack augmentation shown in Figure 3 provides access to taint tags for both Java arguments and the return value. As there are a relatively small number of internal VM methods which are infrequently added between versions, we manually inspected and patched them for taint propagation as needed. We identified 185 internal VM methods in Android version 2.1; however, only 5 required patching: the System.arraycopy() native method for copying array contents, and several native methods implementing Java reflection.

JNI Methods: JNI methods are invoked through the JNI call bridge. The call bridge parses Java arguments and assigns a return value using the method's descriptor string. We patched the call bridge to provide taint propagation for all JNI methods. When a JNI method returns, TaintDroid consults a method profile table for tag propagation updates. A method profile is a list of (from, to) pairs indicating flows between variables, which may be method parameters, class variables, or return values.
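A method profile of this kind might be modeled as follows. The sketch is an assumption-laden simplification: it covers only flows between parameters and the return value (class variables are omitted), encodes each flow as an index pair, and uses a hypothetical `RETURN` constant to stand for the return value. None of these names come from TaintDroid itself.

```java
/** Hypothetical method profile: (from, to) flows applied after a native method returns. */
final class MethodProfile {
    /** Pseudo-index that stands for the return value. */
    static final int RETURN = -1;

    private final int[][] flows;   // each entry is {from, to}, both parameter indices

    MethodProfile(int[][] flows) { this.flows = flows; }

    /**
     * Applies the profile. paramTags holds one taint tag per parameter and is
     * updated in place; the result is the taint tag for the return value.
     */
    int apply(int[] paramTags) {
        int[] before = paramTags.clone();   // read sources as they were at call time
        int returnTag = 0;
        for (int[] flow : flows) {
            int from = flow[0];
            int to = flow[1];
            if (to == RETURN) {
                returnTag |= before[from];
            } else {
                paramTags[to] |= before[from];
            }
        }
        return returnTag;
    }
}
```

For a copy-style native method, a profile such as `{0, 2}` would carry the source array's tag (parameter 0) to the destination array (parameter 2) once the call returns.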
Enumerating the information flows for all JNI methods is a time-consuming task best completed automatically using source code analysis (a task we leave for future work). We currently include an additional propagation heuristic patch. The heuristic is conservative for JNI methods that only operate on primitive and String arguments and return values. It assigns the union of the method argument taint tags to the taint tag of the return value. While the heuristic has false negatives for methods using objects, it covers many existing methods.

We performed a survey of the JNI methods included in the official Android source code (version 2.1) to determine specific properties. We found 2,844 JNI methods with a Java interface and C or C++ implementation. Of these methods, 913 did not reference objects (as arguments, return value, or method body) and hence are automatically covered by our heuristic. The remaining methods may or may not have information flows that produce false negatives. Currently, we define method profiles as needed. For example, methods in the IBM NativeConverter class require propagation for conversion between character and byte arrays.

4.4 IPC Taint Propagation

Taint tags must propagate between applications when they exchange data. The tracking granularity affects performance and memory overhead. TaintDroid uses message-level taint tracking. A message taint tag represents the upper bound of taint markings assigned to variables contained in the message. We use message-level granularity to minimize performance and storage overhead during IPC.

We chose to implement message-level over variable-level taint propagation because, in a variable-level system, a devious receiver could game the monitoring by unpacking variables in a different way to acquire values without taint propagation. For example, if an IPC parcel message contains a sequence of scalar values, the receiver may unpack a string instead, thereby acquiring values without propagating all the taint tags on scalar values in the sequence. Hence, to prevent applications from removing taint tags in this way, the current implementation protects taint tags at the message level.

Message-level taint propagation for IPC leads to false positives. Similar to arrays, all data items in a parcel share the same taint tag. For example, Section 8 discusses limitations for tracking the IMSI that result from passing portions of the value as configuration parameters in parcels. Future implementations will consider word-level taint tags along with additional consistency checks to ensure accurate propagation for unpacked variables. However, this additional complexity will negatively impact IPC performance.

4.5 Secondary Storage Taint Propagation

Taint tags may be lost when data is written to a file. Our design stores one taint tag per file. The taint tag is updated on file write and propagated to data on file read. TaintDroid stores file taint tags in the file system's extended attributes. To do this, we implemented extended attribute support for Android's host file system (YAFFS2) and formatted the removable SDcard with the ext2 file system. As with arrays and IPC, storing one taint tag per file leads to false positives and limits the granularity of taint markings for information databases (see Section 5). Alternatively, we could track taint tags at a finer granularity at the expense of added memory and performance overhead.

4.6 Taint Interface Library

Taint sources and sinks defined within the virtualized environment must communicate taint tags with the tracking system. We abstract the taint source and sink logic into a single taint interface library. The interface performs two functions: 1) add taint markings to variables; and 2) retrieve taint markings from variables. The library only provides the ability to add and not set or clear taint tags, as such functionality could be used by untrusted Java code to remove taint markings.

Adding taint tags to arrays and strings via internal VM methods is straightforward, as both are stored in data objects. Primitive type variables, on the other hand, are stored on the interpreter's internal stack and disappear after a method is called. Therefore, the taint library uses the method return value as a means of tainting primitive type variables. The developer passes a value or variable into the appropriate add taint method (e.g., addTaintInt()) and the returned variable has the same value but additionally has the specified taint tag. Note that the stack storage does not pose complications for taint tag retrieval.

5 Privacy Hook Placement

Using TaintDroid for privacy analysis requires identifying privacy sensitive sources and instrumenting taint sources within the operating system. Historically, dynamic taint analysis systems assume taint source and sink placement is trivial. However, complex operating systems such as Android provide applications information in a variety of ways, e.g., direct access and service interfaces. Each potential type of privacy sensitive information must be studied carefully to determine the best method of defining the taint source.

Taint sources can only add taint tags to memory for which TaintDroid provides tag storage. Currently, taint source and sink placement is limited to variables in interpreted code, IPC messages, and files. This section discusses how valuable taint sources and sinks can be implemented within these restrictions. We generalize such taint sources based on information characteristics.
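A source hook built on the add-only interface of Section 4.6 might look like the sketch below. Plain Java cannot attach a tag to a primitive, so a hypothetical `TaintedInt` wrapper stands in for a register and its adjacent tag. Only the name addTaintInt() appears in the text; its signature here, the getTaintInt() accessor, the marking constants, and the `LocationSourceHook` class are all assumptions made for illustration.

```java
/** Hypothetical stand-in for a primitive value plus the tag stored next to it. */
final class TaintedInt {
    final int value;
    final int tag;
    TaintedInt(int value, int tag) { this.value = value; this.tag = tag; }
}

/** Hypothetical add-only interface: there is deliberately no set or clear operation. */
final class TaintInterface {
    static final int TAINT_LOCATION = 0x1;   // assumed marking bit
    static final int TAINT_IMEI     = 0x2;   // assumed marking bit

    /** Returns the same value with the marking added to its existing tag. */
    static TaintedInt addTaintInt(TaintedInt v, int marking) {
        return new TaintedInt(v.value, v.tag | marking);
    }

    /** Retrieves the tag without modifying it. */
    static int getTaintInt(TaintedInt v) { return v.tag; }
}

/** Hypothetical source hook placed where a manager hands data to applications. */
final class LocationSourceHook {
    static TaintedInt onLatitudeRead(int latitudeE6) {
        TaintedInt raw = new TaintedInt(latitudeE6, 0);
        return TaintInterface.addTaintInt(raw, TaintInterface.TAINT_LOCATION);
    }
}
```

Because the interface can only OR markings into a tag, calling it from untrusted code can widen a tag but never narrow it, which matches the rationale given for omitting set and clear.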
terpreted code, IPC messages, and files. This section 6 Application Study
discusses how valuable taint sources and sinks can be im-
This section reports on an application study that uses
plemented within these restrictions. We generalize such
TaintDroid to analyze how 30 popular third-party An-
taint sources based on information characteristics.
droid applications use privacy sensitive user data. Exist-
Low-bandwidth Sensors: A variety of privacy sensitive ing applications acquire a variety of user data along with
information types are acquired through low-bandwidth permissions to access the Internet. Our study finds that
sensors, e.g., location and accelerometer. Such informa- two thirds of these applications expose detailed location
tion often changes frequently and is simultaneously used data, the phones unique ID, and the phone number using
by multiple applications. Therefore, it is common for the combination of the seemingly innocuous access per-
a smartphone OS to multiplex access to low-bandwidth missions granted at install. This finding was made possi-
sensors using a manager. This sensor manager represents ble by TaintDroids ability to monitor runtime access of
an ideal point for taint source hook placement. For our sensitive user data and to precisely relate the monitored
analysis, we placed hooks in Androids LocationMan- accesses with the data exposure by applications.
ager and SensorManager applications.
6.1 Experimental Setup
High-bandwidth Sensors: Privacy sensitive informa- An early 2010 survey of the 50 most popular free ap-
tion sources such as the microphone and camera are high-bandwidth. Each request from the sensor frequently returns a large amount of data that is only used by one application. Therefore, the smartphone OS may share sensor information via large data buffers, files, or both. When sensor information is shared via files, the file must be tainted with the appropriate tag. Due to flexible APIs, we placed hooks for both data buffer and file tainting for tracking microphone and camera information.

Information Databases: Shared information such as address books and SMS messages is often stored in file-based databases. This organization provides a useful unambiguous taint source similar to hardware sensors. By adding a taint tag to such database files, all information read from the file will be automatically tainted. We used this technique for tracking address book information. Note that while TaintDroid's file-level granularity was appropriate for these valuable information sources, others may exist for which files are too coarse grained. However, we have not yet encountered such sources.

Device Identifiers: Information that uniquely identifies the phone or the user is privacy sensitive. Not all personally identifiable information can be easily tainted. However, the phone contains several easily tainted identifiers: the phone number, SIM card identifiers (IMSI, ICC-ID), and device identifier (IMEI) are all accessed through well-defined APIs. We instrumented the APIs for the phone number, ICC-ID, and IMEI. An IMSI taint source has inherent limitations discussed in Section 8.

Network Taint Sink: Our privacy analysis identifies when tainted information is transmitted out the network interface. The VM interpreter-based approach requires the taint sink to be placed within interpreted code. Hence, we instrumented the Java framework libraries at the point the native socket library is invoked.

plications in each category of the Android Market [2] (1,100 applications, in total) revealed that roughly a third of the applications (358 of the 1,100 applications) require Internet permissions along with permissions to access either location, camera, or audio data. From this set, we randomly selected 30 popular applications (an 8.4% sample size), which span twelve categories. Table 2 enumerates these applications along with permissions they request at install time. Note that this does not reflect actual access or use of sensitive data.

We studied each of the thirty downloaded applications by starting the application, performing any initialization or registration that was required, and then manually exercising the functionality offered by the application. We recorded system logs including detailed information from TaintDroid: tainted binder messages, tainted file output, and tainted network messages with the remote address. The overall experiment (conducted in May 2010) lasted slightly over 100 minutes, generating 22,594 packets (8.6MB) and 1,130 TCP connections. To verify our results, we also logged the network traffic using tcpdump on the WiFi interface and repeated experiments on multiple Nexus One phones, running the same version of TaintDroid built on Android 2.1. Though the phones used for experiments had a valid SIM card installed, the SIM card was inactive, forcing all the packets to be transmitted via the WiFi interface. The packet trace was used only to verify the exposure of tainted data flagged by TaintDroid.

In addition to the network trace, we also noted whether applications acquired user consent (either explicit or implicit) for exporting sensitive information. This provides additional context information to identify possible privacy violations. For example, by selecting the "use my location" option in a weather application, the user implicitly consents to disclosing geographic coordinates to the weather server.
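The file-level taint source and the network taint sink described above can be illustrated with a small model. This is only a sketch under stated assumptions, not TaintDroid's implementation: the tag constants, the `TaintedFile` class, and the `network_send` function are hypothetical names, tags are modeled as bits in an integer, and a write is assumed to merge the written data's tag into the file's tag.

```python
# Hypothetical taint tags, one bit per information type (illustrative names).
TAINT_CLEAR = 0x0
TAINT_CONTACTS = 0x1
TAINT_MIC = 0x2
TAINT_IMEI = 0x4


class TaintedFile:
    """A file that carries one taint tag covering all of its contents."""

    def __init__(self, content=b"", tag=TAINT_CLEAR):
        self.content = content
        self.tag = tag

    def read(self):
        # File-level granularity: everything read inherits the file's tag.
        return self.content, self.tag

    def write(self, data, data_tag):
        # Assumption: writing merges the data's tag into the file's tag.
        self.content += data
        self.tag |= data_tag


def network_send(payload, tag, host, log):
    """Taint sink: record an event when tainted data is about to be sent."""
    if tag != TAINT_CLEAR:
        log.append((host, tag))
    # A real sink would hand the payload to the native socket library here.


log = []
contacts_db = TaintedFile(b"alice,555-0100", TAINT_CONTACTS)
data, tag = contacts_db.read()
network_send(data, tag, "example.com", log)      # tainted: logged
network_send(b"hello", TAINT_CLEAR, "example.com", log)  # untainted: ignored

recording = TaintedFile()
recording.write(b"\x00\x01", TAINT_MIC)
recording.write(b"id", TAINT_IMEI)
```

In this model the sink only observes the tag that travels with the data, which is why tagging the database file once is enough to cover every later read.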
Table 2: Applications grouped by the requested permissions (L: location, C: camera, A: audio, P: phone state). Android Market categories are indicated in parentheses, showing the diversity of the studied applications.

Applications | # | Permissions (L C A P)
The Weather Channel (News & Weather); Cestos, Solitaire (Game); Movies (Entertainment); Babble (Social); Manga Browser (Comics) | 6 | x
Bump, Wertago (Social); Antivirus (Communication); ABC Animals, Traffic Jam, Hearts, Blackjack (Games); Horoscope (Lifestyle); Yellow Pages (Reference); 3001 Wisdom Quotes Lite, Dastelefonbuch, Astrid (Productivity); BBC News Live Stream (News & Weather); Ringtones (Entertainment) | 14 | x x
Layar (Lifestyle); Knocking (Social); Coupons (Shopping); Trapster (Travel); Spongebob Slide (Game); ProBasketBall (Sports) | 6 | x x x
MySpace (Social); Barcode Scanner, ixMAT (Shopping) | 3 | x
Evernote (Productivity) | 1 | x x x

Listed names correspond to the name displayed on the phone and not necessarily the name listed in the Android Market.
All listed applications also require access to the Internet.
Table 3: Potential privacy violations by 20 of the studied applications. Note that three applications had multiple violations, one of which had a violation in all three categories.

Observed Behavior (# of apps) | Details
Phone Information to Content Servers (2) | 2 apps sent out the phone number, IMSI, and ICC-ID along with the geo-coordinates to the app's content server.
Device ID to Content Servers (7)* | 2 Social, 1 Shopping, 1 Reference and three other apps transmitted the IMEI number to the app's content server.
Location to Advertisement Servers (15) | 5 apps sent geo-coordinates to ad.qwapi.com, 5 apps to admob.com, 2 apps to ads.mobclix.com (1 sent location both to admob.com and ads.mobclix.com) and 4 apps sent location to data.flurry.com.†

* TaintDroid flagged nine applications in this category, but only seven transmitted the raw IMEI without mentioning such practice in the EULA.
† To the best of our knowledge, the binary messages contained tainted location data (see the discussion below).
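Plaintext location exposure of the kind counted in Table 3 can be confirmed directly from a captured request. The sketch below decodes the visible parameters of the AdMob request quoted in Section 6.2 using Python's standard `urllib.parse`; the elided portions of the original string are left out, and the helper name `extract_coordinates` is hypothetical.

```python
from urllib.parse import parse_qs

# Visible parameters of the AdMob request quoted in Section 6.2.
# The parts elided in the original ("...") are not reconstructed.
query = (
    "s=a14a4a93f1e4c68"
    "&t=062A1CB1D476DE85B717D9195A6722A9"
    "&d%5Bcoord%5D=47.661227890000006%2C-122.31589477"
)


def extract_coordinates(query_string):
    """Return (latitude, longitude) from a d[coord] parameter, or None."""
    params = parse_qs(query_string)  # percent-decodes both keys and values
    values = params.get("d[coord]")
    if not values:
        return None
    lat_text, lon_text = values[0].split(",")
    return float(lat_text), float(lon_text)


coords = extract_coordinates(query)
```

This kind of pattern matching only works because the coordinates travel as text; it says nothing about binary payloads, which is where the taint tag is needed.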
6.2 Findings

Table 3 summarizes our findings. TaintDroid flagged 105 TCP connections as containing tainted privacy sensitive information. We manually labeled each message based on available context, including remote server names and temporally relevant application log messages. We used remote hostnames as an indication of whether data was being sent to a server providing application functionality or to a third party. Frequently, messages contained plaintext that aided categorization, e.g., an HTTP GET request containing geographic coordinates. However, 21 flagged messages contained binary data. Our investigation indicates these messages were generated by the Google Maps for Mobile [21] and FlurryAgent [20] APIs and contained tainted privacy sensitive data. These conclusions are supported by message transmissions immediately after the application received a tainted parcel from the system location manager. We now expand on our findings for each category and reflect on potential privacy violations.

Phone Information: Table 2 shows that 21 out of the 30 applications require permissions to read phone state and the Internet. We found that 2 of the 21 applications transmitted to their server (1) the device's phone number, (2) the IMSI, which is a unique 15-digit code used to identify an individual user on a GSM network, and (3) the ICC-ID number, which is a unique SIM card serial number. We verified messages were flagged correctly by inspecting the plaintext payload (see footnote 4). In neither case was the user informed that this information was transmitted off the phone.

This finding demonstrates that Android's coarse-grained access control provides insufficient protection against third-party applications seeking to collect sensitive data. Moreover, we found that one application transmits the phone information every time the phone boots. While this application displays a terms of use on first use, the terms of use do not specify collection of this highly sensitive data. Surprisingly, this application transmits the phone data immediately after install, before first use.

Device Unique ID: The device's IMEI was also exposed by applications. The IMEI uniquely identifies a specific mobile phone and is used to prevent a stolen handset from accessing the cellular network. TaintDroid flags indicated that nine applications transmitted the IMEI. Seven out of the nine applications either do not present an End User License Agreement (EULA) or do not specify IMEI collection in the EULA. One of the seven applications is a popular social networking application and another is a location-based search application. Furthermore, we found two of the seven applications include the IMEI when transmitting the device's geographic coordinates to their content server, potentially repurposing the IMEI as a client ID.

In comparison, two of the nine applications treat the IMEI with more care, thus we do not classify them as potential privacy violators. One application displays a privacy statement that clearly indicates that the application collects the device ID. The other uses the hash of the IMEI instead of the number itself. We verified this practice by comparing results from two different phones.

Location Data to Advertisement Servers: Half of the studied applications exposed location data to third-party advertisement servers without requiring implicit or explicit user consent. Of the fifteen applications, only two presented a EULA on first run; however, neither EULA indicated this practice. Exposure of location information occurred both in plaintext and in binary format. The latter highlights TaintDroid's advantages over simple pattern-based packet scanning. Applications sent location data in plaintext to admob.com, ad.qwapi.com, ads.mobclix.com (11 applications) and in binary format to FlurryAgent (4 applications). The plaintext location exposure to AdMob occurred in the HTTP GET string:

...&s=a14a4a93f1e4c68&..&t=062A1CB1D476DE85B717D9195A6722A9&d%5Bcoord%5D=47.661227890000006%2C-122.31589477&...

Investigating the AdMob SDK revealed the s= parameter is an identifier unique to an application publisher, and the coord= parameter provides the geographic coordinates.

For FlurryAgent, we confirmed location exposure by the following sequence of events. First, a component named "FlurryAgent" registers with the location manager to receive location updates. Then, TaintDroid log messages show the application receiving a tainted parcel from the location manager. Finally, the application reports "sending report to http://data.flurry.com/aar.do" after receiving the tainted parcel.

Our experimentation indicates these fifteen applications collect location data and send it to advertisement servers. In some cases, location data was transmitted to advertisement servers even when no advertisement was displayed in the application. However, we note that TaintDroid helped us verify that three of the studied applications (not included in Table 3) only transmitted location data per the user's request to pull localized content from their servers. This finding demonstrates the importance of monitoring exercised functionality of an application that reflects how the application actually uses or abuses the granted permissions.

Legitimate Flags: Out of 105 connections flagged by TaintDroid, 37 were deemed clearly legitimate use. The flags resulted from four applications and the OS itself while using the Google Maps for Mobile (GMM) API. The TaintDroid logs indicate an HTTP request with the "User-Agent: GMM ..." header, but a binary payload. Given that GMM functionality includes downloading maps based on geographic coordinates, it is obvious that TaintDroid correctly identified location information in the payload. Our manual inspection of each message along with the network packet trace confirmed that there were no false positives. We note that there is a possibility of false negatives, which is difficult to verify without the source code of the third-party applications.

Summary: Our study of 30 popular applications shows the effectiveness of the TaintDroid system in accurately tracking applications' use of privacy sensitive data. While monitoring these applications, TaintDroid generated no false positives (with the exception of the IMSI taint source which we disabled for experiments, see Section 8). The flags raised by TaintDroid helped to identify potential privacy violations by the tested applications. Half of the studied applications share location data with advertisement servers. Approximately one third of the applications expose the device ID, sometimes with the phone number and the SIM card serial number. The analysis was simplified by the taint tag provided by TaintDroid that precisely describes which privacy relevant data is included in the payload, especially for binary payloads. We also note that there was almost no perceived latency while running experiments with TaintDroid.

7 Performance Evaluation

We now study TaintDroid's taint tracking overhead. Experiments were performed on a Google Nexus One running Android OS version 2.1 modified for TaintDroid. Within the interpreted environment, TaintDroid incurs the same performance and memory overhead regardless of the existence of taint markings. Hence, we only need to ensure file access includes appropriate taint tags.

7.1 Macrobenchmarks

During the application study, we anecdotally observed limited performance overhead. We hypothesize that this is because: 1) most applications are primarily in a wait state, and 2) heavyweight operations (e.g., screen updates and webpage rendering) occur in unmonitored native libraries.

To gain further insight into perceived overhead, we devised five macrobenchmarks for common high-level smartphone operations. Each experiment was measured 50 times, and the observed 95% confidence intervals were at least an order of magnitude smaller than the mean. In each case, we excluded the first run to remove unrelated initialization costs. Experimental results are shown in Table 4.
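The measurement procedure described above (repeated runs, first run excluded, 95% confidence interval compared against the mean) can be sketched as follows. The timings are made up for illustration and are not the paper's data; the function name `summarize_runs` is hypothetical, and the interval uses a normal approximation (z = 1.96), which is an assumption since the text does not state how the intervals were computed.

```python
import math
import statistics


def summarize_runs(samples_ms):
    """Return (mean, 95% CI half-width) after discarding the first run."""
    runs = samples_ms[1:]  # drop the first run (initialization costs)
    mean = statistics.mean(runs)
    stderr = statistics.stdev(runs) / math.sqrt(len(runs))
    half_width = 1.96 * stderr  # normal approximation (assumption)
    return mean, half_width


# Made-up timings in milliseconds: one slow first run, then 50 steady runs.
samples = [120.0] + [63.0 + (i % 5) * 0.5 for i in range(50)]
mean, half_width = summarize_runs(samples)

# "At least an order of magnitude smaller than the mean".
tight = half_width * 10 <= mean
```

Dropping the first sample before computing the statistics keeps a one-off initialization cost from inflating both the mean and the interval.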
Table 4: Macrobenchmark Results

 | Android | TaintDroid
App Load Time | 63 ms | 65 ms

Application Load Time: The application load time measures from when Android's Activity Manager re-
[49] Slowinska, A., and Bos, H. Pointless Tainting? Evaluating the Practicality of Pointer Tainting. In Proceedings of the European Conference on Computer Systems (EuroSys) (April 2009), pp. 61-74.
[50] Suh, G. E., Lee, J. W., Zhang, D., and Devadas, S. Secure Program Execution via Dynamic Information Flow Tracking. In Proceedings of Architectural Support for Programming Languages and Operating Systems (2004).
[51] Vachharajani, N., Bridges, M. J., Chang, J., Rangan, R., Ottoni, G., Blome, J. A., Reis, G. A., Vachharajani, M., and August, D. I. RIFLE: An Architectural Framework for User-Centric Information-Flow Security. In Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture (2004), pp. 243-254.
[52] VandeBogart, S., Efstathopoulos, P., Kohler, E., Krohn, M., Frey, C., Ziegler, D., Kaashoek, F., Morris, R., and Mazières, D. Labels and Event Processes in the Asbestos Operating System. ACM Transactions on Computer Systems (TOCS) 25, 4 (December 2007).

Section 8, we disabled the IMSI taint source for experiments. Nonetheless, TaintDroid's flag of the ICC-ID and the phone number led us to find the IMSI contained in the same payload.
5 Regardless of the string separation, the MCC and MNC are identifiers that warrant taint sources.