Michael Stonebraker
In My Opinion.
Column stores will win
Factor of 50 or so faster than row stores
[Figure: array of prices for securities S1 ... S4000 at times t1 ... t2000]
Hourly data?
All securities?
Array Answer
Ignoring the (1/N) and subtracting off the means:
Stock * Stock^T
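The array answer can be checked on a toy example. This is a minimal sketch in pure Python, assuming a made-up array whose rows are securities and whose columns are time points (the orientation is an assumption). It subtracts each row's mean, forms Stock * Stock^T, and only then applies the (1/N) factor that the slide ignores. The names `gram` and `covariance_matrix` are hypothetical.

```python
def gram(x):
    """Return x * x^T: entry (i, j) is the dot product of rows i and j."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in x] for ri in x]


def covariance_matrix(stock):
    """Population covariance of securities (rows) over time points (columns).

    Assumes every row has the same number N of time points.
    """
    n = len(stock[0])
    centered = []
    for row in stock:
        mean = sum(row) / n
        centered.append([v - mean for v in row])  # subtract off the mean
    # Stock * Stock^T on the centered data, then the (1/N) factor
    return [[v / n for v in row] for row in gram(centered)]


if __name__ == "__main__":
    prices = [[1, 2, 3, 4], [2, 4, 6, 8]]
    print(covariance_matrix(prices))  # [[1.25, 2.5], [2.5, 5.0]]
```

The nested loops only show the arithmetic; an array engine or a BLAS-backed library would perform the same multiply far faster.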
DBMS Requirements
Complex analytics
Covariance is just the start
Defined on arrays
Data management
Leave out outliers
Just on securities with a market cap over
$10B
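The data-management requirements (restrict to large-cap securities, leave out outliers) have to run before the math. A minimal sketch, assuming prices and market caps arrive as Python dicts keyed by security name, and assuming one possible outlier rule that the slide does not specify: drop any time point at which some selected security is more than `k` population standard deviations from its own mean. `select_for_analysis`, `min_cap`, and `k` are hypothetical names.

```python
import statistics


def select_for_analysis(prices, market_caps, min_cap=10e9, k=3.0):
    """Pick large-cap securities and drop outlier time points.

    prices: security name -> list of prices, all lists the same length
    market_caps: security name -> market cap in dollars
    Returns (names, rows) with rows aligned to names.
    """
    # "Just on securities with a market cap over $10B"
    names = sorted(s for s in prices if market_caps.get(s, 0) > min_cap)
    rows = [prices[s] for s in names]
    if not rows:
        return [], []
    # Per-security mean and population standard deviation
    stats = [(statistics.mean(r), statistics.pstdev(r)) for r in rows]
    # "Leave out outliers" (assumed rule): keep a time point only if every
    # selected security is within k standard deviations of its own mean there.
    # Dropping whole time points keeps the array rectangular.
    keep = [
        t
        for t in range(len(rows[0]))
        if all(sd == 0 or abs(r[t] - m) <= k * sd for r, (m, sd) in zip(rows, stats))
    ]
    return names, [[r[t] for t in keep] for r in rows]
```

Whether outliers are dropped per time point, per security, or per observation is a modeling choice the slide leaves open; this rule is just one option.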
In My Opinion.
The focus will shift quickly from small math to
big math in many domains
I.e., this stuff will become mainstream.
Solution Options
R, SAS, MATLAB, et al.
Weak or non-existent data management
File system storage
Solution Options
RDBMS alone
Solution Options
R + RDBMS
Solution Options
Hadoop
Analytics * .01
Data management * .01
Because
No state
No sticky computation
No point-to-point messaging
Only viable if you don't care about performance
Solution Options
Big Velocity
Trading volumes going through the roof on
Wall Street, breaking infrastructure
Sensor tagging of {cars, people, …} creates a
firehose to ingest
The web empowers end users to submit
transactions, sending volume through the
roof
PDAs let them submit transactions from
anywhere.
My Suspicion
You have 3-4 "big state, little pattern"
problems for every one "big pattern, little
state" problem
Solution Choices
Old SQL
The elephants
No SQL
75 or so vendors giving up both SQL and ACID
New SQL
Retain SQL and ACID but go fast with a new
architecture
No SQL
Give up SQL
Interesting to note that
Cassandra and Mongo are
moving to (yup) SQL
Give up ACID
If you need ACID, this is a
decision to tear your hair out
by doing it in user code
Can you guarantee you won't
need ACID tomorrow?
Open source
Shared nothing, Linux, TCP/IP on jelly beans
Light-weight transactions
Run-to-completion with no locking
Single-threaded
Multi-core by splitting main memory
About 100x RDBMS on TPC-C
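The bullets above describe an architecture rather than an API. A minimal sketch of the idea, under stated assumptions: main memory is split into partitions, each owned by one single-threaded executor; every transaction touches a single partition chosen by its key; and transactions run to completion one after another, so no locks are taken. `Partition`, `PartitionedStore`, and `deposit` are hypothetical. The sketch leaves out multi-partition transactions, durability, and replication, and it simulates the cores sequentially.

```python
import zlib
from collections import deque


class Partition:
    """One slice of main memory, served by a single executor thread."""

    def __init__(self):
        self.data = {}
        self.queue = deque()

    def run_all(self):
        # Run queued transactions strictly one at a time, each to completion.
        # Only this executor touches self.data, so no locking is needed.
        results = []
        while self.queue:
            txn = self.queue.popleft()
            results.append(txn(self.data))
        return results


class PartitionedStore:
    def __init__(self, n_partitions):
        self.partitions = [Partition() for _ in range(n_partitions)]

    def _owner(self, key):
        # Stable hash so a given key always maps to the same partition.
        return self.partitions[zlib.crc32(key.encode()) % len(self.partitions)]

    def submit(self, key, txn):
        self._owner(key).queue.append(txn)

    def run_all(self):
        # A real system would run one executor per core in parallel;
        # here the partitions are simply drained one after another.
        return [p.run_all() for p in self.partitions]

    def get(self, key):
        return self._owner(key).data.get(key)


def deposit(key, amount):
    """Build a single-partition transaction that adds amount to key."""
    def txn(data):
        data[key] = data.get(key, 0) + amount
        return data[key]
    return txn
```

Because each partition has exactly one executor, two deposits to the same key can never interleave, which is what lets the design skip locking.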
In My Opinion
ACID is good
High level languages are good
Standards (i.e. SQL) are good
Big Variety
Typical enterprise has 5000 operational systems
Only a few get into the data warehouse
What about the rest?
And what about all the rest of your data?
Spreadsheets
Access databases
Web pages
And public data from the web?
[Figure labels: enterprise, data warehouse, text]
Summary
The rest of your data (public and private)
Is a treasure trove of incredibly valuable
information
Largely untapped
Data Tamer
Goal: integrate the rest of your data
Has to
Be scalable to 1000s of sites
Deal with incomplete, conflicting, and incorrect data
Be incremental
Task is never done
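Handling incomplete and conflicting data incrementally can be illustrated with a toy consolidator. This is a minimal sketch only, not Data Tamer's method: it assumes each source delivers records as dicts that share an entity key, treats `None` as a missing value, and resolves conflicts by majority vote per attribute. Each new source just adds votes, so integration can keep going as sources arrive. `EntityConsolidator`, `add_source`, and `current_view` are hypothetical names.

```python
from collections import Counter, defaultdict


class EntityConsolidator:
    """Incrementally merges records about the same entity from many sources."""

    def __init__(self):
        # entity key -> attribute -> Counter of values seen across sources
        self.votes = defaultdict(lambda: defaultdict(Counter))

    def add_source(self, records, key_attr):
        """Fold one more source into the consolidated view."""
        for rec in records:
            key = rec.get(key_attr)
            if key is None:
                continue  # no entity key: this sketch cannot place the record
            for attr, value in rec.items():
                if attr != key_attr and value is not None:  # skip missing values
                    self.votes[key][attr][value] += 1

    def current_view(self, key):
        """Majority value per attribute; ties go to the value seen first."""
        attrs = self.votes.get(key, {})
        return {attr: counts.most_common(1)[0][0] for attr, counts in attrs.items()}
```

Majority vote is the simplest possible conflict rule; it cannot fix an incorrect value that most sources happen to share.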
Data Tamer
MIT research project
Looking for more integration problems
Wanna partner?
Take away
One size does not fit all
Plan on (say) 6 DBMS architectures
Use the right tool for the job
Elephants are not competitive
At anything
Have a bad innovator's dilemma problem