Topics
Introduction to frontline data warehouses
PetaByte warehouse design with PostgreSQL
Aster contributions to PostgreSQL
Q&A
Petabytes source data
24x7 availability
TB/day load capacity
In-database transforms
Rapid data access

Petabytes detailed data
Low cost/TB
Flexible compression
On-line access
Aging out to off-line
[Figure: architecture diagram with Worker Node and Loader Node, labeled with Queries and Data flows]
A: Users send queries to the Queen (via ACT, ODBC/JDBC, BI tools, etc.)
B: The Queen parses the query, creates an optimal distributed plan, sends subqueries to the Workers, and supervises the processing
C: Workers execute their subqueries (locally or via distributed communication)
D: The Queen aggregates Worker results and returns the final result to the user
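Steps A to D describe a scatter-gather pattern. The sketch below illustrates that pattern only; the `Queen` and `Worker` classes, the in-memory rows, and the SUM-style aggregate are all hypothetical stand-ins and not Aster's actual API.

```python
from concurrent.futures import ThreadPoolExecutor


class Worker:
    """Hypothetical worker that holds one partition of a table in memory."""

    def __init__(self, rows):
        self.rows = rows

    def run_subquery(self, predicate):
        # Step C: compute a partial aggregate over local rows only.
        return sum(r["amount"] for r in self.rows if predicate(r))


class Queen:
    """Hypothetical coordinator: scatters subqueries and gathers the results."""

    def __init__(self, workers):
        self.workers = workers

    def total_amount(self, predicate):
        # Step B: send the same subquery to every worker in parallel.
        with ThreadPoolExecutor(max_workers=len(self.workers)) as pool:
            partials = list(
                pool.map(lambda w: w.run_subquery(predicate), self.workers)
            )
        # Step D: combine the partial results into the final answer.
        return sum(partials)


workers = [
    Worker([{"region": "east", "amount": 10}, {"region": "west", "amount": 5}]),
    Worker([{"region": "east", "amount": 7}]),
    Worker([{"region": "west", "amount": 3}]),
]
queen = Queen(workers)

# Step A: the user submits one query to the Queen.
east_total = queen.total_amount(lambda r: r["region"] == "east")
print(east_total)
```

A real planner would also decide which Workers can be skipped and how to move data between them; this sketch only covers the fan-out and the final aggregation.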
[Figure: a Query runs against the Logical Database (e.g. Schema), which sits on Physical Tablespaces at High, Medium, and Low compression levels]
How It Works
Data is compressed / decompressed below logical database
Stored in different tablespaces depending on compression level
Architecture Benefits:
High compression ratios (bigger block sizes yield more cost savings)
Database transparent (ease of future Postgres upgrades)
Concurrency (high-performance multi-table compression)
Performance: "Few-row" queries don't need full table decompression
Older data accessed less frequently
Compress to save space and cost
Oldest data is compressed the most, recent is compressed the least
Compressed tables are fully available for queries (true online archival)
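The aging policy above can be pictured as a mapping from data age to a compression tier. In this minimal sketch the age thresholds, the tier names, and the use of `zlib` levels are assumptions for illustration; the slides do not say which codec or cut-offs are used.

```python
import zlib

# Hypothetical policy: (maximum age in days, tier name, zlib level).
# Recent data gets the lightest compression, the oldest data the heaviest.
TIERS = [
    (30, "low", 1),
    (365, "medium", 6),
    (float("inf"), "high", 9),
]


def tier_for_age(age_days):
    """Return (tier name, zlib level) for data of the given age."""
    for max_age, name, level in TIERS:
        if age_days <= max_age:
            return name, level


def compress_partition(payload, age_days):
    """Compress one partition's bytes at the level its age calls for."""
    name, level = tier_for_age(age_days)
    return name, zlib.compress(payload, level)


payload = b"2009-05-01,click,user42\n" * 1000
recent_tier, recent_blob = compress_partition(payload, age_days=7)
old_tier, old_blob = compress_partition(payload, age_days=900)

# Every tier stays readable, which is the point of online archival.
restored = zlib.decompress(old_blob)
print(recent_tier, old_tier, restored == payload)
```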
SQL/MR Functions
In-Database MapReduce
Extensible framework (MapReduce + SQL)
Flexible: Map-Reduce expressiveness, languages, polymorphism
Performance: Massively parallel, computational push-down
Availability: Fault isolation, resource management
Out-of-process executables
Does not use PL/* for custom code execution
Can execute Map and Reduce functions in any language that has a runtime on Linux (e.g. Java, Python, C#, C/C++, Ruby, Perl, R, etc.)
Standard PostgreSQL APIs to send/receive data to executables
Fault isolation, security and resource management for arbitrary user code
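The out-of-process idea can be sketched with an ordinary child process: rows are streamed to a separate executable, and its output is read back. The slides say standard PostgreSQL APIs carry the data; in this sketch a plain pipe with tab-delimited text stands in for that transport, and `MAP_SCRIPT` and `run_map` are hypothetical names.

```python
import subprocess
import sys

# Hypothetical map function, run as a separate OS process so that a crash
# or runaway loop in user code cannot take down the calling process.
MAP_SCRIPT = r'''
import sys
for line in sys.stdin:
    user, url = line.rstrip("\n").split("\t")
    domain = url.split("/")[2]
    sys.stdout.write(user + "\t" + domain + "\n")
'''


def run_map(rows):
    """Send rows to the external map process and parse what it returns."""
    payload = "".join("\t".join(row) + "\n" for row in rows)
    proc = subprocess.run(
        [sys.executable, "-c", MAP_SCRIPT],
        input=payload,
        capture_output=True,
        text=True,
        timeout=30,   # crude resource limit on the user code
        check=True,   # a non-zero exit raises instead of returning bad data
    )
    return [tuple(line.split("\t")) for line in proc.stdout.splitlines()]


rows = [("alice", "http://example.com/a"), ("bob", "http://example.org/b")]
mapped = run_map(rows)
print(mapped)
```

Because the executable only sees stdin and stdout, the same driver could launch a map function written in any language with a Linux runtime.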
Add Capacity
Precision Scaling
When more CPU, memory, or capacity is needed, new nodes can be added for scale-out.
Precision Scaling uses standard PostgreSQL APIs to migrate vWorker partitions to new nodes, either for load balancing (more compute power) or for capacity expansion.
Example: Assume Workers 1/2/3 are 100% CPU-bottlenecked. Incorporation adds a new Worker4 node and migrates vWorker partitions D/H/L to it. As a result of load balancing, CPU utilization drops to 75% per node, eliminating hotspots.
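The arithmetic in the example can be checked with a small model. The layout below is an assumption chosen to match the slide: twelve vWorkers (A to L) spread evenly over three nodes, each vWorker needing 25% of one node's CPU. The function names are hypothetical.

```python
# Assumed per-vWorker CPU demand, as a percentage of one node.
LOAD_PER_VWORKER = 25

# Assumed starting layout: four vWorkers per node, so every node is at 100%.
placement = {
    "Worker1": ["A", "B", "C", "D"],
    "Worker2": ["E", "F", "G", "H"],
    "Worker3": ["I", "J", "K", "L"],
}


def utilization(layout):
    """CPU utilization per node, in percent."""
    return {node: len(vws) * LOAD_PER_VWORKER for node, vws in layout.items()}


def incorporate(layout, new_node, moves):
    """Return a new layout with the vWorkers in `moves` migrated to new_node."""
    result = {
        node: [v for v in vws if v not in moves] for node, vws in layout.items()
    }
    result[new_node] = list(moves)
    return result


before = utilization(placement)
after = utilization(incorporate(placement, "Worker4", ["D", "H", "L"]))
print(before)
print(after)
```

Moving one vWorker off each of the three saturated nodes leaves three vWorkers on every node, which is where the 75% figure comes from (300% of demand spread over four nodes).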
Replication
Replication Failover
Automatic, non-disruptive, graceful performance impact
Replication Restoration
Delta-based (fast) and online (non-disruptive)
Commodity Hardware
2 TB Building Block
Dell, HP, IBM
Intel x86
16 GB Memory
2.4 TB of Storage
8 Disks
[Figure: Cluster, Node, and Worker diagram]
Heterogeneous HW support enables customers to add cheaper/faster nodes to existing systems for investment protection
Mix-n-match different servers as you grow (faster CPUs, more memory, bigger disk drives, different vendors, etc.)
Separate good from malformed tuples
Per tuple error information
Errors to capture:
Type mismatch (e.g. text vs. int)
Check constraints (e.g. int < 0)
Malformed chars (e.g. invalid UTF-8 seq.)
Missing / extra column
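The four error classes can be illustrated with a small loader that keeps good tuples and records one error entry per rejected tuple. This is a sketch of the idea only, not the COPY implementation: the two-column (name, qty) layout, the `qty >= 0` check constraint, and the function names are hypothetical.

```python
def parse_row(line, expected_cols=2):
    """Return (row, None) for a good tuple or (None, reason) for a bad one."""
    try:
        text = line.decode("utf-8")
    except UnicodeDecodeError as exc:
        return None, "malformed chars: " + exc.reason
    fields = text.rstrip("\n").split("\t")
    if len(fields) != expected_cols:
        return None, "missing / extra column: got %d, expected %d" % (
            len(fields), expected_cols)
    name, qty_text = fields
    try:
        qty = int(qty_text)
    except ValueError:
        return None, "type mismatch: %r is not an int" % qty_text
    if qty < 0:  # hypothetical check constraint: qty >= 0
        return None, "check constraint violated: qty >= 0"
    return (name, qty), None


def load(lines):
    """Separate good tuples from malformed ones, keeping per-tuple errors."""
    good, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        row, reason = parse_row(line)
        if reason is None:
            good.append(row)
        else:
            errors.append((lineno, reason))
    return good, errors


lines = [
    b"widget\t3\n",        # good
    b"gadget\tthree\n",    # type mismatch
    b"gizmo\t-1\n",        # check constraint
    b"bad\xff\t1\n",       # invalid UTF-8 sequence
    b"solo\n",             # missing column
    b"a\t1\textra\n",      # extra column
]
good, errors = load(lines)
for lineno, reason in errors:
    print(lineno, reason)
```

The load keeps going after each bad tuple, so one malformed line does not abort the whole batch.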
Auto-partitioning in COPY
COPY into a parent table routes tuples directly into the child table with matching constraints
Auto-partitioning in loading
Activated on-demand
set tuple_routing_in_copy = 0/1;
COPY performance into the parent table is similar to a direct COPY into the child table if the data is sorted
Will leverage partitioning information in the future (WIP for 8.5)
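Tuple routing can be sketched as matching each incoming tuple against the child tables' constraints. The child table names and date ranges below are hypothetical. Remembering the last matching child is an assumption about why sorted input performs well; the slides do not describe the actual mechanism.

```python
from datetime import date

# Hypothetical child tables, each with a half-open date-range constraint.
CHILDREN = [
    ("sales_2009_01", date(2009, 1, 1), date(2009, 2, 1)),
    ("sales_2009_02", date(2009, 2, 1), date(2009, 3, 1)),
    ("sales_2009_03", date(2009, 3, 1), date(2009, 4, 1)),
]


class Router:
    """Routes a tuple's key to the child table whose constraint it satisfies."""

    def __init__(self, children):
        self.children = children
        self.last = None              # most recently matched child
        self.constraint_checks = 0    # counts how many constraints were tested

    def _matches(self, child, day):
        self.constraint_checks += 1
        _, low, high = child
        return low <= day < high

    def route(self, day):
        # Sorted input usually lands in the same child as the previous tuple.
        if self.last is not None and self._matches(self.last, day):
            return self.last[0]
        for child in self.children:
            if self._matches(child, day):
                self.last = child
                return child[0]
        raise ValueError("no child table accepts %s" % day)


router = Router(CHILDREN)
days = [date(2009, 1, 5), date(2009, 1, 6), date(2009, 2, 1)]
targets = [router.route(d) for d in days]
print(targets)
```

With sorted input most tuples cost a single constraint check, which is roughly the overhead a direct load into the child would also pay for validating its own constraint.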
Other contributions
Temporary tables and 2PC transactions
[Auto-]Partitioning infrastructure
Regression test suite
LFI (http://lfi.sourceforge.net/)
Fault injection at the library level or below: out-of-memory conditions, network connection errors, interrupted system calls, data corruption, hardware failures, etc.
Lightning talk on Friday!
Topics
Introduction to Aster and data warehousing
PetaByte warehouse design with PostgreSQL
Aster contributions to PostgreSQL
Q&A
Contact us
hello@asterdata.com
Bonus slides
nCluster Components
[Figure: nCluster components diagram]
Aster Loader
Load Balancing
CHALLENGE: Large data sets create a data transfer performance bottleneck.
SOLUTION: Multiple Loaders are mapped to vWorkers on Worker nodes. After partitioning, the Loader will parallel-load unique data into vWorkers.
BENEFIT: Linearly scalable load performance from Loaders to Workers.
Scalable Partitioning
CHALLENGE: Partitioning must be scalable and not impede the loading process.
SOLUTION: Each Loader contains a Partitioner which uses an algorithm to assign data into buckets. Each bucket is uniquely mapped within the cluster.
BENEFIT: Fast, intelligent partitioning during massive-scale data loads.
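The bucket scheme can be sketched as a deterministic hash followed by a fixed lookup table. The slide only says that "an algorithm" assigns data to buckets, so the CRC32 hash, the bucket count, the vWorker names, and the idea that each bucket resolves to one vWorker are all assumptions made for illustration.

```python
import zlib

NUM_BUCKETS = 16                           # hypothetical bucket count
VWORKERS = ["vw0", "vw1", "vw2", "vw3"]    # hypothetical vWorker names


def bucket_for(key):
    """Assign a key to a bucket.

    crc32 is used because it is stable across runs and machines, unlike
    Python's built-in hash() for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


# Each bucket maps to exactly one destination (assumed here to be a vWorker).
BUCKET_TO_VWORKER = {
    b: VWORKERS[b % len(VWORKERS)] for b in range(NUM_BUCKETS)
}


def partition(rows, key_index=0):
    """Group rows by destination so every Loader can ship its groups in parallel."""
    out = {vw: [] for vw in VWORKERS}
    for row in rows:
        out[BUCKET_TO_VWORKER[bucket_for(row[key_index])]].append(row)
    return out


rows = [("user%d" % i, i) for i in range(100)]
parts = partition(rows)
print({vw: len(batch) for vw, batch in parts.items()})
```

Because the mapping depends only on the key, every Loader makes the same decision independently, with no coordination needed during the load.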
Fault Handling
CHALLENGE: Failed nodes may lose data or significantly drop load performance.
SOLUTION: If a node fails, other streams continue to load [...] performance hit. Admins can easily recover lost data by reloading bulk Feeder data.
BENEFIT: Data loss protection and performance consistency during node failure.