
Oracle Change Data Capture

Jack Raitto, Development Manager


Oracle NEDC
NYOUG Long Island SIG
October 7, 2004

Oracle Corporation
Capture your change data for FREE!*

Change Capture
Cost

Before After
* Zero additional license cost over Oracle10g EE
Virtually zero source system processing cost
What is Oracle CDC?
• Captures change data from operational system(s) as it occurs
• Part of the Extract / Transform / Load (ETL) process for DSS / data warehouse, and potentially other applications
• Optimizes the extract phase
• Unleashes SQL power for transformations
• Provides a management framework for change data
How was it done before (old way)?
Method                            Major issues
Application logging / triggers    Maintenance, transaction impacts
Timestamp / change key column     Application design & performance impact, no before image
Table differencing                Impractical for large tables, high transport costs, not timely
Log sniffing                      Not supported, does not track DB releases, security issues, rocket science
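To make the cost of table differencing concrete, here is a minimal Python sketch. It assumes both snapshots fit in memory as dicts keyed by primary key. `diff_snapshots` and the sample rows are hypothetical: the rows borrow the customer example used later in the deck, and the sketch is not part of CDC.

```python
def diff_snapshots(old, new):
    """Table differencing: compare two full snapshots keyed by primary key.

    old, new: dicts mapping key -> row tuple (both held in memory).
    Returns (inserts, updates, deletes). Every key of both snapshots is
    visited, so cost grows with table size rather than with the number of
    changes, and the result carries no transaction or user information.
    """
    inserts = {k: new[k] for k in new.keys() - old.keys()}
    deletes = {k: old[k] for k in old.keys() - new.keys()}
    updates = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return inserts, updates, deletes


# Hypothetical snapshots taken one day apart.
yesterday = {123: ("Smith", "Frank"), 124: ("Jones", "Mary"), 125: ("Stein", "Linda")}
today = {123: ("Smith", "Frank"), 125: ("Stein", "Lynda"), 126: ("Vine", "Abe")}
inserts, updates, deletes = diff_snapshots(yesterday, today)
```

The sketch has two blind spots. A row updated twice between snapshots surfaces as a single update. A row inserted and then deleted between snapshots never appears at all.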
CDC Advantages

• Built in, custom fit, evolves with the database
• Delivers change data when you need it, where you need it
• Offers several trade-offs between timely change delivery and source system overhead (sync, async HotLog, async AutoLog, etc.)
• Assumes complete change management responsibility

CDC Advantages (concl.)

• Captures all change data along with transaction information: see all changes a given transaction made and who made them
• Transactional consistency for changes across multiple source tables is guaranteed
• Transparently coordinates sharing of change data across users and applications
• You don’t need rocket scientists on your staff!

CDC Configurations
                     Sync CDC                       Async CDC HotLog   Async CDC AutoLog
Available            Oracle 9i EE, Oracle 10g SE    Oracle 10g EE      Oracle 10g EE
Source system cost   Transaction delay,             System resources   Minimal (~2%)
                     system resources
Part of transaction  Yes                            No                 No
Latency              Real time                      Near real time     Varies with topology,
                                                                       checkpoint & log switch interval
Systems              1                              1                  2
How CDC Works: Sync CDC

• Uses internal triggers to capture before and/or after images of inserted, updated, and deleted rows
• Has the same performance implications as capture via user triggers
• Delivers change data in real time
• Uses the same interface as async CDC
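The following toy Python model illustrates trigger-style synchronous capture. It assumes a single in-memory table and single-threaded transactions. `SyncCaptureTable` and its methods are hypothetical stand-ins for the internal triggers, not Oracle's implementation. The model shows two things: the change record is part of the transaction, so it commits or rolls back with the row; and capture work runs inline with every DML call.

```python
class SyncCaptureTable:
    """Toy model of synchronous (trigger-style) change capture."""

    def __init__(self):
        self.rows = {}           # committed source table: key -> row
        self.change_table = []   # committed change records (op, key, image)
        self._work = None        # uncommitted copy of the source table
        self._changes = None     # change records captured in this transaction

    def begin(self):
        self._work = dict(self.rows)
        self._changes = []

    def insert(self, key, row):
        self._work[key] = row
        self._changes.append(("I", key, row))               # "trigger" fires inline

    def update(self, key, row):
        self._changes.append(("UO", key, self._work[key]))  # before image
        self._changes.append(("UN", key, row))              # after image
        self._work[key] = row

    def delete(self, key):
        self._changes.append(("D", key, self._work.pop(key)))

    def commit(self):
        # Source rows and their change records become visible together.
        self.rows = self._work
        self.change_table.extend(self._changes)
        self._work = self._changes = None

    def rollback(self):
        # Discarding the transaction also discards its change records.
        self._work = self._changes = None


table = SyncCaptureTable()

table.begin()
table.insert(123, ("Smith", "Frank"))
table.commit()

table.begin()
table.update(123, ("Smith", "Fred"))
table.rollback()          # the update and its UO/UN records vanish together

table.begin()
table.delete(123)
table.commit()
```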

Synchronous CDC HotLog
[Diagram: on a combined source / operational BI system, CDC captures Customer and Order changes into change tables; the ETL process upserts customer changes into dimension tables and direct-path inserts order changes into fact tables.]

How CDC Works: Async CDC

• Relational interface to Streams
  – Prepackaged Streams application
  – Asynchronously captures change data from redo/archive logs
  – Presents a relational interface to the change data stream
• Can operate on the source system (HotLog) or a staging system (AutoLog)
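By contrast, asynchronous capture reads committed redo after the fact. The toy Python model below assumes the redo log is an in-memory list of committed change records. `RedoLog`, `AsyncCapture`, and `mine()` are hypothetical stand-ins for LogMiner and Streams. The source transaction does no capture work of its own. Changes reach the change table whenever the capture process next runs, and that gap is where latency comes from. Running the capture on the source corresponds roughly to HotLog; running it on a staging system fed by archived logs corresponds roughly to AutoLog.

```python
class RedoLog:
    """Committed changes, appended by the source database as it runs."""

    def __init__(self):
        self.records = []  # (commit_scn, operation, key, row)

    def append(self, commit_scn, operation, key, row):
        self.records.append((commit_scn, operation, key, row))


class AsyncCapture:
    """Reads the log later, outside the source transactions."""

    def __init__(self, log):
        self.log = log
        self.position = 0        # index of the next log record to read
        self.change_table = []

    def mine(self):
        """Copy log records written since the last call; return how many."""
        new_records = self.log.records[self.position:]
        self.change_table.extend(new_records)
        self.position += len(new_records)
        return len(new_records)


log = RedoLog()
capture = AsyncCapture(log)
log.append(100, "I", 123, ("Smith", "Frank"))   # source commits; no capture work yet
log.append(101, "I", 124, ("Jones", "Mary"))
pending = len(log.records) - capture.position   # committed but not yet captured
first_batch = capture.mine()
log.append(102, "D", 124, None)
second_batch = capture.mine()
```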

Foundations of Async CDC
[Diagram: layered stack, top to bottom; each layer builds on the one beneath it]
Async CDC: change capture, change management, warehouse loading
Streams: replication, message queuing, warehouse loading, event notification, data protection
LogMiner: redo log inspection, debugging, auditing, reversing transactions

Asynchronous CDC HotLog
[Diagram: on a combined source / operational BI system, LogMiner and Streams read the active redo log and CDC populates change tables for Customer and Order; the ETL process upserts customer changes into dimension tables and direct-path inserts order changes into fact tables.]

Asynchronous CDC AutoLog
[Diagram: on the source database, the Arch process writes redo logs to archived redo logs, which feed the data warehouse / staging system. There, LogMiner and Streams populate CDC change tables for Customer and Order; the ETL process upserts customer changes into dimension tables and direct-path inserts order changes into fact tables.]

Using CDC: Publish/Subscribe

• Publisher supplies, subscribers consume change data
• Model allows sharing of change data across users and applications
• Coordinates retention / purge of change data
• Prevents applications from accidentally processing change data more than once
• Guarantees transactional consistency of change data across source tables via change sets
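The retention rule can be sketched as a toy Python model. It assumes every change row carries a commit SCN and that each subscriber records the lowest SCN it still needs. `SharedChangeTable`, its methods, and the SCN values are hypothetical. The rows mirror the customer example that follows, where one subscriber's window starts at customer 123 and the other's at 125.

```python
class SharedChangeTable:
    """Toy model of change data shared by several subscribers.

    Each subscriber tracks the lowest commit SCN it still needs; a row is
    purged only after every subscriber has moved past it.
    """

    def __init__(self):
        self.rows = []          # (cscn, payload) in commit order
        self.low_marks = {}     # subscriber name -> lowest SCN still needed

    def add_subscriber(self, name, low_scn=0):
        self.low_marks[name] = low_scn

    def publish(self, cscn, payload):
        self.rows.append((cscn, payload))

    def done_through(self, name, cscn):
        """Record that subscriber `name` has processed everything <= cscn."""
        self.low_marks[name] = max(self.low_marks[name], cscn + 1)

    def purge(self):
        """Delete rows no subscriber still needs; return how many were removed."""
        if not self.low_marks:
            return 0
        safe = min(self.low_marks.values())
        before = len(self.rows)
        self.rows = [row for row in self.rows if row[0] >= safe]
        return before - len(self.rows)


ct = SharedChangeTable()
for scn, cust in [(10, "123 Smith"), (20, "124 Jones"), (30, "125 Stein"),
                  (40, "126 Vine"), (50, "127 Block")]:
    ct.publish(scn, cust)
ct.add_subscriber("subscriber1", low_scn=10)   # window starts at 123
ct.add_subscriber("subscriber2", low_scn=30)   # window starts at 125
first = ct.purge()                             # subscriber1 still needs 123 and 124
ct.done_through("subscriber1", 20)
second = ct.purge()                            # now nobody needs 123 or 124
```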

Using CDC: Publish/Subscribe
[Diagram: the Publisher's change data holds customer rows 123 Smith Frank, 124 Jones Mary, 125 Stein Linda, 126 Vine Abe and 127 Block Greg. The publication describes table Cust with columns CustNo (number), Last (varchar) and First (varchar). Subscriber 1's subscription currently sees rows 123 to 125; Subscriber 2's subscription sees rows 125 to 127.]

Publisher Concepts
• Change source
  – Defines the source system to CDC
• Change set
  – Collection of source tables for which transactionally consistent change data is needed
• Change table
  – Container to receive change data
  – Is published to subscribers

Publisher Concepts
Source Database: HQ
  Source table sh.sales: PROD_ID, CUST_ID, PROMO_ID, AMOUNT_SOLD, QUANTITY_SOLD
  Source table sh.promotions: PROMO_ID, PROMO_SUBCAT, PROMO_CAT, PROMO_COST
Staging Database: DW
  Change Source: HQ_SRC
  Change Set: SH_SET
    Change table sales_ct: PROD_ID, CUST_ID, PROMO_ID, AMOUNT_SOLD
    Change table promo_ct: PROMO_ID, PROMO_SUBCAT, PROMO_CAT
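The reason a change set gives transactional consistency can be sketched in Python. The sketch assumes each change row carries the commit SCN of its transaction, as the CSCN$ column of the subscriber view suggests. `change_set_window` and the sample rows are hypothetical; the table names follow the SH_SET example. In this toy data, commit SCN 500 is a single transaction that inserted promotion 42 and a sale using it.

```python
def change_set_window(change_set, low_scn, high_scn):
    """Return every change table's rows with low_scn <= commit SCN <= high_scn.

    change_set: dict mapping change-table name -> list of (cscn, op, columns).
    One SCN range is applied to all tables, so a transaction that touched
    several source tables is either wholly inside the window or wholly outside.
    """
    return {
        name: [row for row in rows if low_scn <= row[0] <= high_scn]
        for name, rows in change_set.items()
    }


sh_set = {
    "sales_ct": [
        (480, "I", {"PROD_ID": 12784, "CUST_ID": 12, "PROMO_ID": 0}),
        (500, "I", {"PROD_ID": 14899, "CUST_ID": 302, "PROMO_ID": 42}),
    ],
    "promo_ct": [
        (500, "I", {"PROMO_ID": 42}),
    ],
}
before = change_set_window(sh_set, 0, 499)   # neither half of transaction 500
after = change_set_window(sh_set, 0, 500)    # both halves of transaction 500
```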

Publish Package

DBMS_CDC_PUBLISH
CREATE / ALTER / DROP_AUTOLOG_CHANGE_SOURCE
CREATE / ALTER / DROP_CHANGE_SET
CREATE / ALTER / DROP_CHANGE_TABLE
PURGE
PURGE_CHANGE_SET
PURGE_CHANGE_TABLE
DROP_SUBSCRIPTION

Using Change Data: Subscribers

• The subscriber creates a subscription from an available publication
• The subscription provides a moving window (view) onto the change data
• Subscriptions go against a single change set and are therefore transactionally consistent
• When all subscribers have advanced past old change data, CDC automatically and efficiently purges it
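A subscriber's moving window can be modeled in Python as two SCN boundaries. This is a toy sketch that assumes a single in-memory change table. The method names echo DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW and PURGE_WINDOW, but `Subscription` is hypothetical and does not reproduce their signatures or behavior in detail. If the subscriber purges after processing each window, every change is seen in exactly one window.

```python
class Subscription:
    """Toy moving window over a shared change table of (cscn, payload) rows."""

    def __init__(self, change_table):
        self.change_table = change_table  # shared list, grows as changes commit
        self.low = 0                      # lowest SCN still in the window
        self.high = -1                    # highest SCN in the window; empty at start

    def extend_window(self):
        """Move the upper boundary up to the newest committed change."""
        if self.change_table:
            self.high = max(cscn for cscn, _ in self.change_table)

    def purge_window(self):
        """Move the lower boundary past everything currently in the window."""
        self.low = self.high + 1

    def view(self):
        """Rows the subscriber currently sees."""
        return [row for row in self.change_table if self.low <= row[0] <= self.high]


change_table = [(10, "123 Smith"), (20, "124 Jones")]
sub = Subscription(change_table)
initial = sub.view()                 # empty until the window is extended
sub.extend_window()
batch1 = sub.view()
sub.purge_window()
change_table.append((30, "125 Stein"))
sub.extend_window()
batch2 = sub.view()                  # rows 10 and 20 are not seen again
```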

Subscriber Concepts
Staging Database: DW
Change Set: SH_SET
Subscription: sales_promo_list
• Publication on sh.sales (PROD_ID, CUST_ID, PROMO_ID, AMOUNT_SOLD); subscriber view: spl_sales
• Publication on sh.promotions (PROMO_ID, PROMO_SUBCAT, PROMO_CAT); subscriber view: spl_promos

Subscriber View
Subscriber view: spl_sales

OPERATION$  CSCN$   USERNAME$  PROD_ID  CUST_ID  PROMO_ID  Meaning
I           587322  GRIFFIN    12784    12       0         Insert
UO          587482  SLOAN      12784    12       0         Update (before image)
UN          587482  SLOAN      12784    12       42        Update (after image)
I           594312  BRIGGS     14899    302      42        Insert
I           602311  GRIFFIN    12498    12       55        Insert
D           711413  SLOAN      138922   7934     0         Delete
I           796122  BRIGGS     77741    712      55        Insert
I           796122  BRIGGS     13846    712      55        Insert
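A minimal Python sketch of how a subscriber might consume these rows follows. It replays the spl_sales rows onto an in-memory copy of a target table, assuming that (PROD_ID, CUST_ID) identifies a row. `apply_changes` and that key choice are hypothetical. The D row removes nothing here, because this excerpt does not include the earlier insert of that row.

```python
def apply_changes(target, view_rows, key_cols=("PROD_ID", "CUST_ID")):
    """Replay subscriber-view rows onto a target table held as a dict.

    target:    dict mapping key tuple -> row dict; modified in place.
    view_rows: dicts with OPERATION$, CSCN$, USERNAME$ plus data columns.
    key_cols:  columns assumed to identify a row (hypothetical; the slide
               does not name a key).
    """
    # sorted() is stable, so a UO row stays ahead of its UN row (same CSCN$).
    for row in sorted(view_rows, key=lambda r: r["CSCN$"]):
        key = tuple(row[c] for c in key_cols)
        data = {c: v for c, v in row.items() if not c.endswith("$")}
        op = row["OPERATION$"]
        if op in ("I", "UN"):        # insert, or after image of an update
            target[key] = data
        elif op == "D":
            target.pop(key, None)
        # "UO" (before image) is not needed to apply the change.
    return target


COLUMNS = ("OPERATION$", "CSCN$", "USERNAME$", "PROD_ID", "CUST_ID", "PROMO_ID")
SPL_SALES = [
    ("I", 587322, "GRIFFIN", 12784, 12, 0),
    ("UO", 587482, "SLOAN", 12784, 12, 0),
    ("UN", 587482, "SLOAN", 12784, 12, 42),
    ("I", 594312, "BRIGGS", 14899, 302, 42),
    ("I", 602311, "GRIFFIN", 12498, 12, 55),
    ("D", 711413, "SLOAN", 138922, 7934, 0),
    ("I", 796122, "BRIGGS", 77741, 712, 55),
    ("I", 796122, "BRIGGS", 13846, 712, 55),
]
result = apply_changes({}, [dict(zip(COLUMNS, r)) for r in SPL_SALES])
```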

Subscriber Package

DBMS_CDC_SUBSCRIBE
CREATE_SUBSCRIPTION
SUBSCRIBE
ACTIVATE_SUBSCRIPTION
EXTEND_WINDOW
PURGE_WINDOW
DROP_SUBSCRIPTION

Security

• Sync publisher must have SELECT access to the source table
• Async publisher must be granted EXECUTE_CATALOG_ROLE
• Publisher uses GRANT and REVOKE on change tables to control subscriber access

Performance Benchmark*
• Objectives:
  – Determine impact on transaction time
  – Determine latency
• Source system: Oracle 10g R1 Beta, Sun Fire 4800 SMP, 8 x 900 MHz / 16 GB, with 8 striped Sun StorEdge T3 arrays (9 x 36.4 GB each)
• Customer insurance quote OLTP application run at Oracle, 250 concurrent users / 175 TPS, system “warmed up” (steady state)
• Mixture of inserts, updates, deletes, singleton selects, cursor fetches, rollbacks / commits, savepoints
• Capture changes on all tables
* Your mileage will vary!
Transaction Performance
Transaction elongated by 10%
Relative impact varies depending on other overhead

[Chart: relative transaction time (y-axis 0.9 to 1.2) for no CDC, Sync CDC (9i), HotLog CDC (10g) and AutoLog CDC (10g)]
Transaction Performance
Transaction elongated by 8%
Can reduce elongation by adding RAC nodes / CPUs
[Chart: relative transaction time (y-axis 0.9 to 1.2) for no CDC, Sync CDC (9i), HotLog CDC (10g) and AutoLog CDC (10g)]
Transaction Performance
Transaction elongation virtually eliminated
Change capture processing moved off system

[Chart: relative transaction time (y-axis 0.9 to 1.2) for no CDC, Sync CDC (9i), HotLog CDC (10g) and AutoLog CDC (10g)]
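The charts appear to plot transaction time relative to the no-CDC run. Converting timings into the elongation percentages quoted on these slides is simple arithmetic, sketched here in Python. The timings are hypothetical: 55 ms and 54 ms against a 50 ms baseline reproduce the 10% and 8% figures. They are not the benchmark's measurements.

```python
def relative_time(baseline_ms, measured_ms):
    """Transaction time with capture enabled, relative to the no-CDC baseline."""
    return measured_ms / baseline_ms


def elongation_pct(baseline_ms, measured_ms):
    """Percentage by which capture lengthens the transaction."""
    return (relative_time(baseline_ms, measured_ms) - 1.0) * 100.0


baseline = 50.0                                # hypothetical no-CDC time, ms
elongation_55ms = elongation_pct(baseline, 55.0)
elongation_54ms = elongation_pct(baseline, 54.0)
```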
HotLog Latency Performance
[Chart: % of changes arrived (y-axis, 0 to 100) vs. seconds (x-axis, 0 to 3)]
About ½ the change data arrived in 1 second
Virtually all the change data arrived in 2 seconds
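The latency curve is a cumulative distribution: the fraction of changes that had arrived by each time t. The minimal Python sketch below assumes latency is measured per change, from source commit to availability in the change table; the slide does not define it. `fraction_arrived` and the sample latencies are hypothetical, not the benchmark data.

```python
def fraction_arrived(latencies_s, t):
    """Fraction of changes whose arrival latency is at most t seconds."""
    if not latencies_s:
        return 0.0
    return sum(1 for x in latencies_s if x <= t) / len(latencies_s)


# Hypothetical per-change latencies in seconds.
sample = [0.3, 0.5, 0.8, 0.9, 1.0, 1.2, 1.4, 1.7, 1.9, 2.6]
curve = {t: fraction_arrived(sample, t) for t in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)}
```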
Summary

• CDC assumes the burden of change capture for you
• Change data is guaranteed consistent and complete
• Change data can be shared across users and applications effortlessly
• CDC delivers change data where you need it, when you need it, and with minimal overhead

For More Information
• Oracle Data Warehousing Guide, 10gR1, Chapter 16
• Oracle PL/SQL Packages and Types Reference, 10gR1, packages DBMS_CDC_*
• http://www.oracle.com/technology/oramag/oracle/03-nov/o63tech_bi.html
• http://www.oracle.com/technology/products/bi/db/10g/pdf/twp_dss_ontime_etl_10gr1_0304.pdf
• http://www.rittman.net/archives/000901.html
• http://www.nyoug.org/cdc.pdf (Oracle9i)
Questions?
