
Oracle Data Archiving

Taming the Beast

Dave Moore
Neon Enterprise Software

INTELLIGENCE. INNOVATION. INTEGRITY


Agenda
Archiving Defined

Requirements and Solutions

Oracle Archiving Strategies

Oracle Row Removal Options

Oracle Post Archive Operations



Dave

• Oracle ACE
• Using Oracle since 1991
• Product Author at Neon Enterprise Software
• Creator of OracleUtilities.com
• Author of “Oracle Utilities” from Rampant Tech Press
• Core competencies include performance, utilities and data management


Database Archiving

Database Archiving:
The process of removing selected data records that are not expected to be
referenced again from operational databases and storing them in an archive
data store where they can be retrieved if needed.

Purge


Trends Impacting Archive Needs

Data Retention Issues:

• Volume of data
• Length of retention requirement
• Varied types of data
• Security issues

[Figure: time-required axis running from 0 to 30+ years]


Archiving All Types of Data

• Paper, Blueprints, Forms, Claims
• Word, Excel, PDF, XML
• IMS, DB2, ORACLE, SYBASE, SQL Server, IDMS
• VSAM, Programs, UNIX Files
• Outlook, Lotus Notes, Attachments
• Sound, Pictures, Video


Data Archiving and ILM

Create → Operational → Reference → Archive → Discard

• Operational: needed for completing business transactions
• Reference: needed for reporting or expected queries
• Archive: needed for compliance and business protection

Mandatory Retention Period


Some Sample Regulations Impacting Data Retention


What Does It All Mean?

Enterprises must recognize that there is a business value in organizing
their information and data.

Organizations that fail to respond run the risk of seeing more of their
cases decided on questions of process rather than merit.
(Gartner, 20-April-2007, Research Note G00148170:
Cost of E-Discovery Threatens to Skew Justice System)


Operational Efficiency

Database Archiving can be undertaken to improve operational efficiency

• Large volumes of data can interfere with production operations
  - efficiency of transactions
  - efficiency of utilities: BACKUP/RESTORE, REORG, etc.
  - storage
    - Gartner: databases copied an average of 6 times!


What Solutions Are Out There?

• Keep Data in Operational Database
  - Problems with authenticity of large amounts of data over long retention times
• Store Data in UNLOAD files (or backups)
  - Problems with schema change and reading archived data; using backups poses even more serious problems
• Move Data to a Parallel Reference Database
  - Combines problems of the previous two
• Move Data to a Database Archive


Components of a Database Archiving Solution

[Diagram labels: Production Database; Data Extract; Data Recall; Recall Database; Archive Data Store and Retrieve; Captured Structure; Archive Policies; Data Retention Policies; History; Metadata; Archive Store (Data & Metadata); Archive Data Query Access; Archive Administration]


Archiving Requirements

• Policy based archiving: logical selection
• Keep data for very long periods of time
• Store very large amounts of data in archive
• Maintain archives for ever-changing operational systems
• Become independent from Applications/DBMS/Systems
• Protect authenticity of data
• Access data when needed, as needed
• Discard data after retention period automatically


Policy based archiving

• Why:
  - Business objects are archived, not files
  - Rules for when something is ready can be complex
  - Data ready to be archived is distributed across the database
• Implications:
  - User must provide policies for when something is to be archived
• How:
  - Full metadata description of data
  - Flexible specification of policy: “WHERE clause”


For Example…

Parts Master is the parent table to all other tables.

PARTS MASTER: Part Number, Type, Description, Unit Type, Cost, Price, Substitute Parts

Child tables:
• STORAGE INFO: Part Number, Bin Number, Qty on Hand, Qty on Order, Qty Backorder
• ORDER INFO: Part Number, PO Number, Vendor ID, Quantity Ordered, Unit Cost, Date Ordered, Date Received
• SUMMARY BY QUARTER: Part Number, Year, Q1 Disbursed, Q2 Disbursed, Q3 Disbursed, Q4 Disbursed
• DISBURSEMENT: Part Number, Dept. ID, CHIT ID, Qty Disbursed, Date Disbursed
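As a rough illustration of a policy written as a WHERE clause over this schema, the sketch below selects parts that might be considered ready to archive. The table and column names (parts_master, disbursement, part_number, date_disbursed) are assumed from the diagram, and the seven-year cutoff is purely hypothetical.

```sql
-- Hypothetical policy: a part is ready to archive when it has had no
-- disbursement in the last 7 years (84 months). Names are assumptions
-- based on the Parts Master diagram, not a real schema.
SELECT pm.part_number
FROM   parts_master pm
WHERE  NOT EXISTS (
         SELECT 1
         FROM   disbursement d
         WHERE  d.part_number    = pm.part_number
         AND    d.date_disbursed > ADD_MONTHS(SYSDATE, -84)
       );
```

The rows in the child tables would then be picked up by joining on part_number, so that the whole business object moves to the archive together.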


Keep Data for a Long Time

• Why: retention requirements in decades
• Implications:
  - Archive will outlive applications/DBMS/systems that generated it
  - Archive will outlive people who designed and managed operational systems
  - Archive will outlive media we store it on
• How:
  - Unique data store
  - Application/DBMS/system independence
  - Metadata independence
  - Continuous management of storage
  - Continuous management of archive content


Maintain Archive for Changing Operational Systems

• Why:
  - Metadata changes frequently
  - Applications are re-engineered periodically
    - Change DBMS platform
    - Change System platform
    - Replace with new application
    - Consolidate after merger or acquisition
• Implications:
  - Archive must support multiple variations of an application
  - Archive must deal with metadata changes
• How:
  - Manage applications as major archive streams having multiple minor streams with metadata differences
  - Achieve independence from operating environment


Achieve Metadata Independence

• Why:
  - Operational metadata is inadequate
  - Operational metadata changes
  - Operational systems keep only the “current” metadata
  - Data in archive often does not mirror data in operational structures
• Implications:
  - Archive must encapsulate metadata
  - Metadata must be improved
• How:
  - Metadata Capture, Validate, Enhance capabilities
  - Storage structure that encapsulates metadata with the data
  - Keep multiple versions of metadata


Protect Authenticity of Data

• Why:
  - Potential use in lawsuits/investigations
  - Potential use in business analysis
• Implications:
  - Protect from unwanted changes
  - Show original input
  - Cannot be managed in operational environment
• How:
  - SQL access that does not support INSERT/UPDATE/DELETE
  - Do not modify archive data on metadata changes
  - Encryption as stored
  - Checksum for detection of sabotage
  - Limit access to functions
  - Audit use of functions
  - Maintain offsite backup copies for restore if sabotaged


Access Data Directly From Archive

• Why:
  - Cannot depend on application environment
• Implications:
  - Full access capability within archive system
• How:
  - Industry standard interface (e.g. JDBC)
  - LOAD format output
    - For load into a database
    - May be different from source database
  - Requires full and accurate metadata
  - Ability to review metadata
  - Ability to function across metadata changes


Discard Function

• Why:
  - Legal exposure for data kept too long
• Implications:
  - Data cannot be kept in archive beyond retention period
  - Must be removed with no exposure to forensic software
• How:
  - Policy based discard
  - System level function
  - Tightly controlled and audited
  - True “zero out” capability
  - Discard from backups as well


Database or Archive?
[Comparison chart: Keep in DB vs. Keep in Archive, rated on Performance, Space and Compliance]


Based on Data Availability
[Decision chart: Keep in DB, Keep in Archive, or Purge, chosen by whether the data Must be Available to App, Must be Available, Must Be Secure, or is Not Needed]


Oracle Archiving Strategies

• Designed Up Front (Yeah, right)
• Determined by Application Owner
• Implemented by ____________
• Utilize Oracle Features


Finding Large Tables

• DBA_SEGMENTS (bytes)
• DBA_TABLES (num_rows)
• or based on I/O
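The three approaches above could be queried roughly as follows. This is a minimal sketch: the top-20 cutoff is arbitrary, and reading V$SEGMENT_STATISTICS for the I/O view is one possible choice rather than the only one.

```sql
-- Largest table segments by allocated space (includes table partitions)
SELECT *
FROM  (SELECT owner, segment_name, ROUND(SUM(bytes) / 1024 / 1024) AS mb
       FROM   dba_segments
       WHERE  segment_type LIKE 'TABLE%'
       GROUP  BY owner, segment_name
       ORDER  BY SUM(bytes) DESC)
WHERE  ROWNUM <= 20;

-- Largest tables by row count; num_rows is only as current as the
-- last statistics gathering (see last_analyzed)
SELECT *
FROM  (SELECT owner, table_name, num_rows, last_analyzed
       FROM   dba_tables
       WHERE  num_rows IS NOT NULL
       ORDER  BY num_rows DESC)
WHERE  ROWNUM <= 20;

-- Busiest tables by physical reads since instance startup
SELECT *
FROM  (SELECT owner, object_name, value AS physical_reads
       FROM   v$segment_statistics
       WHERE  statistic_name = 'physical reads'
       AND    object_type    = 'TABLE'
       ORDER  BY value DESC)
WHERE  ROWNUM <= 20;
```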


Rolling Windows

• Self Managing
• Mostly based on DATE
• Utilize DBMS Features
  - Partitioning
  - Transportable Tablespaces
    - Exchange Partition
    - Set tablespace read only
    - Expdp
    - Copy export file and data file
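The transportable tablespace steps might look like the sketch below. All object names (sales, p_2007_01, sales_2007_01, ts_2007_01, dp_dir) are hypothetical, and it assumes the partition being rolled out lives in its own tablespace.

```sql
-- 1. Swap the oldest partition's segment with an empty standalone table
--    of the same structure. The data now belongs to sales_2007_01 and
--    still sits in tablespace ts_2007_01.
ALTER TABLE sales EXCHANGE PARTITION p_2007_01 WITH TABLE sales_2007_01;

-- 2. Make that tablespace read only so its datafiles can be copied safely
ALTER TABLESPACE ts_2007_01 READ ONLY;

-- 3. Export the tablespace metadata with Data Pump (run from the OS shell):
--      expdp system DIRECTORY=dp_dir DUMPFILE=ts_2007_01.dmp TRANSPORT_TABLESPACES=ts_2007_01
--
-- 4. Copy the dump file and the tablespace's datafile(s) to the archive location.
```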


Rolling Windows via Partitioning

Partitions P1 … P47, from left to right:

Data Profile    | Probably Never Accessed | Rarely Accessed          | Heavily Accessed
Storage Profile | Cheap as you can get    | Not so fast or expensive | Fast, expensive
Mode            | Read Only / Compressed  | Read Only                | Read / Write


Why not use transportable tablespaces or Oracle exports for data retention?


The Problem with Oracle Files

• Transportable Tablespaces
• Exports
• Backups

[Diagram: export files and transportable tablespace datafiles created in Year 2007 must be imported into Oracle “Version 16Z” in Year 2030]

Not a good method for long-term data retention
Partitioning (Old Ways)

• Range Partitioning

Data is distributed based on partition key range of values, usually a date.

Good When: Data is date-based.
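A minimal range-partitioned table might be declared as follows; the table, columns and quarter boundaries are invented for illustration.

```sql
-- Hypothetical orders table partitioned by quarter on order_date
CREATE TABLE orders_range (
  order_id    NUMBER       NOT NULL,
  customer_id NUMBER       NOT NULL,
  order_date  DATE         NOT NULL,
  status      VARCHAR2(10)
)
PARTITION BY RANGE (order_date) (
  PARTITION p_2007_q1 VALUES LESS THAN (TO_DATE('2007-04-01', 'YYYY-MM-DD')),
  PARTITION p_2007_q2 VALUES LESS THAN (TO_DATE('2007-07-01', 'YYYY-MM-DD')),
  PARTITION p_max     VALUES LESS THAN (MAXVALUE)
);
```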


Partitioning (Old Ways)

• Hash Partitioning

Uses hash algorithm to create equally sized buckets of data.

Good When: No natural partition key and desire I/O balancing (hot spots).
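A hash-partitioned table can be sketched like this; the table and column names are made up, and eight partitions is an arbitrary choice (a power of two tends to give the most even distribution).

```sql
-- Hypothetical table spread across 8 hash buckets on session_id
CREATE TABLE sessions_hash (
  session_id NUMBER NOT NULL,
  payload    VARCHAR2(4000)
)
PARTITION BY HASH (session_id)
PARTITIONS 8;
```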


Partitioning (Old Ways)

• List Partitioning

Data is distributed based on LIST of values in partition key.

Good When: Have short list of values (States, Regions, Account Types)
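As an illustration, a list-partitioned table keyed on a state code might look like this; the names and the grouping of states are hypothetical.

```sql
-- Hypothetical accounts table partitioned by explicit lists of state codes
CREATE TABLE accounts_list (
  account_id NUMBER      NOT NULL,
  state_code VARCHAR2(2) NOT NULL
)
PARTITION BY LIST (state_code) (
  PARTITION p_east  VALUES ('NY', 'NJ', 'MA'),
  PARTITION p_west  VALUES ('CA', 'OR', 'WA'),
  PARTITION p_other VALUES (DEFAULT)   -- catches every value not listed above
);
```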


Partitioning (New Ways – 11g)

• Interval Partitioning

Initial partition is created manually; the rest are created automatically as new data arrives.

Good When: Need a rolling window!
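A minimal interval-partitioned table, assuming Oracle 11g or later and invented names, could be declared as follows. Only the first partition is defined by hand; Oracle adds one partition per month as rows arrive.

```sql
-- Hypothetical table with automatic monthly partitions
CREATE TABLE orders_interval (
  order_id   NUMBER NOT NULL,
  order_date DATE   NOT NULL
)
PARTITION BY RANGE (order_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH')) (
  PARTITION p_initial VALUES LESS THAN (TO_DATE('2008-01-01', 'YYYY-MM-DD'))
);
```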


Partitioning (New Ways – 11g)

• REF Partitioning

Related tables benefit from the same partitioning strategy, whether the column exists in the children or not!

Good When: Desire related data to be partitioned in the same manner.
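The sketch below shows a child table inheriting its parent's partitioning through a foreign key, even though the child has no order_date column. Table and constraint names are hypothetical; the foreign key column must be NOT NULL and the constraint enabled.

```sql
-- Parent: range partitioned by order_date
CREATE TABLE orders_parent (
  order_id   NUMBER NOT NULL,
  order_date DATE   NOT NULL,
  CONSTRAINT orders_parent_pk PRIMARY KEY (order_id)
)
PARTITION BY RANGE (order_date) (
  PARTITION p_2007 VALUES LESS THAN (TO_DATE('2008-01-01', 'YYYY-MM-DD')),
  PARTITION p_2008 VALUES LESS THAN (TO_DATE('2009-01-01', 'YYYY-MM-DD'))
);

-- Child: partitioned by reference to the parent through its foreign key
CREATE TABLE order_items (
  item_id  NUMBER NOT NULL,
  order_id NUMBER NOT NULL,
  CONSTRAINT order_items_fk FOREIGN KEY (order_id)
    REFERENCES orders_parent (order_id)
)
PARTITION BY REFERENCE (order_items_fk);
```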


Partitioning (New Ways – 11g)

• Virtual Column Partitioning

Partition key may be based on a virtual column.

Good When: The natural partition key is a derived value (an expression over other columns) rather than a stored column.
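For example, a table could be partitioned on a year derived from a date column. The names below are hypothetical and the sketch assumes Oracle 11g or later.

```sql
-- order_year is computed from order_date and never stored,
-- yet it can still serve as the partition key
CREATE TABLE orders_virtual (
  order_id   NUMBER NOT NULL,
  order_date DATE   NOT NULL,
  order_year NUMBER GENERATED ALWAYS AS (EXTRACT(YEAR FROM order_date)) VIRTUAL
)
PARTITION BY LIST (order_year) (
  PARTITION p_2007  VALUES (2007),
  PARTITION p_2008  VALUES (2008),
  PARTITION p_other VALUES (DEFAULT)
);
```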


Rows Gotta Go



Row Removal Options

• SQL DELETE
• CTAS / DROP / RENAME
• TRUNCATE
• Row Marking


SQL DELETE

• Good for small number of rows
• RI handled automatically
• Oracle was born to DELETE, better than any PL/SQL that you write.
• Issue with Un-indexed Foreign Keys


DELETE Optimization

• Work in batches, committing after each batch (only when DELETING programmatically)
• Use parallel DML (Partitioned tables only)
• Drop Indexes before (if possible)
• Index FK columns
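Batched deleting with a commit per batch can be sketched in PL/SQL as below. The table name, the date predicate and the batch size of 10,000 are all assumptions chosen for illustration.

```sql
-- Delete qualifying rows in batches of 10,000, committing after each batch
DECLARE
  c_batch_size CONSTANT PLS_INTEGER := 10000;
BEGIN
  LOOP
    DELETE FROM orders
    WHERE  order_date < ADD_MONTHS(SYSDATE, -84)
    AND    ROWNUM <= c_batch_size;

    EXIT WHEN SQL%ROWCOUNT = 0;   -- nothing left to delete
    COMMIT;
  END LOOP;
  COMMIT;
END;
/
```

Smaller batches keep undo usage down at the cost of more commits, so the batch size is worth benchmarking against real data.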


CTAS

• Works well for PURGE, not archive
• Perfect when you want to keep a low percentage of rows in the table
• Doesn’t handle RI, since no DELETE is issued
• Process
  1. Create table with rows you want to keep
  2. Drop old table
  3. Rename table
  4. Recreate indexes

create table new_table nologging as select * from old_table where ...
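The four steps might be scripted as follows. The table, the status predicate and the index are hypothetical; a real table would also need its grants, triggers and remaining constraints recreated.

```sql
-- 1. Create a table holding only the rows to keep (NOLOGGING minimizes redo)
CREATE TABLE orders_keep NOLOGGING AS
  SELECT * FROM orders WHERE status = 'ACTIVE';

-- 2. Drop the old table; its indexes, constraints, grants and triggers go
--    with it. Foreign keys referencing it would require CASCADE CONSTRAINTS.
DROP TABLE orders;

-- 3. Rename the new table into place
ALTER TABLE orders_keep RENAME TO orders;

-- 4. Recreate indexes and constraints
CREATE UNIQUE INDEX orders_pk ON orders (order_id);
ALTER TABLE orders ADD CONSTRAINT orders_pk PRIMARY KEY (order_id);
```

Because the NOLOGGING load cannot be recovered from the redo logs, a backup after the operation is prudent.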


TRUNCATE

• Congratulations if your application lends itself to TRUNCATE without losing new data
• What about RI?
• May truncate or drop individual partitions
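A sketch of both cases follows, with hypothetical table, partition and constraint names. The first statement assumes no enabled foreign key references the table.

```sql
-- Empty one partition and release its space; UPDATE GLOBAL INDEXES keeps
-- global indexes usable at the cost of extra work during the truncate
ALTER TABLE orders TRUNCATE PARTITION p_2007_q1 DROP STORAGE UPDATE GLOBAL INDEXES;

-- RI: TRUNCATE is refused (ORA-02266) while an enabled foreign key
-- references the table, so the constraint is disabled around the operation
ALTER TABLE order_items DISABLE CONSTRAINT order_items_fk;
TRUNCATE TABLE order_items;
TRUNCATE TABLE orders;
ALTER TABLE order_items ENABLE CONSTRAINT order_items_fk;
```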


DROP

• DROP PARTITION
• What would you do before you drop it?
• Exchange partition with table
• Transportable tablespace.
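One way to preserve the rows before dropping a partition is sketched below. The names are hypothetical, and orders_2007_q1_arch is assumed to be an empty table with the same structure as the partitioned table.

```sql
-- Keep the rows by swapping the partition's segment into a standalone table
ALTER TABLE orders EXCHANGE PARTITION p_2007_q1 WITH TABLE orders_2007_q1_arch;

-- The partition is now empty and can be dropped
ALTER TABLE orders DROP PARTITION p_2007_q1 UPDATE GLOBAL INDEXES;
```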


Things to Remember

• Benchmark the best way for you
• Benchmark against real data if possible
• Use parallel DML


Design Summary

• Create an architecture that lends itself to aging, archiving, deleting
• This architecture should accommodate business requirements
  - For instance, customer orders not accessible after 6 months … or
  - top query performance needed for all ‘ACTIVE’ accounts … etc
• Implement it: THE EASY PART


Post Archive Challenges



Post Archive Challenges

“I have successfully deleted 10 billion rows from the table.
HoooAhhhh! Performance will be great, space will be available, and
I will get credit for optimizing our data warehouse application, saving
the company billions of dollars”


… 2 Days Later …



Post Archive Challenges

Hmmmmm. It looks like …

- Queries are not any faster . . .
- The Select count(*) took the same amount of time . . .
- Space was not freed in Oracle (DBA_FREE_SPACE) . . .
- Space was not freed in the operating system . . .

WHY NOT ????? Where are the benefits ???


From Swiss to Provolone

[Figure: table blocks After DELETE vs. After Maintenance]


Post Archive Challenges

• Statistics are not fresh
• High Water Marks are very high
• Space has not been freed within Oracle (if that’s what you want)
• Space has not been freed to the OS


Refresh Statistics

• Help the optimizer, easy enough
• dbms_stats provides many options


Automatic Stats

• Recommended by Oracle
• Calls DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC
• Enabled via:

BEGIN
  dbms_auto_task_admin.enable(
    client_name => 'auto optimizer stats collection',
    operation   => NULL,
    window_name => NULL);
END;
/
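To confirm whether the automatic task is switched on, a query along these lines can be used. It assumes Oracle 11g or later, where DBA_AUTOTASK_CLIENT is available.

```sql
-- STATUS shows ENABLED or DISABLED for the automatic statistics task
SELECT client_name, status
FROM   dba_autotask_client
WHERE  client_name = 'auto optimizer stats collection';
```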


When do you go manual?

• High-volume DELETEs or TRUNCATEs
• Bulk loads which add more than 10% of table size

So there’s our answer: go manual.


How do we Gather Them?

• NOT the Analyze Command
• Instead DBMS_STATS package

exec dbms_stats.gather_table_stats(ownname => 'BDB', tabname => 'MASTER', estimate_percent => dbms_stats.auto_sample_size);
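After a large archive DELETE it may also be worth refreshing index statistics and checking the result. The sketch below reuses the BDB.MASTER names from the slide; the cascade setting and the follow-up query are additions for illustration.

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'BDB',
    tabname          => 'MASTER',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);   -- also gather statistics on the table's indexes
END;
/

-- Confirm that the statistics are fresh
SELECT num_rows, last_analyzed, stale_stats
FROM   dba_tab_statistics
WHERE  owner      = 'BDB'
AND    table_name = 'MASTER';
```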


High Water Mark



Reset High Water Mark (HWM)

• DROP or TRUNCATE
• Multiple OTHER ways to do this depending on version
• In v9 … alter table <table_name> move tablespace <tablespace_name>;
  - Can move into same tablespace
  - Will occupy 2X space temporarily
  - Must then rebuild indexes
• In v10 … alter table <table_name> shrink space;
  - Row movement must be enabled
  - Tablespace must be locally managed (LMT) with automatic segment space management
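The move route, including the index rebuild it forces, might look like the sketch below; the table and index names are hypothetical.

```sql
-- Rebuild the table segment in its current tablespace, resetting the HWM
ALTER TABLE orders MOVE;

-- The move changes rowids, so the table's indexes are left UNUSABLE
SELECT index_name, status
FROM   user_indexes
WHERE  table_name = 'ORDERS'
AND    status     = 'UNUSABLE';

-- Rebuild each index reported above
ALTER INDEX orders_pk REBUILD;
```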


Freeing Allocated Space



Create table, check space

SQL> create table space_example as select * from dba_source;

Table created.

SQL> select count(*) from space_example;

  COUNT(*)
----------
    296463

SQL> exec dbms_space.unused_space('DAVE', 'SPACE_EXAMPLE');

Total blocks: 6328
Unused blocks: 1
Unused bytes: 8192
Last Used Block: 55
Last Used Block ID: 10377
Last Used Ext File ID: 4
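The two-argument call on the slide presumably goes through a local wrapper, since the packaged DBMS_SPACE.UNUSED_SPACE procedure returns its results through OUT parameters. A sketch of calling it directly and printing the same figures (the local variable names are arbitrary):

```sql
SET SERVEROUTPUT ON

DECLARE
  l_total_blocks  NUMBER;
  l_total_bytes   NUMBER;
  l_unused_blocks NUMBER;
  l_unused_bytes  NUMBER;
  l_file_id       NUMBER;
  l_block_id      NUMBER;
  l_last_block    NUMBER;
BEGIN
  DBMS_SPACE.UNUSED_SPACE(
    segment_owner             => 'DAVE',
    segment_name              => 'SPACE_EXAMPLE',
    segment_type              => 'TABLE',
    total_blocks              => l_total_blocks,
    total_bytes               => l_total_bytes,
    unused_blocks             => l_unused_blocks,
    unused_bytes              => l_unused_bytes,
    last_used_extent_file_id  => l_file_id,
    last_used_extent_block_id => l_block_id,
    last_used_block           => l_last_block);

  DBMS_OUTPUT.PUT_LINE('Total blocks: '          || l_total_blocks);
  DBMS_OUTPUT.PUT_LINE('Unused blocks: '         || l_unused_blocks);
  DBMS_OUTPUT.PUT_LINE('Unused bytes: '          || l_unused_bytes);
  DBMS_OUTPUT.PUT_LINE('Last Used Block: '       || l_last_block);
  DBMS_OUTPUT.PUT_LINE('Last Used Block ID: '    || l_block_id);
  DBMS_OUTPUT.PUT_LINE('Last Used Ext File ID: ' || l_file_id);
END;
/
```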


Check datafile space

Sizes in MB:

FILE_NAME                                          Smallest Size Poss.  Current Size  Poss. Savings
-------------------------------------------------- -------------------  ------------  -------------
/export/home/ora102/oradata/ora102/qasb001.dbf                      29            46             17
/export/home/ora102/oradata/ora102/example01.dbf                    69           100             31
/export/home/ora102/oradata/ora102/qasb002.dbf                      41            41              0
/export/home/ora102/oradata/ora102/system01.dbf                    493           500              7
/export/home/ora102/oradata/ora102/sysaux01.dbf                    430           430              0
/export/home/ora102/oradata/ora102/undotbs01.dbf                    91           175             84
/export/home/ora102/oradata/ora102/users01.dbf                      44            83             39
/export/home/ora102/oradata/ora102/test.dbf                         51            70             19
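A report of this shape can be produced by comparing each datafile's size with the highest block that any extent occupies in it. The query below is a sketch of that idea, not necessarily the script used for the slide; the column aliases are invented.

```sql
-- Smallest possible size = highest allocated block in the file * block size
SELECT f.file_name,
       CEIL(NVL(e.hwm, 1) * t.block_size / 1024 / 1024)    AS smallest_mb,
       CEIL(f.bytes / 1024 / 1024)                          AS current_mb,
       CEIL(f.bytes / 1024 / 1024)
         - CEIL(NVL(e.hwm, 1) * t.block_size / 1024 / 1024) AS savings_mb
FROM   dba_data_files f
JOIN   dba_tablespaces t
       ON t.tablespace_name = f.tablespace_name
LEFT JOIN (SELECT file_id, MAX(block_id + blocks - 1) AS hwm
           FROM   dba_extents
           GROUP  BY file_id) e
       ON e.file_id = f.file_id
ORDER  BY f.file_name;
```

Scanning DBA_EXTENTS can be slow on databases with many extents, so this kind of report is better run off-peak.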


Delete rows, check space

SQL> delete from space_example;

296463 rows deleted.

SQL> commit;

SQL> exec dbms_space.unused_space('DAVE', 'SPACE_EXAMPLE');

Total blocks: 6328
Unused blocks: 1
Unused bytes: 8192
Last Used Block: 55
Last Used Block ID: 10377
Last Used Ext File ID: 4

Nothing Changed !


Shrink it, check space

SQL> alter table space_example enable row movement;

SQL> alter table space_example shrink space;

SQL> exec dbms_space.unused_space('BDB', 'SPACE_EXAMPLE');

Total blocks: 8
Unused blocks: 4
Unused bytes: 32768
Last Used Block: 4
Last Used Block ID: 5129
Last Used Ext File ID: 4

Space Freed From Table, but still in Oracle



Check space again

Sizes in MB:

FILE_NAME                                          Smallest Size Poss.  Current Size  Poss. Savings
-------------------------------------------------- -------------------  ------------  -------------
/export/home/ora102/oradata/ora102/qasb001.dbf                      29            46             17
/export/home/ora102/oradata/ora102/example01.dbf                    69           100             31
/export/home/ora102/oradata/ora102/qasb002.dbf                      41            41              0
/export/home/ora102/oradata/ora102/system01.dbf                    493           500              7
/export/home/ora102/oradata/ora102/sysaux01.dbf                    430           430              0
/export/home/ora102/oradata/ora102/undotbs01.dbf                   171           175              4
/export/home/ora102/oradata/ora102/users01.dbf                      44            83             39
/export/home/ora102/oradata/ora102/test.dbf                          1            70             69

This datafile (test.dbf) should be resized to save 69 MB

SQL> alter database datafile '/export/home/ora102/oradata/ora102/test.dbf' resize 1m;


Free the Space

• Space is still reserved for future inserts and updates, just not freed back to the OS
• Space will not be automatically freed; confirm by checking DBA_FREE_SPACE
• Ways to set it free
  - drop
  - truncate
  - alter table move …
  - alter table shrink space …


Unindexed Foreign Keys Example

PARENT (1 Million Rows)
  COL1

    ON DELETE CASCADE

CHILD (1 Million Rows)
  COL1
  COL1_PARENT

SQL> DELETE FROM PARENT WHERE COL1 < 1000;

Fky.sql
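A test case of this shape could be built roughly as follows. This is a guess at the setup rather than the contents of the author's script: the datatypes, the primary keys and the one-child-per-parent data are assumptions.

```sql
CREATE TABLE parent (
  col1 NUMBER PRIMARY KEY
);

-- The foreign key column col1_parent is deliberately left unindexed
CREATE TABLE child (
  col1        NUMBER PRIMARY KEY,
  col1_parent NUMBER REFERENCES parent (col1) ON DELETE CASCADE
);

-- One million parents, each with a single child row
INSERT INTO parent SELECT LEVEL FROM dual CONNECT BY LEVEL <= 1000000;
INSERT INTO child  SELECT LEVEL, LEVEL FROM dual CONNECT BY LEVEL <= 1000000;
COMMIT;
```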
Before Index
delete from parent
where
col1 < 1000

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.01       0.08          2         27          0          0
Execute      1      0.90       0.80          4    2208799       6062        999
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        2      0.91       0.88          6    2208826       6062        999

delete from "DAVE"."CHILD"
where
"COL1_PARENT" = :1

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute    999    285.94     293.11    1543900    2208789       1029        999
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     1000    285.94     293.11    1543900    2208789       1029        999


SQL> create index prnt_ndx on child(col1_parent);

delete from parent
where
col1 < 1000

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.53       0.47          7         13       7053        999
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        2      0.53       0.47          7         13       7053        999

delete from "DAVE"."CHILD"
where
"COL1_PARENT" = :1

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute    999      0.42       0.46          2       3002       4058        999
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     1000      0.42       0.46          2       3002       4058        999


Unindexed Foreign Keys

• Problem is not limited to DELETE statements
• Search database for unindexed FK columns
• Script is on asktom
  - Search for unindex.sql
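As a simplified stand-in for the fuller script mentioned above, the query below lists single-column foreign keys in the current schema whose column does not lead any index. Multi-column keys are deliberately ignored here, so it is only a first-pass check.

```sql
-- Single-column foreign keys with no index led by the FK column
SELECT c.table_name, c.constraint_name, cc.column_name
FROM   user_constraints  c
JOIN   user_cons_columns cc
       ON cc.constraint_name = c.constraint_name
WHERE  c.constraint_type = 'R'
AND    (SELECT COUNT(*)
        FROM   user_cons_columns x
        WHERE  x.constraint_name = c.constraint_name) = 1
AND    NOT EXISTS (
         SELECT 1
         FROM   user_ind_columns ic
         WHERE  ic.table_name      = c.table_name
         AND    ic.column_name     = cc.column_name
         AND    ic.column_position = 1
       )
ORDER  BY c.table_name, c.constraint_name;
```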


Summary Points

• Create sound Archiving strategy based on Oracle technical features as well as business and/or legal requirements
• Leverage partitioning
• Move partitions to cheap disk when appropriate
• Make partitions ‘read only’ and compressed
• Remove data via DROP or TRUNCATE if possible
• If SQL DELETE, make sure to perform maintenance operations
• Consider 3rd party solutions

“Well done is better than well said”
Ben Franklin

Questions?