You are on page 1of 12

What is a collection?

A collection is nothing more than an array. The most basic collection is a single dimension
array. Later on in this article, we will get into some examples of very complex arrays but we'll
start with the simple stuff.

Associative Array

In Oracle 7, Oracle introduced an index by table, which was also called a PL/SQL table. This
table was different from a SQL table in that it existed only in PL/SQL and you could NOT
perform SQL against it.

The most basic example of an index by table:

TYPE a_number IS TABLE OF NUMBER INDEX BY BINARY_INTEGER.

This creates an array of unlimited size (up to your OS and DB version limitations) of NUMBER
which is indexed by a BINARY_INTEGER datatype. The index is just the subscript and
BINARY_INTEGER is just a numeric data type. An index by table does NOT have to be initialized.

Below is a small example showing how to create an array, create a variable from the array and
then populate it with 5 values and display them in a loop:

DECLARE

-- Declare the type

TYPE a_char_data IS TABLE OF VARCHAR2(10) INDEX BY BINARY_INTEGER;

--Declare a variable of that type

v_char_data a_char_data;

BEGIN

-- Populate some data

v_char_data(1) := 'ABC';

v_char_data(2) := 'DEF';

v_char_data(3) := '1234567890';

v_char_data(4) := 'X+Y=Z';

v_char_data(5) := 'DE NADA';


-- Loop thru the data and display via dbms_pitput

FOR i IN v_char_data.FIRST..v_char_data.LAST LOOP

dbms_output.put_line( v_char_data(i) );

END LOOP;

END;

This example introduced two Collection Methods. If you've read my Sorcerer of OO series, you
know that a method is just a function assigned to an object type. The FIRST method returns the
first subscript in a collection, in this case a 1, and the LAST method returns the last subscript in
a collection, in this case a 5. I'll talk about additional methods later in this article.

Records

Before I go any further with collections, I think it's important to understand another feature
Oracle introduced in v7, records. You can think of a PL/SQL record just like you would the
column layout in a table. A PL/SQL record is for use in PL/SQL programs.

DECLARE

-- Declare the record type

TYPE r_data IS RECORD (

ssn VARCHAR2(9) NOT NULL := -1,

name VARCHAR2(30),

dob DATE );

--Declare a variable of that type

v_data r_data;

BEGIN

-- Set some values

v_data.ssn := '123456789';

v_data.name := 'Lewis';
v_data.dob := '01-JAN-1900';

END;

You can also declare a record using the structure of a table, i.e. SCOTT.EMP%ROWTYPE. That's a
subject for a different discussion, however.

When you combine a record type with a collection type, you have a very powerful set of tools
for building in memory data structures. With a minor change to the above examples, we can
create a ragged, multi-dimensional array:

DECLARE

-- Declare a basic table type type

TYPE a_char_data IS TABLE OF VARCHAR2(10) INDEX BY BINARY_INTEGER;

-- Declare a complex record type

TYPE r_data IS RECORD (

ssn VARCHAR2(9) NOT NULL := -1,

name a_char_data, -- Notice the table_type used here

dob DATE );

-- Declare a index by table using the complex record type

TYPE a_multi IS TABLE OF r_data INDEX BY BINARY_INTEGER;

-- Declare a variable using the complex array

v_data a_multi;

BEGIN

-- Set some values

v_data(1).ssn := '123456789';

v_data(1).dob := '01-JAN-1900';
-- Notice the second subscript

v_data(1).name(1) := 'Lewis';

v_data(1).name(2) := 'Joe';

dbms_output.put_line(v_data(1).ssn);

-- Loop through the v_data(1).name table

FOR i IN v_data(1).name.FIRST..v_data(1).name.LAST LOOP

dbms_output.put_line(v_data(1).name(i));

END LOOP;

END;

Let me walk you through exactly what this example is doing:

• In thedeclare section:
o First we declare our very basic array of VARCHAR2(10).
o Next we declare a record type with an embedded index by table.
o So now we have a ragged record type. A single record with a dimensional name
column.
o Then we declare a new table based on the complex record type. This creates
an array of ragged records.
o And finally, we create the complex variable.
• Executeable section:
o We populated the ssn and dob columns of the first record of the v_data
variable.
o Next we populated the first and second rows of the name table in the first row
of the v_data variable.
o Finally, we displayed the ssn of the first row of the v_data variable and then
looped through the name table of the first row of the v_data variable.

Phew, again! If you are not quite getting this, stick with it. Cut and paste it into TOAD or
PL/SQL Developer or something like that. It is confusing at first but it is so hugely powerful
that it's worth the time.

Don't try this in v7 though. Oracle has continuously made improvements to both PL/SQL in
general and to collections specifically. As a matter of a fact, it was a while before you could
have anything but single dimension tables (you could not combine records and collections). And
I'm not sure exactly which release added multi-dimensional tables but it was very recent.
Oracle 10g also added a new feature to collections that has been a long time coming.
Associative arrays! Wait a minute. Isn't that just index by tables with a new name? Sort of.
What they added was index by varchar2. So now you can have something like this:

DECLARE

-- Declare the type using varchar2

TYPE a_depts IS TABLE OF NUMBER INDEX BY VARCHAR2(10);

--Declare a variable of that type

v_depts a_depts;

-- Declare a counter

i VARCHAR2(10);

BEGIN

-- Loop thru scott.emp

FOR c_depts IN ( SELECT deptno, ename FROM SCOTT.emp ) LOOP

-- Load up the array with values from scott.emp

v_depts(c_depts.ename) := c_depts.deptno;

END LOOP;

-- Initialize our CHAR counter, i.e. ename

i := v_depts.FIRST;

-- Loop thru the data

LOOP

-- Exit when the next value is null, i.e. does not exist

EXIT WHEN NOT v_depts.EXISTS(i);

dbms_output.put_line( i || ' is in dept: ' || to_char(v_depts(i)) );

-- Increment the counter

i := v_depts.next(i);
END LOOP;

END;

I introduced two new collection methods here. EXISTS and NEXT. EXISTS accepts a subscript and
returns TRUE if the element of the array at that subscript exists. NEXT returns the subscript of
the next elements. PRIOR was not used here but it would return the subscript of the previous
element in the array. By using PRIOR and NEXT, you can loop through arrays indexed by non-
numeric data as well as loop through sparse arrays. I'll cover sparse arrays in one of the next
entries on this topic.

Turbocharged DML with FORALL

Oracle introduced two new data manipulation language (DML) statements for PL/SQL in
Oracle8i: BULK COLLECT and FORALL. Both statements implement a form of array
processing inside PL/SQL; BULK COLLECT facilitates high-speed retrieval of data, and
FORALL greatly improves performance of INSERT, UPDATE, and DELETE operations.
Oracle Database achieves significant performance gains with these statements by greatly
reducing the number of context switches between the PL/SQL and SQL statement
execution engines.

With BULK COLLECT, you fetch multiple rows into one or more collections, rather than
individual variables or records. The following example of BULK COLLECT retrieves all
books containing "PL/SQL" in their titles into an associative array of records, all in a
single pass to the database.

DECLARE
TYPE books_aat IS TABLE OF book%ROWTYPE INDEX BY PLS_INTEGER;
books books_aat;
BEGIN
SELECT *
BULK COLLECT INTO book
FROM books
WHERE title LIKE '%PL/SQL%';
...
END;

Similarly, FORALL transfers data from a PL/SQL collection to the specified table using
collections. The following code example shows a procedure that accepts a nested table of
book information and inserts the full contents of that collection (the binding array) into
the book table. Note that this example also takes advantage of Oracle9i's FORALL
enhancement that allows the INSERT of a record directly into a table.
BULK COLLECT and FORALL are very helpful, not only in improving performance but also
in simplifying the code you need to write for SQL operations in PL/SQL. The following
FORALL INSERT of multiple rows demonstrates rather clearly why PL/SQL is considered
to be the best programming language for the Oracle database.

CREATE TYPE books_nt IS TABLE OF book%ROWTYPE;


/
CREATE OR REPLACE PROCEDURE add_books ( books_in IN books_nt ) IS
BEGIN
FORALL book_index IN books_in.FIRST .. books_in.LAST
INSERT INTO book
VALUES books_in(book_index);
...
END;

Prior to Oracle Database 10g, however, there was an important restriction on using
collections with FORALL: The database read the contents of the collection sequentially
from the first to the last row in the IN range clause. If it encountered a row within that
range that was undefined, Oracle Database would raise the ORA-22160 exception:

ORA-22160: element at index [N] does not exist

For simple applications of FORALL, this rule would not cause any difficulties. If, however,
you want to take full advantage of FORALL in the widest possible range of circumstances,
the requirement that any FORALL driving array be sequentially populated might increase
program complexity and degrade performance.

In Oracle Database 10g, PL/SQL now offers two new clauses in the FORALL statement,
INDICES OF and VALUES OF, which allow you to choose very selectively which rows
from the driving array should be processed by the extended DML statement.

INDICES OF comes in very handy when your binding array is sparse or contains gaps.
The syntax for this statement is:

FORALL indx IN INDICES OF sparse_collection


INSERT INTO my_table
VALUES sparse_collection (indx);

VALUES OF responds to a different situation: My binding array may or may not be sparse,
but I want to use only a subset of the elements in that array. I can then use VALUES OF to
point to just the values I wish to use in my DML operation. The syntax for this statement
is:

FORALL indx IN VALUES OF pointer_array


INSERT INTO my_table
VALUES binding_array (indx);
In the remainder of this article, I will take a look at how to move from a CURSOR FOR
loop to a FORALL implementation and how the VALUES OF clause can ease your
programming tasks.

Moving FOR Loops to FORALL

Suppose I need to write a procedure that gives raises to employees who are eligible (as
determined by the comp_analysis.is_eligible function) and writes reports to the
employee_history table for any employees who are ineligible for that raise. I work for a
very large corporation; we have lots and lots of employees.

This is not a terribly challenging task for a PL/SQL developer. I don't even need to use
BULK COLLECT or FORALL to get the job done, as I demonstrate in Listing 1. Instead, I use
a CURSOR FOR loop and individual INSERT and UPDATE statements. This is straightforward
code; unfortunately, I have been given 10 minutes in which to run this code, and my "old-
fashioned" approach is running 30 minutes or more.

Fortunately, my company upgraded to Oracle9i and, even more fortunately, I learned


about bulk processing at a recent Oracle seminar (as well as from the excellent
demonstrations provided on Oracle Technology Network). So I decide to rewrite my
program, using collections and bulk processing. The result is shown in Listing 2.

A quick glance at Listings 1 and 2 should make one point very clear: converting to
collections and bulk processing can increase the volume and complexity of your code. If
you need a serious boost in performance, however, that increase is well-justified. Now
let's move beyond the quick glance and see what it takes to handle the conditional logic
inside a CURSOR FOR loop when using FORALL.

Defining Collection Types and Collections

In Listing 2, the first part of the declaration section (lines 6 through 11) defines several
different collection TYPEs, one for each of the columns I will be retrieving from the
employee table. I would rather declare a single collection TYPE based on
employee%ROWTYPE, but FORALL does not yet support operations on collections of records
in which I reference individual fields. So I still must declare a separate collection for each
column for employee IDs, salaries, and hire dates.

Next, I will declare the collections I need for each of those columns (lines 13 through 21).
First, I define the collections corresponding to the queried columns (lines 13 through 15):

employee_ids employee_aat;
salaries salary_aat;
hire_dates hire_date_aat;

I then need a new collection that holds the IDs only of those employees who are approved
for a raise (line 17):
approved_employee_ids employee_aat;

Finally, I declare one collection for each column that will be used to record those who
were ineligible for raises (lines 19 through 21):

denied_employee_ids employee_aat;
denied_salaries salary_aat;
denied_hire_dates hire_date_aat;

Drilling Down Into the Code

With these data structures in place, let's now skip down to the executable section of this
procedure (lines 72 through 75), to begin to understand how the collections are used to
speed up my process.

retrieve_employee_info;
partition_by_eligibility;
add_to_history;
give_the_raise;

I have used step-wise refinement (also known as "top-down design") to write this
program. So rather than have a very long and hard-to-follow executable section, I have a
four-line section that describes each step in my procedure by name. First, I retrieve the
employee information (all the employees in the specified department). Then I partition or
separate out those employees who will receive a raise and those who are denied. Once
partitioned, I can add the denied employees to the employee history table and give a raise
to the rest.

Writing code in this fashion makes the end result much more readable. I can then drill
down into whichever area of the program is of concern to me.

With my collections declared, I can now use BULK COLLECT to retrieve the employee
information (lines 23 through 30). This effectively replaces the CURSOR FOR loop. Now
my data is loaded into my collections.

The partition logic (lines 32 through 46) requires that I go through each row in the
collections I just populated and check the eligibility for a raise. If approved, then I copy
the employee ID from my query-populated collection to the approved collection. If the
person is not eligible, I copy the employee ID, salary, and hire date, because all are
needed for the insert into the employee_history table.

With my original data now separated into two collections, I can use each of those
collections as drivers for two different FORALL statements (starting on lines 51 and 66). I
insert the denied collection data in bulk to the employee_history table
(add_to_history), and I update the approved employee information in bulk in the
employee table in the give_the_raise procedure.
Let's finish our examination of this rewrite with a closer look at add_to_history (lines
48 through 61). The FORALL statement (line 51) contains an IN clause that specifies the
range of row numbers to be used in the bulk INSERT. In my explanation of the second
rewrite of my procedure, I will refer to the collection used to define that range as the
"driving collection." In this version of add_to_history, however, I simply say: use all
the rows defined in denied_employee_ids. Within the INSERT itself, I use all three of
the denied collections; I will refer to these collections as the "data collections." As you
can see, the driving and data collections do not need to match. This is a critical point
when learning to use the new features in Oracle Database 10g.

The bottom line, then, of my rewrite in Listing 2 is that I have approximately twice the
number of lines of code as in Listing 1, but the Listing 2 code runs within the required
window of time. Prior to Oracle Database 10g, in such a situation I would simply be glad
that I was able to fit within the window and move on to my next tasks.

With the latest version of PL/SQL in Oracle Database 10g, however, I now have further
opportunities for improvement, in terms of performance, readability, and code volume.

Applying VALUES OF to the Process

In Oracle Database 10g, you can now specify a subset of rows in the driving collection to
be used by a FORALL statement. You can define that subset in one of two ways:

• Match the row numbers in the data collection(s) with the row numbers in the
driving collection. You would do this with the INDICES OF clause.
• Match the row numbers in the data collection(s) with the values found in the
defined rows of the driving collection. You would do this with the VALUES OF
clause.

I will use the VALUES OF clause in my second and final rewrite of


give_raises_in_department. Listing 3 contains the full code for this version. I will
skip over the parts of this program that are unchanged from the previous version.

Starting with the declaration of collections, notice that I no longer define additional
collections to hold the approved and denied employee information. Instead, in Listing 3
(lines 17 through 21) I declare two "guide" collections: one for approved employees and
another for those denied raises. The datatype of both these collections is Boolean; as you
will soon see, the datatype of the collections doesn't matter in the least. The only
information the FORALL statement cares about is which rows are defined.
Implementation Elapsed Time
CURSOR FOR
00:00:38.01
loop
The elapsed times for the
three implementations of Pre-Oracle
give_raises_in_department Database 10g 00:00:06.09
with 50,000 rows in the bulk processing
employee table.
Oracle Database
10g bulk 00:00:02.06
processing

Implementation Elapsed Time


CURSOR FOR
00:00:58.01
loop
The elapsed times for the
three implementations of Pre-Oracle
give_raises_in_department Database 10g 00:00:12.00
with 100,000 rows in the bulk processing
employee table
Oracle Database
10g bulk 00:00:05.05
processing
Table 1: Testing the timing with 50,000 and 100,000 rows

The retrieve_employee_info subprogram is the same as before, but the way I partition
my data is now quite different (lines 32 through 44). Rather than copy records of
information from one collection to another (a relatively slow operation), I simply define
the row (by assigning it a value of TRUE) in the appropriate guide collection that matches
the row number of the employee IDs collection.

I can now use the approved_list and denied_list collections as the driving
collections in two different FORALL statements (starting on lines 49 and 65).

To insert into the employee_history table, I use this formulation:

FORALL indx IN VALUES OF denied_list

And to perform the updates (to give employees their raises), I use this formulation:

FORALL indx IN VALUES OF approved_list


Within both DML statements, the data collections are the original collections populated in
the BULK COLLECT retrieval step; no copies were made. With VALUES OF, Oracle
Database chooses among the rows in those data collections, using only those rows whose
row numbers match those of the driving collection.

By taking advantage of VALUES OF in this program, I was able to avoid copying entire
records of information, replacing them with a simple list of row numbers. For large
arrays, the overhead of performing these copies can be quite noticeable. To test the
advantages of the Oracle Database 10g approach, I loaded the employee table and ran
tests with 50,000 and then 100,000 rows of data. I modified the pre-Oracle Database 10g
bulk-processing implementation to perform many more copies of collection contents in
order to emulate more of a real-world scenario. Then I used SQL*Plus SET TIMING ON to
show the elapsed time for running each of my various implementations. Table 1 shows
the results.

The conclusions from these timings are clear: Switching from individual DML statements
to bulk processing results in greatly reduced elapsed time, from 38 to 6 seconds with
50,000 rows and from 58 to 12 seconds with 100,000 rows. Furthermore, by avoiding the
copies of data with VALUES OF, I was able to cut my elapsed time roughly in half.

Even without the improvement in performance, VALUES OF and its sister clause, INDICES
OF, increase the flexibility of the PL/SQL language and make it easier for developers to
write more-intuitive and more-maintainable code.

PL/SQL is, at this point in the product's life, a mature and powerful language. As a
consequence, many of its new features are incremental and evolutionary. Nevertheless,
these new features can make a significant difference in an application's performance and
a developer's productivity. VALUES OF is a fine example of such a feature.

You might also like