
OPPENHEIMERFUNDS, INC.

Useful Oracle Functions
With emphasis on Analytic Functions
Tom Keller, Chris Hagen, Josh Schulz

Contents
Analytic Functions
ROW_NUMBER, RANK and DENSE_RANK
FIRST, LAST, FIRST_VALUE, LAST_VALUE
More on Windowing
More on Partition By
MAX, MIN
GREATEST, LEAST
LAG, LEAD
NVL2
COALESCE
NTILE
PERCENT_RANK
LISTAGG

Some Useful Oracle Functions

Analytic Functions
Analytic functions compute an aggregate value based on a group of rows. They
differ from aggregate functions in that they return multiple rows for each group. The
group of rows is called a window and is defined by the analytic_clause. For each
row, a sliding window of rows is defined. The window determines the range of rows
used to perform the calculations for the current row. Window sizes can be based on
either a physical number of rows or a logical interval such as time.
Analytic functions are the last set of operations performed in a query except for the
final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are
completed before the analytic functions are processed. Therefore, analytic functions
can appear only in the select list or ORDER BY clause.
Analytic functions are commonly used to compute running totals, percentages
within groups, moving averages and Top-N queries, among other things. Below is a list
of analytic functions.
AVG *
CORR *
COVAR_POP *
COVAR_SAMP *
COUNT *
CUME_DIST
DENSE_RANK
FIRST
FIRST_VALUE *
LAG
LAST
LAST_VALUE *
LEAD
MAX *
MIN *
NTILE
PERCENT_RANK
PERCENTILE_CONT
PERCENTILE_DISC
RANK
RATIO_TO_REPORT
ROW_NUMBER
STDDEV *
STDDEV_POP *
STDDEV_SAMP *
SUM *
VAR_POP *
VAR_SAMP *
VARIANCE *

Functions followed by an asterisk (*) allow the full syntax, including the windowing_clause.

Oracle guru Steve Callan notes that, in some cases, an analytic function can run a
hundred times faster than equivalent SQL that computes the same result with a subquery.
Example query:
select
fd.broad_inv_capability, fp.fund_code, fp.nav_price_rounded, dense_rank() over (partition
by fd.broad_inv_capability order by nav_price_rounded) drank
from
fund_price fp,
fund_dim fd
where
fd.fund_dim_id = fp.fund_dim_id
and fp.eff_date = to_date('10/16/2012','mm/dd/yyyy')
and fd.broad_inv_capability = 'Global Debt'
order by fd.broad_inv_capability, drank;

The OVER() analytic clause indicates that an analytic function is being used; with nothing
inside the parentheses, the function operates on the entire query result set. The analytic
clause is computed after the joins and the WHERE, GROUP BY and HAVING clauses, but before
the final ORDER BY clause.
The PARTITION BY clause divides the result set into groups (partitions), and the analytic
function is computed separately within each one. Without this clause, all rows in the
result set are treated as a single group.
The ORDER BY clause defines the order of the rows within the partition or window. It is
required by functions such as LEAD, LAG, RANK, DENSE_RANK and ROW_NUMBER, and FIRST_VALUE
and LAST_VALUE need it to return meaningful results. FIRST and LAST take their ordering
from the ORDER BY inside the KEEP clause instead.

The windowing clause is available for some analytic functions (those marked with an
asterisk in the list above) and provides additional control over the window within the
current partition. It is an extension of the ORDER BY and as such can only be used if an
ORDER BY clause is present. There are two basic forms of the windowing clause.
RANGE BETWEEN start point AND end point. Range is a logical offset (such as
time).
ROWS BETWEEN start point AND end point. Rows is a physical offset (the
number of rows in the window).
Start and end points are:

UNBOUNDED PRECEDING : The window starts at the first row of the partition.
Only available for start points.

UNBOUNDED FOLLOWING : The window ends at the last row of the partition.
Only available for end points.

CURRENT ROW : The window starts or ends at the current row. Can be used
as a start or end point.

value_expr PRECEDING : A physical or logical offset before the current row,
using a constant or expression that evaluates to a positive numeric value.
When used with RANGE, it can also be an interval literal if the
order_by_clause uses a DATE column.

value_expr FOLLOWING : As above, but an offset after the current row.

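For analytic functions that support the windowing clause, the default window when an
ORDER BY is present but no windowing clause is given is RANGE BETWEEN UNBOUNDED
PRECEDING AND CURRENT ROW. Without an ORDER BY, the window is the entire partition.
In other words, windowing is effectively turned on when the ORDER BY clause is added.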
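A minimal sketch of the three cases, using hypothetical inline data built from DUAL so it runs on any Oracle database without tables. The two rows dated 2012-11-02 tie on the ORDER BY column, so under the default RANGE window they are peers and share a running total. The explicit ROWS window (with id added to make the order deterministic) counts them one at a time.

```sql
-- Hypothetical data; values chosen only to illustrate the window defaults
WITH t AS (
  SELECT 1 AS id, DATE '2012-11-01' AS eff_date, 10 AS amt FROM dual UNION ALL
  SELECT 2, DATE '2012-11-02', 20 FROM dual UNION ALL
  SELECT 3, DATE '2012-11-02', 30 FROM dual UNION ALL
  SELECT 4, DATE '2012-11-05', 40 FROM dual
)
SELECT id,
       eff_date,
       amt,
       -- No ORDER BY: the window is the whole result set (100 on every row)
       SUM(amt) OVER () AS total_all,
       -- ORDER BY only: default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       -- gives 10, 60, 60, 100 because ids 2 and 3 are peers
       SUM(amt) OVER (ORDER BY eff_date) AS running_range,
       -- Explicit physical window gives 10, 30, 60, 100
       SUM(amt) OVER (ORDER BY eff_date, id
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_rows
FROM t
ORDER BY eff_date, id;
```

When the ORDER BY column is unique, RANGE and ROWS produce the same running totals; the difference only shows up on ties.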
Here are some examples of a query as each component of the analytic clause is
applied.
Let's start with a simple GROUP BY query that aggregates data. Here we get one row for
each fund with its average price over the eff_date range in the WHERE clause.
select
fund_code,
avg(nav_price_rounded) avg_nav_price_rounded
from
ta_dm.fund_price
where
fund_code in ('00259','00251')
and eff_date between to_date('11/01/2012','mm/dd/yyyy') and
to_date('11/16/2012','mm/dd/yyyy')
group by
fund_code;

By adding the OVER() clause to this same query (and removing the GROUP BY), we turn AVG
into an analytic function. Every row is returned, and each one shows the average price
across the entire result set, instead of the rows being collapsed as GROUP BY does.
select
fund_code,
eff_date,
nav_price_rounded,
avg(nav_price_rounded) over() avg_nav_price_rounded
from
ta_dm.fund_price
where
fund_code in ('00259','00251')
and eff_date between to_date('11/01/2012','mm/dd/yyyy') and
to_date('11/16/2012','mm/dd/yyyy');

To control which rows are averaged together, we add the PARTITION BY clause. The
analytic function is then computed separately for each distinct value of the partitioning
column, so each row shows the average for its own fund.
select
fund_code,
eff_date,
nav_price_rounded,
avg(nav_price_rounded) over(partition by fund_code) avg_nav_price_rounded
from
ta_dm.fund_price
where
fund_code in ('00259','00251')
and eff_date between to_date('11/01/2012','mm/dd/yyyy') and
to_date('11/16/2012','mm/dd/yyyy');

By adding the ORDER BY clause to the analytic clause, we define the order in which the
rows are evaluated while calculating the analytic function. Adding the ORDER BY also
effectively turns on windowing for the functions that support it. In the query below,
ordering by fund_code has no effect on the averages, because every row in a partition has
the same fund_code: all rows are peers under the default RANGE window, so the window
still covers the whole partition.
select
fund_code,
eff_date,
nav_price_rounded,
avg(nav_price_rounded) over(partition by fund_code order by fund_code)
avg_nav_price_rounded
from
ta_dm.fund_price
where
fund_code in ('00259','00251')
and eff_date between to_date('11/01/2012','mm/dd/yyyy') and
to_date('11/12/2012','mm/dd/yyyy');

But if we ORDER BY eff_date instead of fund_code, we can see the default windowing
effect (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). The average is computed
on the rows from the beginning of the partition up to and including the current row.
select
fund_code,
eff_date,
nav_price_rounded,
round(avg(nav_price_rounded) over(partition by fund_code order by eff_date),4)
avg_nav_price_rounded
from
ta_dm.fund_price
where
fund_code in ('00259','00251')
and eff_date between to_date('11/01/2012','mm/dd/yyyy') and
to_date('11/12/2012','mm/dd/yyyy');

ROW_NUMBER, RANK and DENSE_RANK


All three of these functions assign integer values to rows based on their order.
ROW_NUMBER gives a running serial number to each record returned by a query, or to each
record within a partition. The sequence is dictated by the ORDER BY clause and begins with 1.
Rows that tie on the ORDER BY columns still receive distinct numbers, in no guaranteed order.
RANK and DENSE_RANK generate a sequential order, or rank, for the query or partition based on
the ORDER BY. When two or more rows tie, they receive the same rank. RANK then skips numbers,
so its sequence is not consecutive (for example 1, 2, 2, 4), while DENSE_RANK maintains a
consecutive sequence (1, 2, 2, 3).
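A small illustration, assuming hypothetical inline data (FROM DUAL) that contains one tie. The expected values are noted in the comments; name is added to the ROW_NUMBER ordering only to make the tie-break deterministic.

```sql
-- Hypothetical data with a tie at aum = 200
WITH t AS (
  SELECT 'A' AS name, 100 AS aum FROM dual UNION ALL
  SELECT 'B', 200 FROM dual UNION ALL
  SELECT 'C', 200 FROM dual UNION ALL
  SELECT 'D', 300 FROM dual
)
SELECT name,
       aum,
       RANK()       OVER (ORDER BY aum)       AS rnk,   -- 1, 2, 2, 4
       DENSE_RANK() OVER (ORDER BY aum)       AS drnk,  -- 1, 2, 2, 3
       ROW_NUMBER() OVER (ORDER BY aum, name) AS rn     -- 1, 2, 3, 4
FROM t
ORDER BY aum, name;
```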

select
broad_inv_capability,
composite_aum,
RANK() OVER (ORDER BY composite_aum) rank_position,
DENSE_RANK() OVER(ORDER BY composite_aum) dense_rank_position,
ROW_NUMBER() OVER(ORDER BY composite_aum) row_number_position
from
(
select
d.broad_inv_capability,
sum(f.aum_amount) composite_aum
from
ta_dm.fund_asset_allocation_fact f,
ta_dm.fund_dim d
where
f.fund_dim_id = d.fund_dim_id
and f.eff_date = to_date('10/26/2012','mm/dd/yyyy')
group by
d.broad_inv_capability
);

If you modify the query above and add DESC to the ORDER BY for ROW_NUMBER, the results may come back
sorted by composite_aum in descending order. This happens because the query has no outer ORDER BY, so the
output order simply follows the last sort Oracle performed; it is not guaranteed. Rank_Position and
Dense_Rank_Position keep their ascending values.
Adding ORDER BY composite_aum to the main query returns the output to ascending order, while
Row_Number_Position keeps its descending numbering.

select
composite_desc,
composite_code,
composite_aum,
RANK() OVER (ORDER BY composite_aum) rank_position,
DENSE_RANK() OVER(ORDER BY composite_aum) dense_rank_position,
ROW_NUMBER() OVER(ORDER BY composite_aum) row_number_position
from
(
select
d.composite_desc,
d.composite_code,
sum(f.aum_amount) composite_aum
from
ta_dm.fund_asset_allocation_fact f,
ta_dm.fund_dim d
where
f.fund_dim_id = d.fund_dim_id
and f.eff_date = to_date('10/26/2012','mm/dd/yyyy')
and d.composite_code in
('01591','01331','00215','00230','00240','00251','00254','00270','00330')
group by
d.composite_desc,
d.composite_code
);

Adding a PARTITION BY clause will group the rankings by the partitioned column.
select
broad_inv_capability,
inv_capability,
composite_aum,
RANK() OVER (PARTITION BY broad_inv_capability ORDER BY composite_aum)
rank_position,
DENSE_RANK() OVER(PARTITION BY broad_inv_capability ORDER BY composite_aum)
dense_rank_position,
ROW_NUMBER() OVER(PARTITION BY broad_inv_capability ORDER BY composite_aum)
row_number_position
from
(
select
d.broad_inv_capability,
d.inv_capability,
sum(f.aum_amount) composite_aum
from
ta_dm.fund_asset_allocation_fact f,
ta_dm.fund_dim d
where
f.fund_dim_id = d.fund_dim_id
and f.eff_date = to_date('10/26/2012','mm/dd/yyyy')
and d.broad_inv_capability in ('Municipal Bond','Dynamic Allocation')
group by
d.broad_inv_capability,
d.inv_capability
);

FIRST, LAST, FIRST_VALUE, LAST_VALUE


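The general syntax is: FIRST_VALUE(<sql_expr>) OVER (<analytic_clause>)
The FIRST_VALUE analytic function picks the first record of the window after applying the ORDER BY; with the
default window this is the first record of the partition. If the first value in the set is null, the function
returns NULL unless you specify IGNORE NULLS. The <sql_expr> is computed on the columns of this first record and
the result is returned. The LAST_VALUE function works the same way but acts on the last record of the window.
Note that with an ORDER BY and the default window (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), the last
record of the window is the current row (or its last peer), not the last record of the partition. Add ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to get the last record of the whole partition.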

select
fund_code,
broad_inv_capability,
inv_capability,
inception_date,
inception_date - FIRST_VALUE(inception_date) over (partition by inv_capability order by
inception_date) Day_Gap
from ta_dm.fund_dim
where inv_capability in ('Concentrated Value','Equity Income');
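The LAST_VALUE window caveat can be seen in a short sketch with hypothetical inline dates (one partition, no ties). With the default window, LAST_VALUE just echoes the current row; with a window spanning the whole partition it returns the latest date on every row.

```sql
-- Hypothetical data: one partition with three distinct inception dates
WITH t AS (
  SELECT 'X' AS grp, DATE '2001-01-01' AS inception_date FROM dual UNION ALL
  SELECT 'X', DATE '2003-06-01' FROM dual UNION ALL
  SELECT 'X', DATE '2007-09-01' FROM dual
)
SELECT grp,
       inception_date,
       -- 2001-01-01 on every row
       FIRST_VALUE(inception_date) OVER (PARTITION BY grp ORDER BY inception_date) AS first_dt,
       -- Default window ends at the current row, so this equals inception_date
       LAST_VALUE(inception_date) OVER (PARTITION BY grp ORDER BY inception_date) AS last_dt_default,
       -- Window covers the whole partition: 2007-09-01 on every row
       LAST_VALUE(inception_date) OVER (PARTITION BY grp ORDER BY inception_date
                                        ROWS BETWEEN UNBOUNDED PRECEDING
                                                 AND UNBOUNDED FOLLOWING) AS last_dt_full
FROM t
ORDER BY inception_date;
```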

FIRST (used as KEEP (DENSE_RANK FIRST ORDER BY ...)) restricts an aggregate function to the
rows that share the first rank in the given order; LAST does the same for the rows with the
last rank. In the example below, some partitions contain multiple rows in the first rank
position (Active Allocation). KEEP FIRST allows those rows to be isolated and aggregated.
select
fund_code,
broad_inv_capability,
inv_capability,
inception_date,
inception_date - FIRST_VALUE(inception_date) over (partition by inv_capability order by
inception_date) Day_Gap,
COUNT(inception_date) KEEP (DENSE_RANK FIRST ORDER BY inception_date) OVER
(PARTITION BY inv_capability) num_initial_funds
from ta_dm.fund_dim
where inv_capability in ('Concentrated Value','Equity Income','Active Allocation');
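KEEP can also be used as a plain aggregate with GROUP BY (no OVER clause). A sketch with hypothetical fund codes and dates: two Active Allocation funds share the earliest date, so KEEP FIRST counts both, while KEEP LAST isolates the most recent fund.

```sql
-- Hypothetical data; F1 and F2 tie for the earliest inception date
WITH t AS (
  SELECT 'Active Allocation' AS inv_capability, 'F1' AS fund_code,
         DATE '2005-04-05' AS inception_date FROM dual UNION ALL
  SELECT 'Active Allocation', 'F2', DATE '2005-04-05' FROM dual UNION ALL
  SELECT 'Active Allocation', 'F3', DATE '2009-01-02' FROM dual UNION ALL
  SELECT 'Equity Income',     'F4', DATE '1987-02-20' FROM dual
)
SELECT inv_capability,
       -- Active Allocation: 2, Equity Income: 1
       COUNT(*) KEEP (DENSE_RANK FIRST ORDER BY inception_date) AS num_initial_funds,
       -- Active Allocation: F3, Equity Income: F4
       MIN(fund_code) KEEP (DENSE_RANK LAST ORDER BY inception_date) AS newest_fund
FROM t
GROUP BY inv_capability
ORDER BY inv_capability;
```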

More on Windowing
The example query below shows how windowing can be used to control the subset
of data that the aggregate function is executed against. In this case, the window is
controlled by a physical row offset. The first average looks at the current row and the 3
rows before it. The second always looks at the current row, the 2 preceding rows and the 1
following row.
select
eff_date,
nav_price_rounded,
round(avg(nav_price_rounded) OVER (ORDER BY eff_date rows 3 preceding),4) avg_3_0,
round(avg(nav_price_rounded) OVER (ORDER BY eff_date rows between 2 preceding and 1
following),4) avg_2_1
from
ta_dm.fund_price
where
fund_code = '00259'
and to_char(eff_date,'yyyy') = '2011';
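When dates have gaps (weekends, holidays), a physical ROWS offset and a logical RANGE offset give different windows. A sketch with hypothetical prices: ROWS 1 PRECEDING always averages two rows, while RANGE INTERVAL '1' DAY PRECEDING only includes rows dated within one calendar day, so Monday 2012-11-05 does not reach back to Friday 2012-11-02.

```sql
-- Hypothetical prices for Thu, Fri, Mon, Tue
WITH t AS (
  SELECT DATE '2012-11-01' AS eff_date, 10 AS price FROM dual UNION ALL
  SELECT DATE '2012-11-02', 12 FROM dual UNION ALL
  SELECT DATE '2012-11-05', 14 FROM dual UNION ALL
  SELECT DATE '2012-11-06', 16 FROM dual
)
SELECT eff_date,
       price,
       -- Physical offset: 10, 11, 13, 15
       AVG(price) OVER (ORDER BY eff_date
                        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS avg_rows,
       -- Logical offset: 10, 11, 14, 15
       AVG(price) OVER (ORDER BY eff_date
                        RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW) AS avg_range
FROM t
ORDER BY eff_date;
```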

More on Partition By
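If you have multiple analytic functions, each one can have its own PARTITION BY
clause. If you need one or more of them to cover the entire query result set, PARTITION BY 1
will force that to happen, because every row has the same constant value and so lands in a
single partition. An empty OVER () has the same effect.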
select
distinct broad_inv_capability,
count(fund_code) over (partition by broad_inv_capability) fund_count,
count(fund_code) over (partition by 1) overall_fund_count,
round((count(fund_code) over (partition by broad_inv_capability) * 100) / count(fund_code)
over (partition by 1), 2) fund_percent
from ta_dm.fund_dim
order by fund_percent;
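A sketch, assuming hypothetical inline data, showing that PARTITION BY 1 and an empty OVER () return the same overall count, and computing the percentage in the same way as the query above.

```sql
-- Hypothetical data: three Equity funds and one Fixed Income fund
WITH t AS (
  SELECT 'Equity' AS broad_inv_capability, 'F1' AS fund_code FROM dual UNION ALL
  SELECT 'Equity', 'F2' FROM dual UNION ALL
  SELECT 'Equity', 'F3' FROM dual UNION ALL
  SELECT 'Fixed Income', 'F4' FROM dual
)
SELECT DISTINCT broad_inv_capability,
       COUNT(fund_code) OVER (PARTITION BY broad_inv_capability) AS fund_count,  -- 3 / 1
       COUNT(fund_code) OVER (PARTITION BY 1) AS overall_by_1,                   -- 4
       COUNT(fund_code) OVER () AS overall_empty,                                -- 4
       ROUND(100 * COUNT(fund_code) OVER (PARTITION BY broad_inv_capability)
                 / COUNT(fund_code) OVER (), 2) AS fund_percent                  -- 75 / 25
FROM t
ORDER BY fund_percent;
```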

MAX, MIN
MAX and MIN return the maximum and minimum value of an expression. Used as analytic functions, they are
computed per partition (and, when an ORDER BY is given, per window) without collapsing rows.
Syntax: MAX(<sql_expr>) OVER (<analytic_clause>)
Example:

The following hypothetical example shows, for each fund and date, the highest NAV reached from the start of
the date range up to that date. Because the analytic clause includes an ORDER BY, the default window makes
this a running maximum.

SELECT fund_code,
eff_date,
MAX (nav_price) OVER (PARTITION BY fund_code ORDER BY eff_date)
AS MAX_NAV
FROM ta_dm.fund_price
WHERE eff_date >= TO_DATE ('10-26-2012', 'mm-dd-yyyy');
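To contrast the running maximum with the maximum over the whole period, here is a sketch with hypothetical NAV values for a single fund: dropping the ORDER BY makes the window the entire partition.

```sql
-- Hypothetical NAVs for one fund
WITH t AS (
  SELECT '00259' AS fund_code, DATE '2012-10-26' AS eff_date, 10.0 AS nav_price FROM dual UNION ALL
  SELECT '00259', DATE '2012-10-29', 12.0 FROM dual UNION ALL
  SELECT '00259', DATE '2012-10-30', 11.0 FROM dual
)
SELECT fund_code,
       eff_date,
       nav_price,
       -- Running maximum: 10, 12, 12
       MAX(nav_price) OVER (PARTITION BY fund_code ORDER BY eff_date) AS running_max,
       -- Whole-partition maximum: 12 on every row
       MAX(nav_price) OVER (PARTITION BY fund_code) AS period_max
FROM t
ORDER BY eff_date;
```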

GREATEST, LEAST
The GREATEST/LEAST functions return the largest/smallest value in a list of expressions.
Syntax: GREATEST(<sql_expr1> [, <sql_expr2>, ..., <sql_expr_n>])
The <sql_expr> values are the expressions evaluated by the function. Oracle uses the first expression to
determine the return type: if the first expression is numeric, the arguments are converted to the numeric
datatype with the highest precedence among them; otherwise, each later expression is implicitly converted to
the datatype of the first expression before the comparison. If any of the parameters is NULL, the function
returns NULL.
Example:
The following hypothetical example shows how the GREATEST function is used to find the highest of the three
price columns for a fund on each day.

SELECT fund_code,
eff_date,
nav_price,
nav_price_rounded,
offer_price,
GREATEST (nav_price, nav_price_rounded, offer_price) AS Greatest_Price
FROM ta_dm.fund_price
WHERE eff_date >= TO_DATE ('10-26-2012', 'mm-dd-yyyy');
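The NULL rule is easy to trip over. A sketch with hypothetical values: a single NULL argument makes both GREATEST and LEAST return NULL, and wrapping the nullable column in NVL (here substituting a, an arbitrary illustrative choice) avoids that.

```sql
-- Hypothetical values; the second row has a NULL in b
WITH t AS (
  SELECT 1 AS id, 10 AS a, 20 AS b, 15 AS c FROM dual UNION ALL
  SELECT 2, 10, NULL, 15 FROM dual
)
SELECT id,
       GREATEST(a, b, c) AS g,              -- 20, then NULL
       LEAST(a, b, c) AS l,                 -- 10, then NULL
       GREATEST(a, NVL(b, a), c) AS g_nvl   -- 20, then 15
FROM t
ORDER BY id;
```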

LAG, LEAD
The LAG and LEAD functions give access to other rows of the result set without the need for a self-join.
LAG returns a value from a previous row and LEAD returns a value from a following row, relative to the
current row within its partition.
Syntax: LAG(<sql_expr> [,offset] [,default]) OVER (<analytic_clause>) (LEAD takes the same arguments.)
The <sql_expr> can be a column or function, except an analytic function.
The [,offset] determines how many rows preceding (LAG) or following (LEAD) the current row the data
is retrieved from. If no value is specified, the default is 1.
The [,default] determines the value returned if the offset falls outside the partition. If no
value is specified, the default is NULL.
Example:
The following is an example of how Tom used the LAG function to find any subaccount_id that had any
start_date that did not match the end_date of the previous record for the same subaccount_id (in the
subaccount_history table on DSDWD2S).

SELECT *
FROM (SELECT subaccount_id,
start_date,
end_date,
LAG (end_date, 1)
OVER (PARTITION BY subaccount_id ORDER BY start_date)
AS prev_end_date
FROM subaccount_history)
WHERE (prev_end_date IS NOT NULL AND start_date <> prev_end_date);
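A sketch of the offset and default arguments, assuming hypothetical prices: LAG returns NULL before the first row, LEAD with a default of 0 returns 0 after the last row, and NVL turns the first row's day-over-day change into 0.

```sql
-- Hypothetical daily prices
WITH t AS (
  SELECT DATE '2012-11-01' AS eff_date, 10 AS price FROM dual UNION ALL
  SELECT DATE '2012-11-02', 12 FROM dual UNION ALL
  SELECT DATE '2012-11-05', 11 FROM dual
)
SELECT eff_date,
       price,
       LAG(price) OVER (ORDER BY eff_date) AS prev_price,          -- NULL, 10, 12
       LEAD(price, 1, 0) OVER (ORDER BY eff_date) AS next_price,   -- 12, 11, 0
       price - NVL(LAG(price) OVER (ORDER BY eff_date), price)
         AS price_change                                           -- 0, 2, -1
FROM t
ORDER BY eff_date;
```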

NVL2 Function

The NVL2 function extends the functionality found in the NVL function. It lets you substitute one value when an
expression is null and another when it is not null.
Syntax: NVL2( string1, value_if_NOT_null, value_if_null )
string1 is the expression to test for a null value (despite the name, it does not have to be a string).
value_if_NOT_null is the value returned if string1 is not null.
value_if_null is the value returned if string1 is null.
Example:
select a.account_name,
a.account_type,
a.account_status,
a.close_date,
nvl2(a.close_date,'Y','N') closed_flag
from ta_dm.curr_account_dim a
where account_name in ('002852852083301',
'002872870489929',
'002932931701975',
'003773770248171',
'003775270003135',
'003775270003399',
'003775270005852')
order by 1 asc;
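For comparison, a sketch with a hypothetical close_date column: NVL2 gives the same flag as the equivalent CASE expression, while NVL can only replace the null itself.

```sql
-- Hypothetical accounts: one closed, one open
WITH t AS (
  SELECT 1 AS id, DATE '2012-06-30' AS close_date FROM dual UNION ALL
  SELECT 2, NULL FROM dual
)
SELECT id,
       NVL2(close_date, 'Y', 'N') AS closed_flag,                            -- Y, N
       CASE WHEN close_date IS NOT NULL THEN 'Y' ELSE 'N' END AS closed_case, -- Y, N
       NVL(TO_CHAR(close_date, 'MM/DD/YYYY'), 'open') AS close_or_open       -- 06/30/2012, open
FROM t
ORDER BY id;
```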

COALESCE Function
The COALESCE function returns the first non-null expression in the list. If all expressions evaluate to null, then the coalesce
function will return null.
Syntax: COALESCE( expr1, expr2, expr3... expr_n )
expr1 to expr_n are the expressions to test for non-null values.
Example:
select mr.fundcode,
mr.ten_yr_tot_ret_nav,
mr.five_yr_tot_ret_nav,
mr.three_yr_tot_ret_nav,
mr.one_yr_tot_ret_nav,
coalesce(mr.ten_yr_tot_ret_nav,
mr.five_yr_tot_ret_nav,
mr.three_yr_tot_ret_nav,
mr.one_yr_tot_ret_nav) as longevity_return
from fund.monthly_returns mr
where trunc(tdate) = to_date('10/31/2012','mm/dd/yyyy')
and fundcode in ('785','874','WTJ','313','VTO')
order by 2 desc, 3 desc, 4 desc, 5 desc;
The above coalesce statement is equivalent to the following IF-THEN-ELSE statement:
IF mr.ten_yr_tot_ret_nav is not null THEN
result := mr.ten_yr_tot_ret_nav;
ELSIF mr.five_yr_tot_ret_nav is not null THEN
result := mr.five_yr_tot_ret_nav;
ELSIF mr.three_yr_tot_ret_nav is not null THEN
result := mr.three_yr_tot_ret_nav;
ELSIF mr.one_yr_tot_ret_nav is not null THEN
result := mr.one_yr_tot_ret_nav;
ELSE
result := null;
END IF;
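The same logic can be written in plain SQL with a CASE expression. A sketch with hypothetical return columns shows that both forms pick the longest available return and yield NULL when every input is NULL.

```sql
-- Hypothetical returns; later funds have shorter histories
WITH t AS (
  SELECT 'A' AS fundcode, 8.1 AS ten_yr, 7.2 AS five_yr, 6.3 AS three_yr, 5.4 AS one_yr FROM dual UNION ALL
  SELECT 'B', NULL, 4.5, 3.6, 2.7 FROM dual UNION ALL
  SELECT 'C', NULL, NULL, NULL, 1.8 FROM dual UNION ALL
  SELECT 'D', NULL, NULL, NULL, NULL FROM dual
)
SELECT fundcode,
       -- 8.1, 4.5, 1.8, NULL
       COALESCE(ten_yr, five_yr, three_yr, one_yr) AS longevity_return,
       CASE WHEN ten_yr   IS NOT NULL THEN ten_yr
            WHEN five_yr  IS NOT NULL THEN five_yr
            WHEN three_yr IS NOT NULL THEN three_yr
            ELSE one_yr
       END AS longevity_return_case
FROM t
ORDER BY fundcode;
```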

NTILE Function
The NTILE function divides an ordered data set into the number of buckets indicated by expr1 and assigns the
appropriate bucket number to each row. The buckets are numbered 1 through expr1. The expr1 value must resolve to a
positive constant for each partition. If the rows do not divide evenly, bucket sizes differ by at most one, and the
extra rows go to the lowest-numbered buckets.
Syntax: NTILE (<expr1>) OVER ([query_partition_clause] <order by clause>)
expr1 is the number of buckets to divide the data into (e.g. NTILE(4) for quartiles, NTILE(10) for deciles).
Example:
select *
from (
select mr.fundcode,
mr.ytd_tot_ret_nav,
ntile(4) over (order by mr.ytd_tot_ret_nav desc) ytd_quartile,
mr.inception_yr_tot_ret_nav,
ntile(10) over (order by mr.inception_yr_tot_ret_nav desc) inception_decile,
ntile(100) over (order by mr.inception_yr_tot_ret_nav) inception_percentile
from fund.monthly_returns mr
where trunc(tdate) = to_date('10/31/2012','mm/dd/yyyy')
and mr.ytd_tot_ret_nav is not null
and mr.inception_yr_tot_ret_nav is not null
)
where fundcode in ('252','463','360','785','874','WTJ','313');
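The uneven case can be checked with generated rows. This sketch splits ten numbers from a CONNECT BY row generator into quartiles; 10 / 4 leaves a remainder of 2, so buckets 1 and 2 get three rows each and buckets 3 and 4 get two.

```sql
-- Ten generated rows (n = 1..10) split into 4 buckets
SELECT n,
       NTILE(4) OVER (ORDER BY n) AS quartile  -- 1,1,1,2,2,2,3,3,4,4
FROM (SELECT LEVEL AS n FROM dual CONNECT BY LEVEL <= 10)
ORDER BY n;
```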

PERCENT_RANK Function
The PERCENT_RANK function calculates the relative rank of a row within a data set. The values returned by this
function always range from 0 to 1 (inclusive), and the returned datatype is always NUMBER.

Syntax:

PERCENT_RANK () OVER ([query_partition_clause] <order by clause>) - Analytic
or
PERCENT_RANK (<expr>) WITHIN GROUP (<order by clause>) - Aggregate

For a row with rank r among the n rows being evaluated, PERCENT_RANK is the rank minus 1, divided by one less
than the number of rows:
PERCENT_RANK = (r-1)/(n-1)
In the aggregate form, the hypothetical row is ranked as if it were inserted into the group, so n-1 is simply
the number of existing rows in the group.

Analytic Examples:

select mr.fundcode,
mr.inception_yr_tot_ret_nav,
rank() over (order by mr.inception_yr_tot_ret_nav) as rank,
rank() over (order by mr.inception_yr_tot_ret_nav) -1 as percent_rank_r_value,
max(rownum) over (partition by 1)-1 as percent_rank_n_value,
round(percent_rank() over (order by mr.inception_yr_tot_ret_nav),6) as pct_rank,
round(percent_rank() over (order by mr.inception_yr_tot_ret_nav),6)*100 as pct_rank2
from fund.monthly_returns mr
where trunc(tdate) = to_date('10/31/2012','mm/dd/yyyy')
and mr.ytd_tot_ret_nav is not null
and mr.inception_yr_tot_ret_nav is not null
order by pct_rank2 desc;
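A worked check of the formula on hypothetical data with five rows and one tie: the manual (r-1)/(n-1) column matches PERCENT_RANK on every row. This form divides by zero if the set has only one row, a case the built-in function handles by returning 0.

```sql
-- Hypothetical returns; B and C tie, so both get rank 2
WITH t AS (
  SELECT 'A' AS fundcode, 10 AS ret FROM dual UNION ALL
  SELECT 'B', 20 FROM dual UNION ALL
  SELECT 'C', 20 FROM dual UNION ALL
  SELECT 'D', 40 FROM dual UNION ALL
  SELECT 'E', 50 FROM dual
)
SELECT fundcode,
       ret,
       RANK() OVER (ORDER BY ret) AS r,      -- 1, 2, 2, 4, 5
       COUNT(*) OVER () AS n,                -- 5
       (RANK() OVER (ORDER BY ret) - 1)
         / (COUNT(*) OVER () - 1) AS manual_pct_rank,  -- 0, .25, .25, .75, 1
       PERCENT_RANK() OVER (ORDER BY ret) AS pct_rank  -- same values
FROM t
ORDER BY ret, fundcode;
```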

select mr.fundcode,
mr.inception_yr_tot_ret_nav,
rank() over (order by mr.inception_yr_tot_ret_nav desc) as rank,
rank() over (order by mr.inception_yr_tot_ret_nav desc) -1 as percent_rank_r_value,
max(rownum) over (partition by 1)-1 as percent_rank_n_value,
round(percent_rank() over (order by mr.inception_yr_tot_ret_nav desc),6) as pct_rank,
round(percent_rank() over (order by mr.inception_yr_tot_ret_nav desc),6)*100 as pct_rank2
from fund.monthly_returns mr
where trunc(tdate) = to_date('10/31/2012','mm/dd/yyyy')
and mr.ytd_tot_ret_nav is not null
and mr.inception_yr_tot_ret_nav is not null
order by pct_rank2 asc;

Aggregate Example:
This example calculates the percent rank of a hypothetical 20% return compared to the existing returns within the data.
select round(percent_rank(20) within group (order by inception_yr_tot_ret_nav),6) as pct_rank_asc,
round(percent_rank(20) within group (order by inception_yr_tot_ret_nav desc),6) as pct_rank_desc
from fund.monthly_returns mr
where trunc(tdate) = to_date('10/31/2012','mm/dd/yyyy')
and mr.ytd_tot_ret_nav is not null
and mr.inception_yr_tot_ret_nav is not null;
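A worked check of the aggregate form, using hypothetical returns 10, 20, 20, 40 and 50 and a hypothetical value of 30. Inserted into the ascending order, 30 would rank 4th; in descending order it would rank 3rd. Each result is divided by the 5 existing rows.

```sql
-- Hypothetical existing returns (5 rows)
WITH t AS (
  SELECT 10 AS ret FROM dual UNION ALL
  SELECT 20 FROM dual UNION ALL
  SELECT 20 FROM dual UNION ALL
  SELECT 40 FROM dual UNION ALL
  SELECT 50 FROM dual
)
SELECT PERCENT_RANK(30) WITHIN GROUP (ORDER BY ret)      AS pct_rank_asc,  -- (4-1)/5 = 0.6
       PERCENT_RANK(30) WITHIN GROUP (ORDER BY ret DESC) AS pct_rank_desc  -- (3-1)/5 = 0.4
FROM t;
```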

LISTAGG Function
The LISTAGG function, available as of Oracle 11g Release 2, orders the rows within each group according to the ORDER BY
in its WITHIN GROUP clause and then concatenates the values of <expr1>. LISTAGG performs string aggregation natively
(the grouping and concatenation of multiple rows of data into a single row per group).
Like other aggregate functions, LISTAGG can be used as an analytic function by adding the optional OVER() clause.
LISTAGG can use a variety of constants or expressions as the delimiter for the aggregated strings. The delimiter is
optional and can be omitted altogether.
The result of LISTAGG is limited to the maximum size of VARCHAR2 (4000 bytes in Oracle 11gR2); a longer result raises
ORA-01489. The function returns a VARCHAR2 unless <expr1> is RAW (binary or byte-oriented graphic/audio data), in
which case it returns RAW.
Syntax: LISTAGG (<expr1> [, delimiter_expr]) WITHIN GROUP (<order by clause>) [OVER (<query_partition_clause>)]
expr1 is the column whose values will be aggregated. NULL values are ignored.
delimiter_expr is the desired delimiter, placed within single quotes (e.g. ', '). It is optional and defaults to NULL
if not specified.
WITHIN GROUP (ORDER BY ...) specifies the order in which the values are concatenated. The same clause is used by
other aggregate functions (e.g. RANK, DENSE_RANK).
Examples:
select
security_lending_type_code,
composite_fund_key
from stage.s2_fund_codes_security_lend_v
order by security_lending_type_code;

select
security_lending_type_code,
listagg(composite_fund_key, ', ') within group (order by composite_fund_key) key_list
from stage.s2_fund_codes_security_lend_v
group by security_lending_type_code;

-- PL/SQL only: SELECT ... INTO requires a PL/SQL block in which v_keys is declared
select key_list
into v_keys
from (select SECURITY_LENDING_TYPE_CODE,
''''||listagg(composite_fund_key, ''', ''') WITHIN GROUP (ORDER BY composite_fund_key)||'''' key_list
from stage.S2_FUND_CODES_SECURITY_LEND_V
group by SECURITY_LENDING_TYPE_CODE
)
where security_lending_type_code = 'U.S. Govt Agency Bonds';

select a.fund_code_five_digit,
b.fund_invests_in
from STAGE.S2_FUND_SHARECLASS a,
(select shareclass_key,
listagg(FUND_INVEST_IN_CODE, ', ') WITHIN GROUP (ORDER BY FUND_INVEST_IN_CODE) fund_invests_in
from STAGE.S2_FUND_CODES_FUND_INVEST_IN_V
group by shareclass_key
having listagg(FUND_INVEST_IN_CODE, ', ') WITHIN GROUP (ORDER BY FUND_INVEST_IN_CODE) != 'n/a'
) b
where a.fund_code_five_digit is not null
and a.shareclass_key = b.shareclass_key
order by a.fund_code_five_digit;

select a.fund_display_name,
listagg(a.fund_code_five_digit, ', ') within group (order by a.fund_code_five_digit) five_digit_fund_codes,
listagg(a.share_class||' - '||a.fund_code_five_digit, ', ') within group (order by a.share_class) shareclass_fund_code,
listagg(a.share_class||' - '||to_char(a.inception_date,'MM/DD/YYYY'), ', ') within group (order by a.share_class)
shareclass_inception_dates
from STAGE.S2_FUND_PROSPECTUS_EXP_RATIO a
group by a.fund_display_name
order by a.fund_display_name;
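The analytic form of LISTAGG is not used in these examples. A sketch with hypothetical lending types and keys: with OVER (PARTITION BY ...), every detail row is kept and each one carries the list for its own group.

```sql
-- Hypothetical lending types and fund keys
WITH t AS (
  SELECT 'Bonds' AS lending_type, 'K3' AS fund_key FROM dual UNION ALL
  SELECT 'Bonds', 'K1' FROM dual UNION ALL
  SELECT 'Equity', 'K2' FROM dual
)
SELECT lending_type,
       fund_key,
       -- Bonds rows: 'K1, K3'; Equity row: 'K2'
       LISTAGG(fund_key, ', ') WITHIN GROUP (ORDER BY fund_key)
         OVER (PARTITION BY lending_type) AS keys_in_type
FROM t
ORDER BY lending_type, fund_key;
```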
