BKF ANALYST BOOTCAMP MODULE 2: SESSION 2

AGGREGATING AND TRANSPOSING DATA


SEMEFPA1, November 2013

BADAN KEBIJAKAN FISKAL


KEMENTERIAN KEUANGAN RI

DEVELOPED AS PART OF THE SUPPORT FOR ENHANCED MACROECONOMIC AND FISCAL POLICY ANALYSIS (SEMEFPA) PROGRAM

KEY DATA CONSTRUCTION TECHNIQUES

BKF ANALYST BOOTCAMP

A LITTLE EXCEL NOTATION


As you will have noticed during your problem set, a little Excel notation can make your task much faster. This section covers a few very useful pieces of Excel notation.

1. TEXT

Excel treats as text anything that is either not numerical or enclosed in quotation marks (" "). Combining text (and non-text) is simple: use the ampersand (&) to join two or more strings of text together. An example is set out in the box on the left below.

To cut text starting from the left side of the string, use the LEFT function:

=LEFT(cell containing the text you want to cut, number of characters you want)

To cut text starting from the right side of the string, use the RIGHT function:

=RIGHT(cell containing the text you want to cut, number of characters you want)

To cut a portion from the middle of a string, use the MID function:

=MID(cell containing the text you want to cut, number of the character you want to start from, number of characters you want)

To find out how many characters there are in a string, use the LEN function:

=LEN(cell containing the text you want to know the length of)

Figure 1: Examples of the use of LEFT, RIGHT, MID, and LEN functions in Excel
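For readers who want to check the logic outside Excel, the text functions above can be mirrored with Python string slicing. This is only a minimal sketch and not part of the original module: the helper names `left`, `right` and `mid` and the sample string are assumptions made for illustration.

```python
def left(text: str, n: int) -> str:
    """Mirror of Excel's LEFT: the first n characters."""
    return text[:n]


def right(text: str, n: int) -> str:
    """Mirror of Excel's RIGHT: the last n characters."""
    return text[-n:] if n > 0 else ""


def mid(text: str, start: int, n: int) -> str:
    """Mirror of Excel's MID: n characters from the 1-based position start."""
    return text[start - 1:start - 1 + n]


# Hypothetical series code built the way & joins strings in Excel.
code = "GDP" + "_" + "2013Q4"

series_name = left(code, 3)    # "GDP"
quarter_tag = right(code, 2)   # "Q4"
year_text = mid(code, 5, 4)    # "2013"
length = len(code)             # 10, the counterpart of LEN
```

Note that Excel counts characters from 1 while Python counts from 0, which is why `mid` subtracts 1 from the starting position.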

2. DATES

Knowing some date notation helps in setting up automated spreadsheets. In particular, the following functions can be used to report today's date, or to return the day, month or year of a particular date:

=TODAY(), returns today's date
=DAY(date), returns the day of a date
=MONTH(date), returns the month of a date
=YEAR(date), returns the year of a date

There is no built-in function for returning the quarter (i.e. Q1, Q2, Q3 or Q4) corresponding to a particular date. However, we can use the following expression instead:

=ROUNDUP(MONTH(date)/3,0), which divides the number of the month by 3 and rounds the result up to zero decimal places, returning 1, 2, 3 or 4.

Another useful date-related formula for time series data is:

=DATE(year, month, day), which returns the date in a format that Excel can understand.

The box below provides an example of how to extract the day, month, year and quarter from today's date in rows 19 to 24, and how to calculate the date one month from today in row 25. Note how the formula reads MONTH(K20)+1 in the month section. This is very useful when constructing a series starting from a certain date.

The formula on row 30 provides an example of how to convert data in an unfriendly format into a more workable one. In this case, the date 2013Q4_SA, a format commonly used by statistical software, can be turned into 12/1/2013.

Figure 2: Examples of the use of common date-related Excel functions
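The quarter expression and the date arithmetic described above can be sketched in Python as follows. This is a hedged illustration rather than the spreadsheet's own formulas: the function names are hypothetical, and the label parser assumes that 12/1/2013 in the text means 1 December 2013, i.e. the first day of the last month of the quarter.

```python
import math
from datetime import date


def quarter(d: date) -> int:
    """Mirror of =ROUNDUP(MONTH(date)/3,0)."""
    return math.ceil(d.month / 3)


def same_day_next_month(d: date) -> date:
    """Mirror of =DATE(YEAR(d), MONTH(d)+1, DAY(d)).

    Excel's DATE() rolls month 13 over into January of the next year; here
    that is done explicitly. Unlike Excel, this raises ValueError when the
    day does not exist in the following month (e.g. 31 January).
    """
    year = d.year + (1 if d.month == 12 else 0)
    month = 1 if d.month == 12 else d.month + 1
    return date(year, month, d.day)


def parse_quarter_label(label: str) -> date:
    """Turn a label such as '2013Q4_SA' into a date.

    Assumption: the quarter maps to the first day of its last month.
    """
    year = int(label[:4])        # LEFT(label, 4)
    q = int(label[5])            # MID(label, 6, 1)
    return date(year, q * 3, 1)  # DATE(year, quarter*3, 1)
```

For example, `quarter(date(2013, 11, 15))` gives 4 and `parse_quarter_label("2013Q4_SA")` gives `date(2013, 12, 1)`.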

CONDITIONAL STATEMENTS
In the annex to session 1 we provided a short introduction to how logical statements work. The main logical statement tool is the IF() function, which can be written as follows:

=IF( Logical statement, What Excel should do if the statement is true, What Excel should do if the statement is false )

There are many uses for an IF() function. The example below shows how to get the IF() function to respond to different answers to a very important question.

Figure 3: An example of the use of an IF() function in Excel

More seriously, the IF() function can be used to filter out certain values. Suppose we are only interested in the monthly values for the first quarter of the year. Figure 4 below shows how to create a new series containing only the first quarter of the year. Similarly, Figure 5 shows how to generate a series containing only values of the original series that are above the average of the series.

Figure 4: Using an IF() function to create a new series containing only the first quarter of the year

If: the number of the month is less than or equal to 3 (we are in the 1st quarter)

Then: give me the value of the series in this date

Otherwise: give me a blank cell

Figure 5: Using an IF() function to generate a series containing only values of the original series that are above the average of the series

If: the value of the series in this date is greater than the average of the series

Then: give me the value of the series in this date

Otherwise: give me a blank cell

The IF() function can also be used to perform conditional calculations. Suppose a financial institution faces an interest rate of 5 percent on its current exposures, provided they remain below a certain value, and 7.5 percent above that value. For simplicity, suppose current exposures are 100 and the interest rate is 5 percent as long as exposures remain below 200. The example below shows how to perform this conditional calculation using an IF() function.

If: the value of the series is below the threshold

Then: Use the low interest rate

Otherwise: Use the high interest rate
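The three uses of IF() described above (filtering by quarter, filtering by average, and a conditional calculation) can be sketched in Python. The data values and the function name `interest_due` are hypothetical, and the calculation assumes that the interest due is simply the exposure multiplied by the applicable rate, which the text does not spell out. `None` stands in for a blank cell.

```python
from datetime import date

# Hypothetical monthly series, January to June 2013.
months = [date(2013, m, 1) for m in range(1, 7)]
series = [10.0, 12.0, 11.0, 15.0, 14.0, 16.0]

# First quarter only, like =IF(MONTH(date)<=3, value, "")
first_quarter = [v if d.month <= 3 else None for d, v in zip(months, series)]

# Above-average values only, like =IF(value>AVERAGE(series), value, "")
average = sum(series) / len(series)
above_average = [v if v > average else None for v in series]


def interest_due(exposure, threshold=200.0, low_rate=0.05, high_rate=0.075):
    """Like =IF(exposure<threshold, exposure*low_rate, exposure*high_rate)."""
    rate = low_rate if exposure < threshold else high_rate
    return exposure * rate
```

With these inputs the average is 13, so only the last three months survive the second filter, and an exposure of 100 is charged at the low rate.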

REFERENCING
Referencing formulas are very important both for aggregating and for transposing data. This section introduces their basic mechanics. Although it might seem a little abstract, you will see these formulas put to use in the following sections.

The OFFSET() function returns a reference to a cell, or range of cells, that is a specified number of rows and columns away from a given starting point.

=OFFSET( Starting point, Number of rows you wish to skip [+ to skip down, - to skip up], Number of columns you wish to skip [+ to skip to the right, - to skip to the left], Number of rows you wish to include [height of the selection box], Number of columns you wish to include [width of the selection box] )

The last two entries in the formula define the selection box. When the selection box is only 1 cell, the OFFSET() function reports the value in the selection box. In the example in Figure 6 below, using the formula on row 4, skipping 3 cells down and 3 cells right of the starting point brings us to cell F5, where the value is n. We will see later how to use this to transpose data.

When the selection box is larger than 1 cell, the OFFSET() function does not report any value but provides the coordinates for other functions such as SUM() or AVERAGE(). You can see intuitively how SUM(OFFSET()) and AVERAGE(OFFSET()) can be used to aggregate data from monthly to quarterly frequency.

Figure 6: Introducing the OFFSET() function
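The mechanics of OFFSET() can be imitated on a small grid of numbers in Python. This is a sketch under stated assumptions: the function `offset` and the grid are hypothetical, positions are 0-based list indices rather than cell addresses, and the function returns the selected values rather than a reference.

```python
def offset(grid, start_row, start_col, rows, cols, height=1, width=1):
    """Return the height x width box whose top-left cell lies `rows` down
    and `cols` to the right of the starting cell (0-based indices)."""
    top = start_row + rows
    left = start_col + cols
    return [row[left:left + width] for row in grid[top:top + height]]


# Hypothetical 4 x 3 grid standing in for a block of worksheet cells.
grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]

# A 1 x 1 selection box behaves like a single value: 1 down, 2 right.
single = offset(grid, 0, 0, 1, 2)              # [[6]]

# A taller box is meant to be fed to an aggregator, like SUM(OFFSET(...)).
box = offset(grid, 0, 0, 1, 0, height=3)       # [[4], [7], [10]]
box_total = sum(value for row in box for value in row)   # 21
```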

The INDEX() function returns a value in an array given the row and column coordinates specified.

=INDEX( Array in which the value to be reported is, Row number of the item to be reported, Column number of the item to be reported )

In the example below you can see how the INDEX() function returns f as the item on the second row and second column of the red box, d as the 4th item in the green box, and e as the second item in the blue box.

Figure 7: Introducing the INDEX() function

The MATCH() function returns the relative position of a value in an array. Unlike the INDEX() function, where coordinates are specified, the MATCH() function searches for the relative coordinates of a value, given the value itself and the array in which it is contained.

=MATCH( Value of which we want the relative position, Array in which the value is, Type of match [1 = largest value less than or equal to the lookup value, 0 = exact match, -1 = smallest value greater than or equal to the lookup value] )

In the example below you can see how the MATCH() function reports that c is in the third position in the array D3:G3.

Figure 8: Introducing the MATCH() function

The INDEX() and MATCH() functions can be used together as an alternative to OFFSET() to transpose data. The last section illustrates how.
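The division of labour between the two functions can be sketched in Python: one helper looks up a position, the other reads a value at given coordinates. The helper names `index` and `match_exact` and the sample arrays are hypothetical, and only the exact-match type (0) of MATCH() is imitated.

```python
def index(grid, row, col):
    """Mirror of INDEX on a 2-D array, using 1-based row and column numbers."""
    return grid[row - 1][col - 1]


def match_exact(value, array):
    """Mirror of =MATCH(value, array, 0): 1-based position of the first
    exact match. Raises ValueError where Excel would return #N/A."""
    return array.index(value) + 1


# Hypothetical arrays for illustration.
letters = ["a", "b", "c", "d"]
position_of_c = match_exact("c", letters)       # 3

table = [["a", "b", "c"],
         ["d", "e", "f"]]
second_row_second_col = index(table, 2, 2)      # "e"

# Used together: find where a label sits, then read the matching value.
labels = ["Jan", "Feb", "Mar"]
values = [[10.0], [20.0], [30.0]]               # a single-column array
feb_value = index(values, match_exact("Feb", labels), 1)   # 20.0
```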

AGGREGATING DATA
AGGREGATING USING CONDITIONAL STATEMENTS

Excel can perform normal operations, such as SUM() and AVERAGE(), according to multiple criteria. You can, for example, instruct Excel to sum or average all values within a certain category, such as a country or a given year, from a larger array of data. You can also instruct Excel to sum or average data across multiple categories, such as for a country and year, or all values between two specified dates. The specifications of the SUMIFS() and AVERAGEIFS() functions are below.

=SUMIFS( What to sum, Over what is your first criterion, What is your first criterion, Over what is your second criterion, What is your second criterion )

=AVERAGEIFS( What to average, Over what is your first criterion, What is your first criterion, Over what is your second criterion, What is your second criterion )

Aggregating across categories

The example below shows how to aggregate data for different countries. The structure of the formula is the same for summing or averaging.

Figure 8: An example of a SUMIF() function to aggregate all values in a series for a given country

Sum over this

If this

Is equal to this.
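The "sum over this, if this is equal to this" pattern can be sketched in Python. The function `sumifs` below is a hypothetical stand-in that only supports equality criteria, and the country, year and value columns are invented for illustration.

```python
def sumifs(sum_range, *conditions):
    """Sum the entries of sum_range for which every condition holds.

    Each condition is a (criteria_range, criterion) pair; a row counts only
    if the value in each criteria_range equals the paired criterion.
    """
    total = 0.0
    for i, value in enumerate(sum_range):
        if all(criteria_range[i] == criterion
               for criteria_range, criterion in conditions):
            total += value
    return total


# Hypothetical data: one row per country and year.
countries = ["Indonesia", "Australia", "Indonesia", "Australia"]
years = [2012, 2012, 2013, 2013]
values = [5.0, 3.0, 6.0, 4.0]

indonesia_total = sumifs(values, (countries, "Indonesia"))                # 11.0
indonesia_2013 = sumifs(values, (countries, "Indonesia"), (years, 2013))  # 6.0
```

Adding a second (range, criterion) pair narrows the selection in the same way a second criterion does in SUMIFS().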

Aggregating to transform data into a lower frequency series

Aggregating data to obtain a series reported at a lower frequency, such as transforming a monthly time series into a quarterly series, requires the use of multiple conditions. The example in Figure 9 below shows how this works for quarterly averages. The process for aggregating into quarterly totals is exactly the same, except that you use a SUMIFS() rather than an AVERAGEIFS() function. Note that the process for aggregating daily, monthly, quarterly and yearly data into a lower frequency is always the same. Only the conditions governing the range of data to include in the calculation, specified in the two criteria, change.

Figure 9: An example of an AVERAGEIFS() function to take the average of data over each quarter

Take the average over the series
Criterion 1 says: take the average if the current date is greater than the end of last quarter
Criterion 2 says: take the average if the current date is smaller than or equal to the end of the current quarter
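The two date criteria can be sketched in Python as a window that is open at the end of the previous quarter and closed at the end of the current one. The function name `average_between` and the monthly values are hypothetical.

```python
from datetime import date


def average_between(values, dates, after, up_to):
    """Average the values whose date is > after and <= up_to, mirroring
    the two criteria of an AVERAGEIFS() over a date column."""
    selected = [v for v, d in zip(values, dates) if after < d <= up_to]
    return sum(selected) / len(selected)


# Hypothetical monthly series, January to June 2013.
dates = [date(2013, m, 1) for m in range(1, 7)]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

q1_average = average_between(values, dates,
                             after=date(2012, 12, 31), up_to=date(2013, 3, 31))
q2_average = average_between(values, dates,
                             after=date(2013, 3, 31), up_to=date(2013, 6, 30))
# q1_average is 2.0 and q2_average is 5.0
```

Replacing the final division with a plain `sum(selected)` gives the quarterly total instead of the average.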

AGGREGATING USING REFERENCING

The OFFSET() function is also commonly used for aggregating data, often in combination with the SUM() or AVERAGE() functions. The example below shows how this works to create quarterly averages from monthly data. While aggregating data using conditional aggregators is rather intuitive, referencing functions are a little trickier to follow. There are three steps to understanding the mechanics of the formula, which relate to Figure 10 below.

Figure 10: Transforming the frequency of a series using the OFFSET() function

First, abstracting from the AVERAGE() function, let's look at the OFFSET() formula in cell E2. This is not shown in the screenshot, but it reads:

=AVERAGE( OFFSET($B$2,(ROW(B2)-ROW($B$2))*3, ,3) )

Let's look at this in steps:

1. The formula in this cell sets $B$2 as the starting point.
2. From there it skips (ROW(B2)-ROW($B$2))*3 = (2 - 2)*3 = 0 lines. Remember that the function ROW(A?)=? always returns the row number of the selected cell.
3. It skips 0 columns. Note how the 3rd component of the formula is left blank.
4. It selects a box with a height of 3 cells, equivalent to the months of Jan, Feb and Mar.

Second, the formula takes the average within the selected box.

Third, to understand why this works for the whole series, let's look at the formula in cell E3, reported on the screenshot:

=AVERAGE( OFFSET($B$2,(ROW(B3)-ROW($B$2))*3, ,3) )

Now, this formula is doing the following:

1. The formula in this cell sets $B$2 as the starting point.
2. From there it skips (ROW(B3)-ROW($B$2))*3 = (3 - 2)*3 = 3 lines.
3. It skips 0 columns.
4. It selects a box with a height of 3 cells, equivalent to the months of Apr, May and Jun.
5. It takes the average of the content of the box.
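The steps above can be sketched in Python, with a list standing in for the monthly column that starts at $B$2. The function name `quarterly_average`, the row numbers passed in and the monthly values are assumptions made for illustration.

```python
def quarterly_average(monthly, current_row, first_row=2):
    """Average of a 3-month box, imitating AVERAGE(OFFSET(...)).

    The number of lines to skip is (current_row - first_row) * 3, so the
    formula on the first row starts at month 0, the next one at month 3,
    and so on.
    """
    skip = (current_row - first_row) * 3
    box = monthly[skip:skip + 3]        # selection box of height 3
    return sum(box) / len(box)


# Hypothetical monthly values, January to December.
monthly = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]

q1 = quarterly_average(monthly, current_row=2)   # Jan-Mar: 2.0
q2 = quarterly_average(monthly, current_row=3)   # Apr-Jun: 5.0
q4 = quarterly_average(monthly, current_row=5)   # Oct-Dec: 11.0
```

The multiplication by 3 has to apply to the row difference as a whole; multiplying only the second row number would produce a negative skip.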

A SHORT DISCUSSION ON METHODS OF AGGREGATION

Both methods of aggregation are correct, and both have their pros and cons. Indeed, you can check for yourself that they give the same answer.

Using conditional statements is intuitively simpler. However, you will be limited to sums and averages, as there are no formulas to perform other types of conditional aggregation (median, quartile, variance, standard deviation etc.).

Using referencing formulas, on the other hand, is a little less intuitive, as it is less obvious from the way the formula is written what it is doing. Referencing formulas, however, leave more flexibility with respect to the type of aggregation, as virtually any operation can be combined with them. One major drawback of the OFFSET() formula is that it only works if the starting point is set correctly on the first month of the quarter. Otherwise the results will be wrong.

The World Bank office in Indonesia and the Australian Treasury have historically used the OFFSET() function as a method of aggregation, including in the spreadsheets handed over to BKF.

TRANSPOSING DATA
There are two options for transposing data from horizontal to vertical and vice versa. The first uses the OFFSET() function and the second a combination of the INDEX() and MATCH() functions. The two examples below cover each approach.

TRANSPOSING WITH THE OFFSET() FUNCTION

The example in Figure 11 below shows the formula for turning a vertical series horizontal. As in the previous section, in order to understand the OFFSET() function it is helpful to follow it step by step.

1. The function selects cell $B$2 as a starting point.
2. Then, it skips COLUMN(E2)-COLUMN($E$2) = 0 rows. Remember that COLUMN(?N)=? reports the number of the column of the selected cell, with column A = 1, column B = 2 etc.
3. Then, it skips 0 columns. Note how the column entry is blank.
4. It selects a box equivalent to the cell $B$2. Note that no height or width is entered.
5. It reports the value of cell $B$2, from where it has not moved.

In cell F3, the formula will read:

=OFFSET($B$2,COLUMN(F2)-COLUMN($E$2),)

What does this do?

1. The function selects cell $B$2 as a starting point.
2. Then, it skips COLUMN(F2)-COLUMN($E$2) = 6 - 5 = 1 row downwards. Remember that positive numbers make the formula skip down.
3. Then, it skips 0 columns.
4. It selects a box equivalent to the cell B3.
5. It reports the value of cell B3.

Figure 11: Using the OFFSET() function to transpose data
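The idea of using the column number of the output cell to decide how many rows to skip in the input can be sketched in Python. The function name `transpose_with_offset`, the default starting column (5, standing for column E) and the data are hypothetical.

```python
def transpose_with_offset(vertical, first_col=5):
    """Lay a vertical series out horizontally.

    For each output column, the number of rows to skip from the start of
    the vertical series is current_col - first_col, imitating
    COLUMN(current cell) - COLUMN($E$2).
    """
    horizontal = []
    for current_col in range(first_col, first_col + len(vertical)):
        rows_to_skip = current_col - first_col
        horizontal.append(vertical[rows_to_skip])
    return horizontal


# Hypothetical vertical series starting at $B$2.
vertical = [100.0, 101.5, 99.0]

row = transpose_with_offset(vertical)   # [100.0, 101.5, 99.0]
```

The first output column skips 0 rows and so repeats the starting cell, the next one skips 1 row, and so on along the row.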

TRANSPOSING WITH THE INDEX( MATCH() ) FUNCTION

From earlier we know that the INDEX() function reports values according to their coordinates and the MATCH() function searches for the coordinates of a given value in an array. Figure 12 below shows an example of how to use the two formulas to transpose a series. Let's take a close look at the formula in cell E3 in Figure 12 to see how this works.

=INDEX($B$2:$B$100, MATCH(E2,$A$2:$A$100,0) )

To understand this expression, let's look at the MATCH() component first. The MATCH() function in the formula is searching for the value in E2, that is 1/1/2000, in the array $A$2:$A$100, that is the month column. So the MATCH() function is looking for the relative position of the first month. This will be equal to 1. The INDEX() function then reports from the array $B$2:$B$100, that is the data column, the value of the 1st element, giving us the first month of data.

In cell F3, the formula will read:

=INDEX($B$2:$B$100, MATCH(F2,$A$2:$A$100,0) )

Here, the MATCH() function searches for the position of month 2/1/2000 in the date column, finding it in the second position. The INDEX() function then reports the second value of the data column.

Figure 12: Using the INDEX() function to transpose data
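The lookup described above can be sketched in Python, with two lists standing in for the date column and the data column. The function name `index_match` and the data values are hypothetical, and the dates assume that 1/1/2000 and 2/1/2000 denote January and February 2000.

```python
from datetime import date


def index_match(lookup, keys, values):
    """Imitate =INDEX(values, MATCH(lookup, keys, 0))."""
    position = keys.index(lookup) + 1   # MATCH: 1-based position, exact match
    return values[position - 1]         # INDEX: value at that position


# Vertical layout: a date column and a data column (hypothetical values).
date_column = [date(2000, 1, 1), date(2000, 2, 1), date(2000, 3, 1)]
data_column = [100.0, 101.5, 99.0]

# Horizontal layout: the dates are typed across the header row, and each
# cell beneath looks up its own date.
header_row = [date(2000, 1, 1), date(2000, 2, 1), date(2000, 3, 1)]
transposed = [index_match(d, date_column, data_column) for d in header_row]
# transposed is [100.0, 101.5, 99.0]
```

Because each cell looks up its own header date, the header row does not need to follow the same order as the date column.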

The two transposing methods are equivalent. The only difference to keep in mind is that the OFFSET() function does not work across workbooks unless they are both open at the same time.