
Improving Query Flexibility with Partition Mapping

BY BRYAN APRIL 19, 2011

Partition Mapping is a traditional database concept that MicroStrategy has adopted in the SQL Engine to give you even more options for performance and tuning. Traditional database partitioning involves taking a large table and breaking it up into smaller pieces by separating it at logical break points, such as one table per year or per region. The database is then intelligent enough to know where those break points are and direct queries to the specific tables containing the data, providing a smaller set of data to process.
MicroStrategy uses a similar approach to carve Fact tables into smaller pieces, and it offers two different methods of implementation. While this feature is intended to squeeze a little more efficiency out of the database, most advanced platforms such as Netezza, Greenplum or Vertica won't see any (or much) performance increase from it (though I'm no expert on those platforms; that's just from my observations). And while the feature may seem antiquated, suited to projects trying to squeeze the last bit of life out of a stressed MS SQL Server, you can also use it in a few clever ways to provide query flexibility and even achieve some impressive data availability.
Before I talk about different ways to take advantage of Partitioning, I should first describe the feature in general and how to use it. There are two methods of implementing Partitioning: Warehouse Partition Mapping and Metadata Partition Mapping.

Warehouse Partition Mapping
The best method for pure performance is to use the Table approach. Effectively, you create a table of tables by physically breaking up your Fact table into individual pieces and then tying each table name to an Attribute Element in a Control Table so MicroStrategy knows how to load them.
Example:
You have 10 years' worth of data in a table, and it's getting a little large to scan through. Most of your queries are for the last few years anyway, but you don't want to lose the ability to query history. You could break that single table up into 10 smaller pieces and create a control table such as:

Year  PBTName
2001  FactData2001
2002  FactData2002
...   ...
2011  FactData2011
MicroStrategy will consume this table as a special kind of table, called a Partition Mapping Table, and treat the whole set as a single logical table. When you run a query, MicroStrategy will first determine which table(s) contain the Year(s) in your report and only use the appropriate tables. Aside from the obvious benefit of smaller tables to query, consider also the benefit to ETL of having a smaller Current Year table to work with, as opposed to sifting through nine years of data that isn't going to change on a day-to-day basis.
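
To make that concrete, here is a minimal sketch of what the physical partitions and the control table might look like in the warehouse. All table and column names are hypothetical, and the exact columns MicroStrategy expects in a Partition Mapping Table are covered in the product documentation, so treat this as an illustration rather than a template:

    -- One physical partition per year, all sharing the same structure
    CREATE TABLE FactData2011 (
        Year        INTEGER,
        Customer_ID INTEGER,
        Sales_Amt   DECIMAL(18,2)
    );
    -- ...FactData2001 through FactData2010 created the same way

    -- Control table mapping each Year element to its partition table
    CREATE TABLE FactData_PMT (
        Year    INTEGER,
        PBTName VARCHAR(30)
    );

    INSERT INTO FactData_PMT (Year, PBTName) VALUES (2001, 'FactData2001');
    INSERT INTO FactData_PMT (Year, PBTName) VALUES (2011, 'FactData2011');
    -- ...one row per year in between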
The obvious drawback of this approach is that you have to maintain that control table. While that may not seem like a big deal, especially if you are slicing by an Attribute like Region that probably doesn't change very often, there is a requirement that isn't immediately noticeable: you must provide *separate* tables for every element. That means if you wanted to do something like FactData2011, FactData2010 and FactDataArchive (containing everything prior to 2010), you couldn't use this feature. Even if you define each row in your Control Table, they can't point to the same physical table in the database. Perhaps you could be clever and circumvent that requirement with Views (sketched below), but even then, if you're slicing on an Attribute that has lots of values, such as Month or even Day, you're still in for an unmanageable number of objects. For something that granular, you'd want to turn to the next Partitioning method.
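
If you did want to try the Views workaround, the idea would be to give every element its own "table" even though several of them read the same physical data. A rough sketch using the hypothetical names from the example:

    -- Each archived year gets its own view over the single archive table
    CREATE VIEW FactData2008 AS
        SELECT * FROM FactDataArchive WHERE Year = 2008;

    CREATE VIEW FactData2009 AS
        SELECT * FROM FactDataArchive WHERE Year = 2009;

    -- The control table can then satisfy the one-table-per-element requirement
    INSERT INTO FactData_PMT (Year, PBTName) VALUES (2008, 'FactData2008');
    INSERT INTO FactData_PMT (Year, PBTName) VALUES (2009, 'FactData2009');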
Metadata Partition Mapping
I'm not a fan of this method's name, because whenever I say it, the initial interpretation seems to be that it has something to do with the Metadata. While I suppose everything has something to do with the metadata, this is really a Query-Based Partition Mapping, compared to the Table-Based Partition Mapping mentioned previously.
This approach is a lot more MicroStrategy-esque (to coin a word). Effectively, you're building a filter that tells the SQL Engine which table to choose. The scenario that wasn't feasible in the previous example (using three tables: FactData2011, FactData2010 and FactDataArchive) can be achieved with this method. You simply define the filter Year=2011 and choose FactData2011 as the target table, repeat the process to map Year=2010 to FactData2010, and then define Year<2010 to point to FactDataArchive.
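
Once the engine has matched the report's filter to one of those definitions, the report SQL only touches the chosen table. As a rough sketch (not actual SQL Engine output), a report filtered to a year that falls under the Year<2010 definition might generate something like:

    -- Report filter Year = 2008 satisfies the partition filter Year < 2010,
    -- so the query is routed to FactDataArchive rather than a yearly table
    SELECT   a11.Year,
             SUM(a11.Sales_Amt) AS Sales
    FROM     FactDataArchive a11
    WHERE    a11.Year = 2008
    GROUP BY a11.Year;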
The downside of this approach is that before the report can run, a pre-query must first be run to determine which table to choose. This can be pretty inefficient if it has to query the Fact tables themselves, though with some careful schema tweaking you can get it to come from the dimensions instead. I don't have those details readily available right now, but when I've implemented this in the past, it was a challenge.
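
The general idea behind getting the pre-query to "come from dimensions" is to resolve the partitioning attribute from a small lookup table instead of scanning the Fact tables. Purely as an illustration (hypothetical names, and not the exact pre-SQL the engine generates):

    -- Pre-query: find which Year elements the report needs, using the small
    -- date dimension rather than the large Fact tables
    SELECT DISTINCT d.Year
    FROM   Dim_Date d
    WHERE  d.Day_Date BETWEEN DATE '2011-01-01' AND DATE '2011-04-30';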
Even once you craft efficient pre-SQL, debugging reports becomes considerably harder all around. Since the SQL Engine must first run a pre-query to determine which tables to use, you end up with Schrödinger's SQL, where every possible outcome is listed in the SQL View. This happens not only when viewing the SQL pre-execution, but also in the Job Monitor, which can make the debugging process very difficult and annoying. Thankfully, once the report is complete, you can view the SQL and see the path that was actually chosen.
Partitioning for Real Time Data
Now on to my favorite way to leverage Metadata Partition Mapping. In my current environment, we load data in near real time out of necessity (saving it all up for a single nightly load would take too long to process), so the Warehouse is within an hour of real time at any given point during the day. Since users mostly use the BI project for historical analysis and trending, we don't query the Today data too often, so we don't include it in Aggregates. Of course, if they do want to do some real-time reporting, usually in the form of debugging some system issue, traditional Aggregate Table approaches wouldn't work here. An alternative to last week's Attribute Anchoring approach for this scenario would be to use Metadata Partition Mapping to force queries that include Today to go directly to the FactDetail table, and any queries for dates before Today to go to the Aggregates. This gives the ETL some relief from having to reprocess Aggregate tables throughout the day (which, if they are at the Day level, would require lots of deletes/updates), without sacrificing access to the data.
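
In partition terms, that is just two Metadata Partition definitions: one filter for Today that targets the detail table, and one for everything earlier that targets the aggregate. A sketch of the two resulting query shapes (hypothetical table names):

    -- Queries that include Today are routed to the near-real-time detail table
    SELECT   Day_Date, SUM(Sales_Amt) AS Sales
    FROM     FactDetail
    WHERE    Day_Date = CURRENT_DATE
    GROUP BY Day_Date;

    -- Purely historical queries are routed to the pre-built Day-level aggregate
    SELECT   Day_Date, SUM(Sales_Amt) AS Sales
    FROM     FactDayAgg
    WHERE    Day_Date < CURRENT_DATE
    GROUP BY Day_Date;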
9.2 MultiSource Enhancement
A fantastic enhancement in 9.2 is the ability to do Metadata Partitioning across MultiSource connections. This means you could use separate databases to hold your real-time data vs. your historical data. In the original example, it could mean moving Archived data off of your expensive Warehouse Appliance and into a cheaper database solution. Reports wouldn't have to change, and users wouldn't even notice (other than speed differences).

Another Real Time solution would be to possibly hit Source System


data directly. In most environments, this may not be feasible, but in
my current environment our Source System actually keeps
aggregate level data in absolute real time (within a minute). While
they dont keep history, it is enough that we could use MultiSource
Metadata Partitioning to provide absolute real time data availability
to our users in MicroStrategy.
For a longer list of considerations in defining
your Partitioning strategy, check out this TechNote.
For details on configuring Partitioning, start with this TechNote.
