
DB301 - SSAS Implementation Best Practices

Nicholas Dritsas
SQL Server Product Group
Customer Advisory Team
Microsoft Corporation
Microsoft Confidential
Provide practical knowledge that you can use
Improve performance of Analysis Services solutions
Query performance
Processing performance
Learn what MDX syntax to use and which to avoid
Key takeaway 1: Analysis Services 2005 can handle very large data volumes
Key takeaway 2: AS 2005 has a steep learning curve, even if you were good at AS 2000

An end-to-end integrated offering

Business Performance Management Applications
Monitoring, Scorecarding (Business Scorecard Manager 2005)
Analytics, Planning (PerformancePoint Server 2007)
Advanced Analytics (ProClarity 6.2)
Collaboration and Content (Office SharePoint Server 2007)
End-user Analysis (Excel 2007)
BI Platform
Integration (Integration Services), Analysis (Analysis Services), Reporting (Reporting Services)
SQL Server 2005 RDBMS

Who is SQL Customer Advisory Team (SQL CAT)
Overview of big AS projects
Lessons Learned
Project Team skills needed
Tools
Improving Processing
Improving Queries
Configuration Changes
Design features to be careful of
Scale up vs. Scale Out

Works with the largest, most complex SQL Server projects worldwide
US: NASDAQ, USDA, Verizon, Raymond James…
Europe: LSE, Barclays Capital
APAC: NUL, KT, Western Digital, JR East
Drives enterprise requirements back into SQL Server
Shares best practices with the SQL Server community
SQL CAT Blog: http://blogs.msdn.com/sqlcat
SQL ISV PM Blog: http://blogs.msdn.com/mssqlisv/
You will see us at PASS and TechEd
On TechNet
Get the real-world guidelines, expert tips, and rock-solid guidance to take your SQL Server implementation to the next level.
http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/default.mspx
Contents
Technical Whitepapers
ToolBox
Top 10 Lists
Ask a Question

Hilton Hotels
Forecasting and budgeting application for their 2,400+ hotels. Real-time OLAP, scale-out AS and RS.
Edcon (South Africa)
Sales reporting and analysis for multiple businesses. Multi-terabyte relational warehouse, scale-up AS, use of RS and ProClarity.
Enterprise Rent-A-Car
Rental and inventory tracking and analysis. Teradata
backend with ROLAP AS, RS, and ProClarity.
Danske Supermarket
2TB AS DB, 500GB largest cube size, 800 users, 10
concurrent queries
MPS
Complex design with linked cubes and parent-child dimensions
40M-member customer dimension
MDX Expert, MDX Expert, MDX Expert
MDX Syntax, Profiler analysis (subcube usage etc.)
Analysis Services
AS2000 expertise != AS2005 expertise

SQL Server
Business Domain Expert
The usual infrastructure guru with knowledge of:
64-bit
NUMA
SAN / disks
Performance monitoring tools and analysis

Usage based optimizer
Result is aggregations based on your workload
ASCMD.exe (http://msdn2.microsoft.com/en-us/library/ms365187.aspx)
Command line execution tool for batch processing
Edit XML project file to add custom aggregations
Custom Aggregation Tool
Coming as a sample in SP2
Browse aggregations
Create custom aggregations
Stress testing tools
Based on Visual Studio for Testers (Ocracoke)
Web tests for Reporting Services
Query tests for MDX

Performance Challenges
Long Processing times
Moving from AS 2000 to 2005
Long running queries
Cube designs that cause problems
Hardware configurations that cause problems
Improperly configured NUMA systems
SAN setup

SQL Server Performance Tuning
Improve the queries that are used for extracting data from SQL
Server
Check for proper plans and indexing
Conduct regular SQL performance tuning process
AS Processing Improvements
Use SP2!
Processing 20 partitions: SP1 1:56, SP2 1:06
Don’t rely on the UI default for parallel processing
Go into the advanced processing tab and change it
Quick demo on where to change
Monitor the values:
Maximum number of datasource connections
MaxParallel: how many partitions are processed in parallel. Don’t let the server decide on its own.
Use INT for keys, if possible.
For best performance use ASCMD.exe and XMLA
Use <Parallel> </Parallel> to group processing tasks together until the server is using maximum resources
Proper use of <Transaction> </Transaction>
Process data (ProcessData) and indexes (ProcessIndexes) separately instead of using ProcessFull. The two steps have different CPU usage patterns.
ProcessClearIndexes deletes existing indexes; ProcessIndexes builds missing indexes or rebuilds existing ones.
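A minimal sketch of how such a batch could be generated, assuming the XMLA process types ProcessData and ProcessIndexes for the separate data and index steps and the MaxParallel attribute on <Parallel>. The database, cube, measure group and partition IDs are hypothetical placeholders, and the helper name build_process_batch is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by Analysis Services XMLA commands.
XMLA_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"


def build_process_batch(database_id, cube_id, measure_group_id,
                        partition_ids, process_type, max_parallel):
    """Return XMLA text: one <Batch> with a <Parallel> group of <Process> commands."""
    batch = ET.Element("Batch", {"xmlns": XMLA_NS})
    # Set MaxParallel explicitly instead of letting the server decide.
    parallel = ET.SubElement(batch, "Parallel",
                             {"MaxParallel": str(max_parallel)})
    for partition_id in partition_ids:
        process = ET.SubElement(parallel, "Process")
        obj = ET.SubElement(process, "Object")
        ET.SubElement(obj, "DatabaseID").text = database_id
        ET.SubElement(obj, "CubeID").text = cube_id
        ET.SubElement(obj, "MeasureGroupID").text = measure_group_id
        ET.SubElement(obj, "PartitionID").text = partition_id
        ET.SubElement(process, "Type").text = process_type
    return ET.tostring(batch, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical object IDs; replace with the IDs from your own database.
    partitions = ["Sales_2005", "Sales_2006"]
    for step in ("ProcessData", "ProcessIndexes"):
        print(build_process_batch("SalesDB", "Sales", "Fact Sales",
                                  partitions, step, max_parallel=2))
```

Each generated batch could be saved to a file and passed to ASCMD.exe as its input file, running the ProcessData batch first and the ProcessIndexes batch second. The right MaxParallel value depends on the hardware, so test and monitor rather than copying the value shown here.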

Change from default value of
<MemoryHeapType>1</MemoryHeapType>
<HeapTypeForObjects>1</HeapTypeForObjects>
to
<MemoryHeapType>2</MemoryHeapType>
<HeapTypeForObjects>0</HeapTypeForObjects>
This change benefits cases with many concurrent queries.
Use PreAllocate config parameter for NUMA servers
Allocations are distributed across all NUMA nodes. Start with 10
and monitor and test. Use dbmon.exe to check allocated memory.
Duration for processing 8 partitions in parallel: 15 min with PreAllocate 15, 25 min with PreAllocate 40
Pre-SP2 it is best to turn NUMA off

Most common causes of performance problems
Hierarchies are not natural
Most common cause of performance problems
Attribute relationships are critical
Check all hierarchy definitions
Aggregations
Capture a workload then use the Usage Based Optimization
Increase the % in the aggregation design wizard
Edit the XML to add custom aggregations or use the new
Custom Aggregation Tool (with SP2)
MDX
Read Nicholas Dritsas’s blog at http://blogs.msdn.com/sqlcat
Make sure you have an MDX expert on the project

Design Tips
Dimension Design
Get the Hierarchy design right
Measure Group Design
Don’t put all measures in one measure group. Use
multiple measure groups for more efficient cache
usage
Use the right size of data type
Calculated Members
Optimize according to the following MDX tips
Some can be done using SCOPE in MDX
Use NON_EMPTY_BEHAVIOR whenever possible

Parent-Child Dimensions
Many-to-Many
Linked measure groups on remote servers
Linked dimensions
Referenced dimensions
May have to materialize if it is a performance
problem
Remote Partitions
Limited testing, very little real-world feedback
Real Time ROLAP

MDX Tips (1 of n)
Avoid assigning values like 0, Null, “N/A”, “-” to cells that would otherwise remain empty.
Avoid redundant Sum/Aggregate calculations in situations where default/normal cell value aggregation would do.
Try to avoid IIF.
Prefer static literal hierarchy and member references
Use Measures.Sales instead of Dimensions(0).Sales
Avoid using LinkMember, StrToSet, StrToMember, StrToValue.

MDX Tips (2 of n)
Use Non_Empty_Behavior optimization hint instead of writing
calculation expressions of the form
Aggregate(NonEmptyCrossjoin(Descendants(…, Leaves) …).
To get the value of the current cell, consider using an explicit measure name instead of Measures.CurrentMember.
When writing calculation expressions like “expr1 * expr2”,
make sure the expression sweeping the largest area/volume in
the cube space is on the left side.
Replace simple “Measure1 + Measure2” calculations with
computed columns in the DSV or in the SQL data source.
Instead of writing expressions like
Sum(Customer.City.Members,
Customer.Population.MemberValue), consider defining a
separate measure group on the City table, with a Sum
measure on the Population column.

Filter a set first and then use it in the Crossjoin. The Filter function materializes the set and iterates through it, checking the condition on each item to build the new set, so it is cheaper to filter the smaller input set than the crossjoin result.
Avoid:
filter(NECJ({set1},{set2}),..)
Use:
NECJ(filter({set1},...),{set2})
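The cost difference can be illustrated with a small Python analogy. This is only a sketch of the counting argument, not how the formula engine works internally: it ignores the non-empty part of NECJ and assumes the filter condition depends only on members of set1. The function names and sample data are invented for illustration.

```python
from itertools import product


def filter_after_crossjoin(set1, set2, predicate):
    """Avoid: build the full crossjoin, then test the condition on every tuple."""
    checks = 0
    result = []
    for a, b in product(set1, set2):
        checks += 1
        if predicate(a):
            result.append((a, b))
    return result, checks


def filter_before_crossjoin(set1, set2, predicate):
    """Use: test the condition once per member of set1, then crossjoin the survivors."""
    checks = 0
    kept = []
    for a in set1:
        checks += 1
        if predicate(a):
            kept.append(a)
    return list(product(kept, set2)), checks


if __name__ == "__main__":
    cities = list(range(100))   # stand-in for {set1}
    months = list(range(12))    # stand-in for {set2}
    keep = lambda city: city % 10 == 0
    slow, slow_checks = filter_after_crossjoin(cities, months, keep)
    fast, fast_checks = filter_before_crossjoin(cities, months, keep)
    print(slow == fast, slow_checks, fast_checks)
```

Both functions return the same tuples, but the first evaluates the condition once per tuple of the crossjoin while the second evaluates it once per member of set1.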

Using Intersect() to determine whether a member exists in a set is a disadvantage because it treats the member as a set, so the evaluator cannot use a better plan.
Avoid:
iif(intersect({ACTUALS_DAYS_SET}, {[TIME DIM].[Time Main].currentmember}).count > 0, ...)
Use:
iif(rank([TIME DIM].[Time Main].currentmember, {ACTUALS_DAYS_SET}) > 0, ...)

The formula engine can generate a better query plan if the MDX does not use “.CurrentMember” explicitly.
CurrentMember is implied and does not need to be included in the syntax.
There is no need for [TIME DIM].[Time Main].[Year].currentmember in the following MDX:
WITH MEMBER [Measures].[M] as
'([TIME DIM].[Time Main].[Year].currentmember
,[FINANCIAL DIM].[Financial].[Financial Category].&[ACTL])'
select {[TIME DIM].[Time Main].[Year].&[2005].members}
* {descendants([GROUP EVENT DIM].[Group Event].[Hotel].&[12]
,[GROUP EVENT DIM].[Group Event].[Group Event])}

Use Exists Function
The Exists function should be used wherever possible instead of Filter on member properties.
Use Minus over Filter for a single member
When filtering a single member out of a set, use minus rather than the Filter function
Avoid:
filter({set},.Currentmember <> "UNKN")
Use :
( {set} minus {&[UNKN] member})

Although there may be no difference in a simple example, combining these calculated members with other calculations can cause a more complicated execution plan inside the server.
The parallelperiod function often evaluates to a constant. If the result is known in advance, use the constant member directly, because the engine does not detect every pattern that is known to be constant.
It can be faster for a UI tool to send a first query that resolves parallelperiod (without other calculations) and then substitute the result into the original query, rather than sending one more complicated query.

Avoid:
with member [a].[NiceName] as '[a].[123]'
member [Time].[YearBefore] as 'parallelperiod( [Time].[year],
1, [Time].[2006].[jan] )'
select { [a].[NiceName] } on 0,
{ [Time].[2006].[jan], [Time].[YearBefore] } on 1
from [MyCube]

Use:
select { [a].[123] } on 0,
{ [Time].[2006].[jan], [Time].[2005].[jan] } on 1
from [MyCube]
An empty-cell check is usually done either to avoid division by zero or to find missing values (NON EMPTY analysis).
Empty cells are treated as zero in arithmetic operations.
For checking empty cells (Non Empty Analysis)
Filter([dimension].[hierarchy].member.members,
isEmpty(dim.member))
This invokes the MDX function IsEmpty, which checks whether the cell value is empty. Note that an empty cell value is treated as the number zero in arithmetic operations, but in a division such as a/b the divisor b can also hold an actual zero, which is not empty! IsEmpty is therefore appropriate when the user wants to distinguish empty or missing values from existing ones (for example in NON EMPTY-like analysis), but it is not appropriate as a division-by-zero check.
Note: Never use the IS operator (i.e. IIF(b IS NULL, NULL, a/b)) to check whether a cell value is empty. The IS operator checks whether the member b exists.
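The distinction between an empty cell and a zero cell can be illustrated with a small Python analogy, where None stands in for an empty cell. This is a sketch of the reasoning only, not MDX semantics in full, and the function names are invented for illustration.

```python
def is_empty(cell):
    """Analogy for IsEmpty: true only when the cell has no value at all."""
    return cell is None


def safe_ratio(a, b):
    """Division-by-zero guard: test the divisor's value, not its emptiness.

    An empty cell behaves like 0 in arithmetic, so both an empty divisor
    and a divisor that really holds 0 must be rejected.
    """
    b_value = 0 if b is None else b
    if b_value == 0:
        return None          # leave the result empty instead of failing
    a_value = 0 if a is None else a
    return a_value / b_value


if __name__ == "__main__":
    divisors = {"Jan": 4, "Feb": 0, "Mar": None}
    for month, b in divisors.items():
        # Feb is not empty, yet dividing by it would still fail.
        print(month, is_empty(b), safe_ratio(10, b))
```

An emptiness check alone would let the zero divisor through, which is why the value comparison is the right guard for division.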

Create Cache allows you to cache certain data in
order to speed up queries
New in SP2
The Create Cache statement is written manually
and the basic syntax is to crossjoin all of the
members specified in the queries
Calculated Members must be added by including
the base members that are used in the calculation
The typical process is to run Create Cache and
then the user query

CREATE CACHE
FOR [Cube] AS
(
{ [USA].[Washington], [USA].[Oregon] }
* { [2006].Children, parallelperiod([Time].[Year],
1, [2006].[Q1].[Jan] ), YTD(parallelperiod(
[Time].[Year], 1, [2006].[Q1].[Jan])) }
* { [Measures].[Sales] }
)

Scale up is usually the first attempt
SCALE UP Tips
Add more memory and use the proper settings
More CPUs can help parallelize processing and
complex queries
Partition your cubes effectively (see Partitioning Tips
slide)
Large projects are quickly reaching the limits of single machines
We are finding that scale-out for AS is the only solution

Can create multiple instances of AS on the same
server
Not recommended because it is difficult to manage
Need to learn to use Windows System Resource Manager
Better to scale out to multiple smaller servers
Separate Processing servers and Query servers
Multiple Query Servers to Load Balance
Use the Synchronize function in AS
Or copy directories using Robocopy (or similar) for best performance
Fully test linked dimensions and linked measure groups
Absolutely avoid linking across servers
Linking to a different database on the same server can also be slow
Partition Sizing Tips
How many partitions should you have?
A few very large partitions are not good because queries can take a while to search for your data
Too many partitions and it takes a while to start AS, to open BI Development Studio, etc.
In general, more and smaller partitions are better until you get over 2,000 (dependent on your hardware)
More CPUs and more IO threads will help with a high number of partitions.
Partitioning Tips (dependent on hardware)
Maximum partition size: 2GB
Number of partitions: < 2,000
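The two guideline numbers can be turned into a quick sizing check. This is a minimal sketch that treats 2 GB per partition and fewer than 2,000 partitions as rough, hardware-dependent defaults. The function name and the example sizes are illustrative assumptions.

```python
import math

MAX_PARTITION_GB = 2      # guideline: maximum partition size
MAX_PARTITIONS = 2000     # guideline: stay below this many partitions


def plan_partitions(measure_group_gb,
                    max_partition_gb=MAX_PARTITION_GB,
                    max_partitions=MAX_PARTITIONS):
    """Return (partition_count, within_guideline) for a measure group size in GB."""
    count = max(1, math.ceil(measure_group_gb / max_partition_gb))
    return count, count < max_partitions


if __name__ == "__main__":
    for size_gb in (500, 5000):   # illustrative sizes only
        count, ok = plan_partitions(size_gb)
        print(size_gb, "GB ->", count, "partitions, within guideline:", ok)
```

When the count exceeds the guideline, the choice is between larger partitions and more hardware (CPUs and IO threads), so both limits should be validated by testing on the target system.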

References
Blogs: http://blogs.msdn.com/sqlcat
Project REAL: Business Intelligence in Practice
Analysis Services Performance Guide
TechNet: Analysis Services for IT Professionals
