July 2011 – Boyan Penev on Microsoft BI

SSAS Myths Dispelled

This post is an attempt to dispel a few myths which seem to get repeated over and over among SSAS developers. While the truths are nothing new and have been documented in multiple sources such as Books Online (BOL), SQL CAT whitepapers, books and blog posts, they consistently seem to escape the attention of the wider public.

1 SSAS pre-aggregates data by default

While it is true that SSAS can pre-aggregate data, this does not happen by default. SSAS compresses, indexes and caches data, but it does not pre-aggregate it unless we define aggregations. By pre-aggregate I mean that SSAS does not automatically know what the Internet Sales Amount for Australia in 2007 is. It needs to read the leaf-level data and sum it up, unless we have built an aggregation on Country and Year for the partition containing the Internet Sales Amount measure. In that case the data is pre-aggregated and ready for retrieval.
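A toy Python model (not SSAS code) may make the difference clearer. The fact rows below are made up, and the dictionary stands in for an aggregation at the (Country, Year) grain built during processing. Real aggregations are defined through aggregation designs, which this sketch does not try to reproduce. Without the aggregation, every query scans the leaf rows. With it, the answer is a lookup.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Fact = Tuple[str, int, float]

# Hypothetical leaf-level fact rows: (country, year, internet_sales_amount).
# The values are invented for illustration only.
FACTS: List[Fact] = [
    ("Australia", 2007, 120.0),
    ("Australia", 2007, 80.0),
    ("Australia", 2008, 50.0),
    ("Canada", 2007, 30.0),
]


def query_without_aggregation(facts: List[Fact], country: str, year: int) -> float:
    # No aggregation defined: scan every leaf row and sum the matching ones at query time.
    return sum(amount for c, y, amount in facts if c == country and y == year)


def build_aggregation(facts: List[Fact]) -> Dict[Tuple[str, int], float]:
    # Done once, ahead of query time: precompute totals at the (Country, Year) grain.
    agg: Dict[Tuple[str, int], float] = defaultdict(float)
    for c, y, amount in facts:
        agg[(c, y)] += amount
    return dict(agg)


def query_with_aggregation(agg: Dict[Tuple[str, int], float], country: str, year: int) -> float:
    # Aggregation present: the answer is a single lookup, with no leaf scan.
    return agg.get((country, year), 0.0)


if __name__ == "__main__":
    agg = build_aggregation(FACTS)
    print(query_without_aggregation(FACTS, "Australia", 2007))
    print(query_with_aggregation(agg, "Australia", 2007))
```

Both paths return the same number. The point is only where the summing happens: at query time, or once during processing.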

2 We can emulate Enterprise Edition functionality in Standard Edition with MDX at no cost

Well, we can’t. We can emulate certain features, such as the LastNonEmpty aggregation function, but it comes at a cost. The cost usually comes from the work moving from the multi-threaded Storage Engine to the single-threaded Formula Engine.
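To show what such an emulation has to compute, here is a small Python sketch of LastNonEmpty semantics under simplifying assumptions: a parent period takes the value of its last child period that has data. The daily balances are hypothetical. An MDX emulation typically has to do this backward walk for every cell it evaluates, which is roughly where the Formula Engine cost comes from.

```python
from typing import Dict, Optional

# Hypothetical daily balances for one month; None means there is no fact row for that day.
DAILY_BALANCE: Dict[int, Optional[float]] = {1: 100, 2: None, 3: 130, 4: None}


def last_non_empty(values_by_child: Dict[int, Optional[float]]) -> Optional[float]:
    """Return the value of the last child (in key order) that has data.

    This mimics the LastNonEmpty semi-additive behaviour: the parent does not
    sum its children, it takes the most recent non-empty one.
    """
    for key in sorted(values_by_child, reverse=True):
        value = values_by_child[key]
        if value is not None:
            return value
    return None  # no child has data, so the parent is empty too


if __name__ == "__main__":
    print(last_non_empty(DAILY_BALANCE))
```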

3 SSAS is always faster than SQL Server RDBMS

While it is true that SSAS is faster than SQL Server RDBMS in many cases, this does not always hold true. A particular area in which the relational engine beats SSAS is the retrieval and processing of low-level granular data. SSAS usually beats the RDBMS when it comes to ad-hoc access to aggregated data.

4 MOLAP is always faster than ROLAP

If you read SQL CAT’s “Analysis Services ROLAP for SQL Server Data Warehouses” whitepaper, you can see that after careful tuning ROLAP can be faster than MOLAP. Not always, but sometimes, and that is enough to show that MOLAP is not always faster than ROLAP. This ties in with the previous myth and shows that a well-tuned RDBMS can perform very well with aggregates.

From the paper:

“At last, SQLCAT’s redesign and optimization efforts paid off. The ROLAP cube was finally ready for performance testing, and thanks to the amazingly fast performance of the relational SQL Server engine on top of a super-fast storage subsystem, the results looked better than expected. To everybody’s surprise, the ROLAP cube outpaced the MOLAP cube in 45 percent of all queries right from the start (see Figure 14). Only 39 percent of the queries showed substantially slower response times in ROLAP mode (more than twice the amount of MOLAP time) and 16 percent showed moderate performance degradation (less than twice the amount of MOLAP time).”

5 Usage Based Optimisations do not work well

In SQL Server Analysis Services 2008 the Usage Based Optimisation (UBO) algorithm has been redesigned. Now it works, and it works well. It does not create redundant aggregations and in general performs much better. Building UBO aggregations has always been recommended by Microsoft and even more so now.

6 Rigid attribute relationships boost performance

Whether an attribute relationship is Rigid or Flexible makes no difference to query performance. The choice only matters for processing. If an attribute relationship is static, setting it to Rigid means that you do not have to process partition indexes when you update the dimension. That is all the benefit you get from Rigid relationships. Going too far and marking relationships that do change as Rigid can have a very negative impact, because a change will force a full process of the partition data and indexes, which takes much longer than rebuilding just the indexes. Either way, there is no difference during query execution.
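The three cases above can be summarised as a small decision function. This is a simplified Python model of the reasoning in this post, not an SSAS API. ProcessIndexes and ProcessFull are the SSAS processing options involved, and the function only names which one the partitions would need after a dimension update.

```python
from enum import Enum


class Relationship(Enum):
    RIGID = "Rigid"
    FLEXIBLE = "Flexible"


def partition_work_after_dimension_update(kind: Relationship, relationship_changed: bool) -> str:
    """Simplified model: which partition processing a dimension update implies.

    relationship_changed: whether the update actually moved members
    (e.g. a city now rolls up to a different state).
    """
    if kind is Relationship.FLEXIBLE:
        # Flexible: the dimension update is accepted, but the partition
        # indexes have to be rebuilt afterwards.
        return "ProcessIndexes"
    if relationship_changed:
        # Marked Rigid but changed anyway: this forces a full process of
        # the partition data and indexes, the expensive case.
        return "ProcessFull"
    # Rigid and genuinely static: nothing to redo on the partitions.
    return "nothing"


if __name__ == "__main__":
    for kind in Relationship:
        for changed in (False, True):
            print(kind.value, changed, partition_work_after_dimension_update(kind, changed))
```

Note that none of the branches says anything about queries. That matches the point above: the setting trades processing work only.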

7 MDX and DAX are hard

I believe that this particular myth stems from the fact that we get to compare MDX and DAX to sweet and fluffy languages like SQL and C#. It all depends on the vantage point. Take the following “Hello world!” program in Malbolge for comparison purposes:

('&%:9]!~}|z2Vxwv-,POqponl$Hjig%eB@@>}=<M:9wv6WsU2T|nm-,jcL(I&%$#"
`CB]V?Tx<uVtT`Rpo3NlF.Jh++FdbCBA@?]!~|4XzyTT43Qsqq(Lnmkj"Fhg${z@>

MDX is not all that bad from a Malbolge developer’s point of view, is it?

Custom Groups in Excel 2007 – Error

I just finished digging into a particular issue with Excel 2007 (some versions) and SSAS PivotTables. In brief, a user could not use the custom grouping functionality that Excel provides because she got an error saying:

“The query did not run, or the database table could not be opened. Check the database server or contact your administrator. Make sure the external database is available and hasn’t been moved or reorganized, then try the operation again.”

I added her to the server administrators, but the message persisted. After profiling, I noticed that the MDX generated by Excel 2007 for this operation read:

CREATE SESSION CUBE [Cube_XL_GROUPING0] FROM [Cube] ( DIMENSION [Cube].[Agency].[Agency Hierarchy] HIDDEN AS _XL_GROUPING0,DIMENSION [Cube].[Agency].[Flag],DIMENSION [Cube].[Agency].[Region],DIMENSION [Cube].[Collection].[Application],DIMENSION [Cube].[Collection].[Application Code],DIMENSION [Cube].[Collection].[Data Collection Code],DIMENSION [Cube].[Data…

Error: “Parser: The syntax for ‘DIMENSION’ is incorrect.”

The problem lies in what is missing: the statement contains no MEASURE clauses and goes straight into the DIMENSION list. A correct MDX statement issued by another instance of Excel 2007 running on a different machine showed:

CREATE SESSION CUBE [Cube_XL_GROUPING1] FROM [Cube] ( MEASURE [Cube].[Value – Data Integer Quarter] HIDDEN,MEASURE [Cube].[Value – Data Integer Semi] HIDDEN,MEASURE [Cube].[Value – Data Integer Annual] HIDDEN,MEASURE [Cube].[Value – Data Integer Month] HIDDEN,MEASURE [Cube].[Value – Data Real Quarter] HIDDEN,MEASURE [Cube].[Value – Data Real Month] HIDDEN,MEASURE [Cube].[Value – Data Real Annual] HIDDEN,MEASURE [Cube].[Value – Data Money Semi] HIDDEN,MEASURE [Cube].[Value – Data Money Month] HIDDEN,MEASURE [Cube].[Value – Data Real Semi] HIDDEN,MEASURE [Cube].[Value – Data Money Quarter] HIDDEN,MEASURE [Cube].[Value – Data Money Annual] HIDDEN,DIMENSION [Cube].[Agency].[Agency Hierarchy] HIDDEN AS _XL_GROUPING0,DIMENSION [Cube].[Agency].[Pub Pte Flag],DIMENSION [Cube].[Agency].[Region],DIMENSION [Cube].[Collection].[Application],DIMENSION [Cube].[Collection].[Application Code],DIMENSION [Cube].[Collection].[Collection Code],DIMENSION [Cube].[Element].[Common Name],DIMENSION [Cube].[Element].[Data Element Code],DIME…

Here the cube measures are part of the CREATE SESSION CUBE statement, and this makes it valid. The reason seems to be that all the physical measures in the cube were hidden and only one calculated measure was shown to users. Excel (2007 Enterprise) seemed unable to find the physical measures, so the fix was easy: creating a visible dummy physical measure and using a scope assignment to make it return the same values as the calculated one. Now Excel merrily creates valid MDX and my user is happy.

I understand this will be a very rare problem, but it takes some time to investigate, so I hope the post may help someone out there.

MDX Subselects – Some Insight

Recently I answered a question on StackOverflow which illustrates an interesting and common case. Understanding it correctly helps in understanding subselects better.

Let’s examine the following MDX query:

Here we effectively place the All member of the Customer.Education hierarchy on columns and an ordered set of the countries in the Customer dimension on rows. We have a subselect and a slicer. The slicer contains the Internet Order Count measure (which places it in context), and the subselect restricts the query to customers with Partial High School education only.

When we execute the query we get exactly what we expect.

Now, let’s change this to:

The difference here is the All member in the tuple used as the second argument of the Order function. We now get a different order (note that Germany and France swap positions on rows). The set seems to be ordered by the total for all customers, regardless of their education, as if the subselect made no difference. However, the measure values are the same as in the first query. So, what causes this?

To answer this, we first need to understand what a subselect does. As Mosha posted a while ago, a subselect (which is really the same as a subcube) performs an implicit exists with the sets or set expressions on each axis and also applies visual totals “even within expressions if there are no coordinate overwrites”.

Instead of repeating what Mosha has already spent the effort to explain, I recommend reading his post thoroughly; it explains what the “implicit exists” does. In our case it is not as important as the visual totals part, so I will concentrate on the latter.
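As a rough illustration, the Python sketch below models both effects on a handful of hypothetical customer rows. The rows and counts are made up, and this is a model of the semantics, not of how SSAS actually evaluates a subselect. The implicit exists trims the set of countries to those with customers who have the subselected education. Visual totals make the All-education total add up only the subselected member.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str, int]

# Hypothetical customer rows: (country, education, internet_order_count).
CUSTOMERS: List[Row] = [
    ("Australia", "Partial High School", 3),
    ("Australia", "Bachelors", 10),
    ("France", "Bachelors", 4),
    ("Germany", "Partial High School", 2),
]


def countries_existing_with(rows: List[Row], education: str) -> List[str]:
    # Implicit exists: keep only countries that have at least one customer
    # with the subselected education.
    return sorted({c for c, e, _ in rows if e == education})


def all_education_total(rows: List[Row], country: str, subselect: Optional[str] = None) -> int:
    # Value at the All member of Education for one country.
    # With a subselect, visual totals mean "All" only adds up the subselected member.
    return sum(n for c, e, n in rows if c == country and (subselect is None or e == subselect))


if __name__ == "__main__":
    print(countries_existing_with(CUSTOMERS, "Partial High School"))
    print(all_education_total(CUSTOMERS, "Australia"))
    print(all_education_total(CUSTOMERS, "Australia", "Partial High School"))
```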

In the first query, the set gets ordered by the orders made by customers with Partial High School education because of visual totals, which take effect within the Order expression. Visual totals also restrict what we see in the query results, because they apply to the measure on the slicer axis as well. However, in the second query the catch is in the “if there are no coordinate overwrites” part of the story. Visual totals do not get applied within Order because we explicitly overwrite the Customer.Education hierarchy member within the tuple in the Order function:

Therefore, the set of countries gets ordered by the Internet Order Count for all customers, without taking the subselect into account. However, the measure on the slicer axis still has visual totals applied to it, so the displayed values are for customers with Partial High School education only.
If we re-write the second statement to:

We can see that the NON VISUAL keyword (SSMS does not recognise this syntax and underlines it with a red squiggly line) changes the way the measure is displayed: we now see the unrestricted total for the slicer measure as well.

Similarly, if we re-write the first query with NON VISUAL we get:

Here visual totals are excluded in both places, so both the Order and the result set are computed for customers with any education.
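The four variations can be summarised in a toy Python model. It uses made-up cell values for two countries and assumes descending order for readability; the actual queries are not reproduced here. In the model, visual totals restrict the Order expression unless the tuple overwrites Education with All. The slicer measure is restricted unless NON VISUAL is used, and NON VISUAL switches both restrictions off.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical Internet Order Count cells keyed by (country, education).
CELLS: Dict[Tuple[str, str], int] = {
    ("France", "Partial High School"): 1,
    ("France", "Bachelors"): 9,
    ("Germany", "Partial High School"): 5,
    ("Germany", "Bachelors"): 3,
}

# Education members kept by the subselect.
SUBSELECT: Set[str] = {"Partial High School"}


def order_count(country: str, educations: Optional[Set[str]]) -> int:
    """Internet Order Count for a country; None means all education members."""
    return sum(
        v for (c, e), v in CELLS.items()
        if c == country and (educations is None or e in educations)
    )


def run_query(all_in_order_tuple: bool, non_visual: bool) -> Tuple[List[str], Dict[str, int]]:
    """Return the row order and the value shown for each country."""
    # Visual totals reach into Order unless the tuple overwrites Education
    # with its All member, or NON VISUAL switches visual totals off.
    order_context = None if (all_in_order_tuple or non_visual) else SUBSELECT
    # The slicer measure is restricted by visual totals unless NON VISUAL is used.
    value_context = None if non_visual else SUBSELECT
    countries = sorted({c for c, _ in CELLS})
    rows = sorted(countries, key=lambda c: order_count(c, order_context), reverse=True)
    return rows, {c: order_count(c, value_context) for c in rows}


if __name__ == "__main__":
    for all_in_tuple in (False, True):
        for non_visual in (False, True):
            print(all_in_tuple, non_visual, run_query(all_in_tuple, non_visual))
```

With these numbers, the first query puts Germany above France. The All member in the tuple flips the order without changing the displayed values, and adding NON VISUAL gives unrestricted values as well. This is the pattern described above.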

Hopefully this clarifies, to some extent, the logic applied when we use subselects.