SSAS: Multiple SQL Queries in ROLAP Mode

Just recently I was working on a project where I had to build an SSAS ROLAP cube on top of a badly built data mart. Badly built in this case meant one with multiple referential integrity (RI) issues. Most importantly, the designers had ignored the basic principle that every dimension key in a fact row must be present in the respective dimension table. In MOLAP mode, SSAS checks for such mismatches during processing. However, when a partition is in ROLAP storage mode, we get no notification that anything is wrong and the cube processing operation succeeds. This has consequences at query time, which I will illustrate in this post together with a solution. Before I begin, I must say that if it wasn’t for the help of Akshai Mirchandani (from the Microsoft SSAS dev team) and Greg Galloway, I would probably have spent quite some time figuring out what was happening. Thanks to them the problem was solved quickly and I got to understand the reason behind it.

In terms of set-up, I created two tables in SQL Server: Dim and Fact. The Dim table contained two members, A and B, with keys of 1 and 2. Initially, the Fact table had two rows referencing the Dim table (Dim keys of 1 and 2) and a measure column called Amount, with 1.0 and 2.0 as the amounts corresponding to A and B. No issues here. After that I created an SSAS solution corresponding to this simple dimensional model. I switched the partition storage for the cube to ROLAP and processed the SSAS database. Then I ran the following query, which I used for all subsequent examples:

 

 

 

 

 

The result was as expected:

 

 

At the same time I had a SQL Server Profiler trace running, which showed:

 

We can see that SSAS has executed one SQL query retrieving data from the fact table. Nothing unusual thus far.

To spoil the party, I added one more row to the fact table with a dimension key of 3 and an Amount of 3. Since I did not add a row with a key of 3 to the dimension table, this broke the rules; if I had had a foreign key constraint between the fact and the dimension tables, I would not have been able to do this. After clearing the SSAS cache, I ran my query again. The result:

 

 

The actual error was, of course, a missing key. I was not surprised when I saw this on my original project. However, looking at Profiler we see a “weird” sequence of events:

 

SSAS runs multiple queries which result in errors. In this particular case we can see only four ExecuteSQL events, each of them followed by an error in a ReadData event. In the real world, this scenario can get many times worse (in my case we saw 4667 queries run against the relational database in a few minutes), leading to a really significant drop in performance.

So, what is happening? According to Akshai, SSAS encounters an error while dealing with the results from the initial SQL query and tries to recover by sending more queries. In some cases this can result in the error showing up in the result set only for some cells.
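Since ROLAP processing will not flag the problem, it can help to look for the offending rows directly in the relational source. The following is only a minimal sketch: it recreates the Dim and Fact tables from this post in an in-memory SQLite database (standing in for SQL Server) and lists the fact rows whose key has no match in the dimension. The column names DimKey and DimName and both function names are my own assumptions for illustration; only the table names, the keys and the Amount column come from the set-up above.

```python
import sqlite3


def build_demo_db():
    """Create the Dim and Fact tables from the post, including the orphan row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Dim (DimKey INTEGER PRIMARY KEY, DimName TEXT);
        CREATE TABLE Fact (DimKey INTEGER, Amount REAL);
        INSERT INTO Dim VALUES (1, 'A'), (2, 'B');
        -- The row with key 3 has no matching dimension member.
        INSERT INTO Fact VALUES (1, 1.0), (2, 2.0), (3, 3.0);
        """
    )
    return conn


def find_orphan_keys(conn):
    """Return (DimKey, Amount) for fact rows with no matching row in Dim."""
    sql = """
        SELECT f.DimKey, f.Amount
        FROM Fact AS f
        LEFT JOIN Dim AS d ON d.DimKey = f.DimKey
        WHERE d.DimKey IS NULL
        ORDER BY f.DimKey
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(find_orphan_keys(build_demo_db()))
```

The same LEFT JOIN ... IS NULL pattern should carry over to the real data mart, one query per fact-to-dimension relationship.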

Luckily, there is an easy way out of this situation (thanks to Greg for providing the tips). SSAS can automatically create an “unknown bucket” for each dimension and can assign to it all measure values which do not correspond to a dimension member. To get this result, we must ensure that each affected partition’s error configuration is set to something similar to:

 

 

 

 

 

 

 

 

 

Note that the KeyErrorAction is ConvertToUnknown, not DiscardRecord (which is the alternative). This must also be coupled with setting up each “incomplete” dimension to include an Unknown member:

 

 

 

 

 

 

 

 

 

 

It does not matter whether the UnknownMember is Visible or Hidden, as long as it is not None.
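To make the difference between the two KeyErrorAction values concrete, here is a toy model in Python. It is not how SSAS works internally, just a sketch of the outcome under the assumption that a fact row with a missing key is either rolled into an "Unknown" bucket or dropped. The function name aggregate and the data structures are hypothetical; the sample rows are the ones from this post.

```python
from collections import defaultdict

# Dimension members and fact rows from the post's set-up.
DIM = {1: "A", 2: "B"}
FACT = [(1, 1.0), (2, 2.0), (3, 3.0)]  # key 3 is missing from DIM


def aggregate(fact_rows, dim_members, key_error_action="ConvertToUnknown"):
    """Sum Amount per dimension member, handling missing keys as configured."""
    totals = defaultdict(float)
    for dim_key, amount in fact_rows:
        if dim_key in dim_members:
            totals[dim_members[dim_key]] += amount
        elif key_error_action == "ConvertToUnknown":
            # The amount is kept and assigned to the unknown bucket.
            totals["Unknown"] += amount
        elif key_error_action == "DiscardRecord":
            # The row is dropped, so the grand total shrinks.
            continue
        else:
            raise ValueError(f"Unsupported action: {key_error_action}")
    return dict(totals)


if __name__ == "__main__":
    print(aggregate(FACT, DIM, "ConvertToUnknown"))
    print(aggregate(FACT, DIM, "DiscardRecord"))
```

With ConvertToUnknown the grand total still matches the fact table, while in this model DiscardRecord silently loses the amount of 3.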

Back to our scenario. After setting these properties on the dimension and the partition I processed the SSAS database again and executed the query. The result:

 

 

 

and the profiler trace:

 

As we can see, the multiple queries are gone. If we do not want to see the Unknown amount in the cube, we can use a scope assignment:

 

 

Coupled with making the UnknownMember Hidden, we can completely obliterate traces of our underlying RI issues. Unless our users check the numbers, but then we can blame whoever designed the data mart! 🙂

Melbourne SQL Server Social Event: Short Notice!

A short notice for SQL Server enthusiasts/professionals in Melbourne. Stephen Few is in town and will attend the SQL Server Social Event tomorrow (17 Nov) at the Sherlock Holmes Inn in the CBD (415 Collins Street). Feel free to come and meet the community, talk about SQL Server and information visualisation, and possibly get your Stephen Few books signed.

Link for the event: http://sqlserversocial.eventbrite.com/

See you there!

Alternate Ordering of Attributes in SSAS

Sometimes we need to display the members of an SSAS attribute in an order other than that of its own name or key. For this purpose we can order the attribute by the name or key of one of its related attributes. However, in some cases changing the order may break calculation logic which depends on the initial order. The new ordering may also be inconvenient for writing MDX, as some functions of the language are easier to use (at least conceptually) when thinking of sets in ascending order.

The best example to illustrate this problem is the Date dimension. While in most, if not all, cases the Date dimension is ordered in ascending order, sometimes users prefer to see the most recent date first and ask us to change the order to descending. Doing so invalidates many time intelligence calculations, such as rolling and parallel periods. Furthermore, fixing those requires inverting numbers to negative, or avoiding functions like ClosingPeriod. All in all, a “small” change can lead to a big problem. We can, however, accommodate our ignorant users (who unknowingly get the benefit of reading default time series charts backwards, from right to left, when dragging and dropping descending dates in Excel, for example) without changing too much in our scripts. A little trick in the modelling can help, and it is the reason for writing this post.

Let’s have a look at a simple Date dimension with one attribute – Date. Nothing unusual, with the Date being ordered by its Key (integer in this case) and with a name coming from another column in the Date table – DateName. When we create a simple slice in Excel we get the following:

 

 

 

 

 

 

 

 

Now we create a measure Rolling 3 Days Amount, which sums the last 3 days’ amount:

 

 

 

 

 

 

 

 

The MDX for this calculation is:

 

 

 

 

If we simply invert the order of the Date attribute by ordering it by another column in our Date table, which contains DateKey*-1 and refresh the Excel pivot table we get the following:

 

 

 

 

 

 

 

 

This is simply incorrect. A relatively small change in the MDX script can help us with this issue (e.g. changing the Lag to Lead); however, in many cases we do not want to rebuild all the measures. Luckily, we can employ a different tactic. Instead of changing the script, we can change the structure of our dimension by adding an additional attribute which is not exposed to the users (i.e. it is hidden). This attribute will be based on the same column we use for our Date, but will keep the ascending order instead of being ordered by the descending column. We can rename the original attribute (the one exposed to the users) to something like Date Desc, or a more user-friendly option, and hide the new one:

        

 

 

 

Everything else stays the same – our cube script does not need to be adjusted and its logic is correct:

 

 

 

 

 

 

 

 

A different approach could be to leave the old attribute named Date, so that no change is necessary for reports which depend on the naming. This, however, requires a change to the cube script, which can be easily performed using the BIDS Replace functionality (e.g. Ctrl+H).

Note that for this approach to work we need to make sure that the attribute exposed to the users is the dimension key attribute, as changing its current member results in an (infamous) attribute overwrite, where its related attributes (which are above it in the relationship chain) also change. If we expose a non-key date attribute, our MDX logic will break, as a change to its current member will not move the attributes below it to the matching member (in fact, it will set them to their All member).
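The effect of the member order on a Lag-style rolling sum can also be reproduced outside SSAS. The sketch below is plain Python, not MDX, and it only mimics the idea of "the current member plus the two members before it in the attribute's order". The date keys, the amounts and the function name rolling_sum are made up for illustration and are not the figures from the pivot tables above.

```python
def rolling_sum(ordered_days, amounts, window=3):
    """Sum each day with the (window - 1) members preceding it in the given order."""
    result = {}
    for i, day in enumerate(ordered_days):
        start = max(0, i - (window - 1))
        result[day] = sum(amounts[d] for d in ordered_days[start:i + 1])
    return result


# Hypothetical sample data: integer date keys and daily amounts.
AMOUNTS = {20100101: 1.0, 20100102: 2.0, 20100103: 3.0,
           20100104: 4.0, 20100105: 5.0}

ascending = sorted(AMOUNTS)
descending = sorted(AMOUNTS, reverse=True)

# Calculated over the ascending (hidden) order: correct trailing sums.
correct = rolling_sum(ascending, AMOUNTS)

# Calculated over the descending order: "previous" members are future dates.
broken = rolling_sum(descending, AMOUNTS)

# The fix in the spirit of this post: calculate on the ascending order,
# and only display the dates in descending order.
display = [(day, correct[day]) for day in descending]

if __name__ == "__main__":
    print(correct[20100105], broken[20100105])
    print(display)
```

For the latest date the ascending calculation gives the sum of the last three days, while the descending one returns only that day's own amount, because nothing precedes it in the inverted order.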