SSAS: Multiple SQL Queries in ROLAP Mode

Just recently I was working on a project where I had to build a SSAS ROLAP cube on top of a badly built data mart. Badly built in this case meant one where we encounter multiple referential integrity (RI) issues. Most importantly, the designers ignored the very basic principle that all dimension keys for each row must be present in the respective dimension tables. When in MOLAP mode, SSAS checks for such mismatches during processing. However, when a partition is in ROLAP storage mode, we don’t get a notification that anything is wrong and the cube processing operation succeeds. This situation has some consequences during execution time and I will try to illustrate those in this post and show a solution. Before I begin, I must say that if it wasn’t for Akshai Mirchandani’s (from the Microsoft SSAS dev team) and Greg Galloway‘s help, I would have probably spent quite some time figuring out what is happening. Thanks to them the problem got solved quickly and I got to understand the reason for what is happening.

In terms of set-up, I created two tables in SQL Server: Dim and Fact. The Dim table contained two members A and B, with keys of 1 and 2. Initially, the Fact table had two rows referencing the Dim table – Dim keys of 1 and 2, and a measure column called Amount with 1.0 and 2.0 as the amounts corresponding to A and B. No issues here. After that I created a SSAS solution, corresponding to this simple dimensional model. I switched the partition storage for the cube to ROLAP and processed the SSAS database. After that I ran the following query, which I used for all subsequent examples:

 

 

 

 

 

The result was as expected:

 

 

At the same time I had a SQL Server Profiler trace running, which showed:

 

We can see that SSAS has executed one SQL query retrieving data from the fact table. Nothing unusual thus far.

To spoil the party, I added one more row to the fact table with a dimension key of 3 and Amount of 3. Since I did not add a row in the dimension table with a key of 3, this broke the rules and if I had a foreign key constraint implemented between the fact and the dimension tables I would not have been able to do this. After cleaning the SSAS cache, I ran my query again. The result:

 

 

The actual error was, of course, a missing key. I was not surprised when I saw this on my original project. However, looking at Profiler we see a “weird” sequence of events:

 

SSAS runs multiple queries which result in errors. In this case we can see four of these ExecuteSQL events. All of them are followed by an error in a ReadData event. In this particular case we can see only four ExecuteSQL events. In the real-world, this scenario can get multiple times worse (in my case we saw 4667 queries run against the relational database in a few minutes) leading to a really significant drop in performance.

So, what is happening? According to Akshai, SSAS encounters an error while dealing with the results from the initial SQL query and is trying to recover by sending more queries. In some cases this can result in getting the error in the result set only for some cells.

Luckily, there is an easy way out of this situation (thanks to Greg for providing the tips). SSAS can automatically create an “unknown bucket” for each dimension and can assign to it all measure values which do not correspond to a dimension member. To get this result, we must ensure that each affected partition’s error configuration is set to something similar to:

 

 

 

 

 

 

 

 

 

Note that the KeyErrorAction is ConvertToUnknown, not DiscardRecord (which is the alternative). This must also be coupled with setting up each “incomplete” dimension to include an Unknown member:

 

 

 

 

 

 

 

 

 

 

It does not matter whether the UnknownMember is Visible or Hidden, as long as it is not None.

Back to our scenario. After setting these properties on the dimension and the partition I processed the SSAS database again and executed the query. The result:

 

 

 

and the profiler trace:

 

As we can see we eliminated the multiple queries. If we do not want to see the Unknown amount in the cube we can use a scope assignment:

 

 

Coupled with making the UnknownMember Hidden, we can completely obliterate traces of our underlying RI issues. Unless our users check the numbers, but then we can blame whoever designed the datamart! 🙂

Advertisements

Custom Groups in Excel 2007 – Error

I just finished digging around a particular issue with Excel 2007 (some versions) and SSAS Pivot Tables. In brief, the issue was that a user could not use the custom groups functionality which Excel provides because she got an error saying:

“The query did not run, or the database table could not be opened. Check the database server or contact your administrator. Make sure the external database is available and hasn’t been moved or reorganized, then try the operation again.”

I added her to the server administrators, but the message persisted. After profiling I noticed that the MDX generated by Excel 2007 for this operation read:

CREATE SESSION CUBE [Cube_XL_GROUPING0] FROM [Cube] ( DIMENSION [Cube].[Agency].[Agency Hierarchy] HIDDEN AS _XL_GROUPING0,DIMENSION [Cube].[Agency].[Flag],DIMENSION [Cube].[Agency].[Region],DIMENSION [Cube].[Collection].[Application],DIMENSION [Cube].[Collection].[Application Code],DIMENSION [Cube].[Collection].[Data Collection Code],DIMENSION [Cube].[Data…

Error: “Parser: The syntax for ‘DIMENSION’ is incorrect.”

I have highlighted the problematic part – the MEASURE part of this expression was missing. A correct MDX statement issued by another instance of Excel 2007 running on a different machine showed:

CREATE SESSION CUBE [Cube_XL_GROUPING1] FROM [Cube] ( MEASURE [Cube].[Value – Data Integer Quarter] HIDDEN,MEASURE [Cube].[Value – Data Integer Semi] HIDDEN,MEASURE [Cube].[Value – Data Integer Annual] HIDDEN,MEASURE [Cube].[Value – Data Integer Month] HIDDEN,MEASURE [Cube].[Value – Data Real Quarter] HIDDEN,MEASURE [Cube].[Value – Data Real Month] HIDDEN,MEASURE [Cube].[Value – Data Real Annual] HIDDEN,MEASURE [Cube].[Value – Data Money Semi] HIDDEN,MEASURE [Cube].[Value – Data Money Month] HIDDEN,MEASURE [Cube].[Value – Data Real Semi] HIDDEN,MEASURE [Cube].[Value – Data Money Quarter] HIDDEN,MEASURE [Cube].[Value – Data Money Annual] HIDDEN,DIMENSION [Cube].[Agency].[Agency Hierarchy] HIDDEN AS _XL_GROUPING0,DIMENSION [Cube].[Agency].[Pub Pte Flag],DIMENSION [Cube].[Agency].[Region],DIMENSION [Cube].[Collection].[Application],DIMENSION [Cube].[Collection].[Application Code],DIMENSION [Cube].[Collection].[Collection Code],DIMENSION [Cube].[Element].[Common Name],DIMENSION [Cube].[Element].[Data Element Code],DIME…
Here we have the cube measures as a part of the CREATE SESSION CUBE statement and this makes it a valid one. The reason for this seems to be the fact that all the physical measures in the cube were hidden and only one calculated measure was shown to the users. Excel (2007 Enterprise) seemed unable to find them, so the fix was easy – creating a visible dummy physical measure and using a scope assignment to make it work like the calculated one. Now Excel merrily creates valid MDX and my user is happy.

I understand this will be a very rare problem, but it takes some time to investigate, so I hope the post may help someone out there.