Custom Groups in Excel 2007 – Error

I just finished digging around a particular issue with Excel 2007 (some versions) and SSAS Pivot Tables. In brief, the issue was that a user could not use the custom groups functionality which Excel provides because she got an error saying:

“The query did not run, or the database table could not be opened. Check the database server or contact your administrator. Make sure the external database is available and hasn’t been moved or reorganized, then try the operation again.”

I added her to the server administrators, but the message persisted. After profiling I noticed that the MDX generated by Excel 2007 for this operation read:

CREATE SESSION CUBE [Cube_XL_GROUPING0] FROM [Cube] ( DIMENSION [Cube].[Agency].[Agency Hierarchy] HIDDEN AS _XL_GROUPING0,DIMENSION [Cube].[Agency].[Flag],DIMENSION [Cube].[Agency].[Region],DIMENSION [Cube].[Collection].[Application],DIMENSION [Cube].[Collection].[Application Code],DIMENSION [Cube].[Collection].[Data Collection Code],DIMENSION [Cube].[Data…

Error: “Parser: The syntax for ‘DIMENSION’ is incorrect.”

I have highlighted the problematic part – the MEASURE part of this expression was missing. A correct MDX statement issued by another instance of Excel 2007 running on a different machine showed:

CREATE SESSION CUBE [Cube_XL_GROUPING1] FROM [Cube] ( MEASURE [Cube].[Value – Data Integer Quarter] HIDDEN,MEASURE [Cube].[Value – Data Integer Semi] HIDDEN,MEASURE [Cube].[Value – Data Integer Annual] HIDDEN,MEASURE [Cube].[Value – Data Integer Month] HIDDEN,MEASURE [Cube].[Value – Data Real Quarter] HIDDEN,MEASURE [Cube].[Value – Data Real Month] HIDDEN,MEASURE [Cube].[Value – Data Real Annual] HIDDEN,MEASURE [Cube].[Value – Data Money Semi] HIDDEN,MEASURE [Cube].[Value – Data Money Month] HIDDEN,MEASURE [Cube].[Value – Data Real Semi] HIDDEN,MEASURE [Cube].[Value – Data Money Quarter] HIDDEN,MEASURE [Cube].[Value – Data Money Annual] HIDDEN,DIMENSION [Cube].[Agency].[Agency Hierarchy] HIDDEN AS _XL_GROUPING0,DIMENSION [Cube].[Agency].[Pub Pte Flag],DIMENSION [Cube].[Agency].[Region],DIMENSION [Cube].[Collection].[Application],DIMENSION [Cube].[Collection].[Application Code],DIMENSION [Cube].[Collection].[Collection Code],DIMENSION [Cube].[Element].[Common Name],DIMENSION [Cube].[Element].[Data Element Code],DIME…
Here we have the cube measures as a part of the CREATE SESSION CUBE statement and this makes it a valid one. The reason for this seems to be the fact that all the physical measures in the cube were hidden and only one calculated measure was shown to the users. Excel (2007 Enterprise) seemed unable to find them, so the fix was easy – creating a visible dummy physical measure and using a scope assignment to make it work like the calculated one. Now Excel merrily creates valid MDX and my user is happy.

I understand this will be a very rare problem, but it takes some time to investigate, so I hope the post may help someone out there.

Data Mystification Techniques

I am a fan of studies in Data Visualisation. It is a creative and dynamic field with a lot of room for experiment. I am considering report and dashboard design, and within this frame Data Visualisation, as a form of practical art. Well designed and built reports are critical for solution adoption and usability. However, in this post I will concentrate on exactly the opposite topic – intentionally mystifying reports, obscuring the data and making it hard, for the report consumers to reach informed conclusions. I will also show how we can make the data visualisations as misleading as possible. This post is not as abstract as one may think, as it draws its examples from a very real project, for which I had to build a report under heavy pressure.

Initially, I was asked to build a report based on a small data set (~450 rows) stored in an Excel workbook. The data was perfectly suitable for building a Pivot Table on top of it, so I did so and then I decided to use the pivot table as a source for my report. The users have one measure – Spending Amount and a few dimensions – Category and Focus for that spending. The request came in the form: “We want to see the Amount by Category and Focus in both graphical and numeric form“. So, I sat down and produced this prototype (with their colour theme):

As you can see from the screenshot, the report is fairly simple – the bar graphs on the top clearly show how the Amount is distributed per Category and Focus. Also, because of an explicit request, I build the bar graph on the second row to show category and focus amounts combined, and in order to clarify the whole picture, I added the table at the bottom of the report. The report works for colour-blind people too, as the details for the Focus per Category Expenditure bar graph are shown directly below in the data table.

I sent the report prototype for review, and the response was:

1. Remove the table.
2. Change the bar graphs to make them more “flat”. Meaning: there is too much difference between the bars.

The reasoning – it is “too obvious” that the data is unevenly distributed. As the report is supposed to be presented to higher level management, discrepancies in the amount allocation would “look bad” on the people who requested me to create the report at the first place.

Adding more fake data got rejected, so I was advised to prepare a new prototype report with the new requirements. It follows:

Now, Stephen Few would spew if presented with such a report. I did everything I could to obscure the report – added 3D effects, presented the amount in 100% stacked graphs when absolutely not necessary and, of course, I had to add a Pie Chart. A 3D Pie Chart. The whole report is totally useless. It is impossible to deduct the actual numbers of the various category/focus amounts. Furthermore, the Pie Chart combines the focus and category and uses virtually indistinguishable colours for the different slices. The 3D effect distorts the proportions of these slices and if printed in Black and White, or if viewed by a colour-blind person, the report looks like this:
Since my goal of total mystification of the report was achieved, I sent the second prototype back.
The response was: “It looks much better, and we really like the top right bar graph and the Pie Chart. Would it be possible to leave just those two and change the Pie Chart to show data broken down just by Category?

So, the users did not like my combined series. A point for them. Then I decided to remove the 3D cone graph, to remove all 3D effects, to make it more readable and to create the following 3rd prototype:

Here, we can see that the pie has the actual percentages displayed, and each category amount can be calculated from there. The stacked bar graph is again almost useless.

The users liked it, but still thought that it was too revealing, and were particularly concerned with the fact that there are a couple of “invisible” categories (the ones with small percentages) on the Pie Chart. They had a new suggestion – change the pie chart back to a bar graph and play with the format, so that even the smallest amount is visible. I offered an option, made another prototype and it finally got approved. The exact words were: “The graphs are exactly what we want“. There it is:

The Y Axis in the first graph is interesting. Not only that there is a “scale break”, but in fact the scale is totally useless, as it is manually adjusted, because of the big gaps between the amounts. I needed two scale breaks, which for a series with 6 values is a bit too much. You can compare the normal linear scale of the first Prototype I created and this one. However, it hit the goal – my users were satisfied.

I consider this effort to be contrary to be an exercise in “Data Mystification”. It is very easy to draw the wrong conclusions from the report, and it achieves the opposite to the goals of Business Intelligence, as instead of empowering users, it could actually confuse them and lead them to making wrong decisions.

Excel and MDX – Report Authoring Tips

Excel is a powerful BI tool which is often overlooked as an inferior alternative to Reporting Services for distributing reports because of its potential for causing chaotic mess in organisations. As a number of speakers on the Australian Tech.Ed 2008 pointed out, replacing Excel is often the goal of many organisations when implementing BI solutions and using Excel as a main reporting tool can be often misunderstood by prospective clients. SharePoint Excel services go a long way in helping organisations control the distribution of Excel files and these in combination with a centralised Data Warehouse and Analysis Services cubes on top of it will no doubt be a competitive solution framework, especially when considering the familiarity of business users with the Microsoft Office suite.

During a recent implementation of a BI solution I was approached with the request to provide certain reports to a small set of business users. As the project budget did not allow for a full Reporting Services solution to be built for them, Excel was appointed as the end-user interface to the OLAP cube. The users were quite impressed by the opportunities that direct OLAP access present to them but were quite unimpressed by the performance they were getting when trying to create complex reports.

After investigating the problem I noticed a pattern – the more dimensions they put on a single axis the slower the reports were generated. Profiling the SSAS server showed that Excel generates quite bad MDX queries. In the case where we put one measure on rows and multiple dimensions on columns the Excel-generated MDX query looks like this:

SELECT {[Measures].[Amount]} ON ROWS,
{NON_EMPTY(

[Time].[Month].ALLMEMBERS*

[Business].[Business Name].ALLMEMBERS*

[Product].[Product Name].ALLMEMBERS)} ON ROWS

FROM OLAP_Cube

What this means is that Excel performs a full cross-join of all dimension members and then applies the NON_EMPTY function on top of this set. If our dimensions are large, this could cause significant issues with the performance of the reports.

Unfortunately it is not possible to replace the query in Excel as it is not exposed to us, and even if we could replace it, it would be pointless as changing the user selections of the dimensions to be displayed would cause it to fail. There are some Excel add-ons  available for changing the query string but issues such as distribution of these add-ons and the inability of business users to edit MDX queries diminish the benefits of using them.

While waiting for an optimised query generator in Excel, we can advise business users of ways to optimise their reports themselves. In my case these were:

  1. Consider using report parameters instead of placing all dimensions on rows or columns in a pivot table. This will place them in the WHERE clause of the query instead of the SELECT clause and will not burden the cross-join part of it.
  2. Spread the dimensions as evenly as possible between rows and columns. Having 6 dimensions on one row is worse than having 3 on rows and 3 on columns as the cross-joins will generate smaller sets for NON_EMPTY to filter and ultimately will improve performance.
  3. Consider using less dimensions – if possible split the reports on multiple sheets. This is not always possible but it is better for the users to keep their report-making simple.

In Reporting Services we can write our own query which can be optimised by “cleaning” the cross-join set of empty members in the following manner:

NONEMPTY(NONEMPTY(DimA * DimB) * DimC))

As we cannot do this in Excel, the best way to improve performance is to advise the users against overcomplicating their reports.