Pre-Aggregated Data and Changing Dimensions

Normally when working with data we assume that we have it at the lowest possible grain, and the challenge is to aggregate it along various hierarchies. By the nature of BI, we get to work with large sets of detailed data collected by an existing system.

But what happens when we are given the aggregations and need to work with those instead? Recently Nick Barclay and I were involved in such an implementation. Nick designed a Health and Safety Dashboard, which I then built with Reporting Services, PerformancePoint, Analysis Services and SQL Server 2005 relational databases. We were told at the beginning of the project that the aggregation business rules were so complicated that rebuilding them for the purposes of the dashboard was well out of scope. Having had no prior experience with pre-aggregated data, I did not foresee a major problem; one became apparent a few months into development, when the Business Unit hierarchy changed significantly.

Aggregation Rules

In contrast with typical business scenarios, where an SSAS aggregation function like Sum together with some custom roll-ups works perfectly well, the Health and Safety data needed to be aggregated in a complex way, and the organisation we built the dashboard for had already invested in a system managing these aggregations. For example, a simple rule would be: if a Business Unit has more than 1 major incident, it gets an Amber score for Safety Compliance; if it has more than 4, it becomes Red. The same applies to its parent: if the sum of all major incidents for its descendants is greater than 1, it becomes Amber, and with more than 4 it becomes Red. There were also quite a few dependencies between various dimensions, and reluctantly we agreed to work with the data the way it was. The following diagram shows the way an aggregate for Major Incidents works:

Major Incidents Aggregation 

Japan doing badly on Major Incidents also makes Asia and Global look bad.
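
As a rough sketch only (nowhere near the client's actual rule engine, whose complexity and cross-dimension dependencies were exactly why rebuilding it was out of scope), a threshold rule of this kind could be written as an MDX calculated member. The cube, measure and hierarchy names below are made up for illustration:

WITH MEMBER [Measures].[Safety Compliance Score] AS
    CASE
        -- Major Incidents is assumed to be a plain Sum measure, so at a parent
        -- Business Unit it already holds the sum over all of its descendants
        WHEN [Measures].[Major Incidents] > 4 THEN "Red"
        WHEN [Measures].[Major Incidents] > 1 THEN "Amber"
        ELSE "Green"
    END
SELECT {[Measures].[Major Incidents], [Measures].[Safety Compliance Score]} ON COLUMNS,
    [Business Unit].[Business Units].MEMBERS ON ROWS
FROM [Health And Safety]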

The problem

The actual problem comes from the fact that our dimension hierarchy can change, and if it does, the aggregations no longer make sense historically. We could have a change in our hierarchy:

(Diagram: the Business Unit hierarchy before and after the change)

and we may need to generate the following simple report for before and after the change:

(Sample report: Major Incidents before and after the change)

The historical data suggests that something was wrong with China as the only child of Asia before the change in the hierarchy, while in fact the culprit then (and now) was Japan. As we cannot see how the hierarchy looked before the change, we cannot analyse the historical data in its proper context. Nor can we compare data for Asia after the change to data for Asia from before the change along the new hierarchy.

Possible solutions

In my project the client agreed that it was acceptable to live with this limitation, and a possible solution was a complete historical rebuild of the aggregations, followed by a complete reload of the system data, after every hierarchy change. Luckily for us, the data set was not too big and was not expected to grow much.

Another solution would be to use a Slowly Changing Dimension, so we can show the hierarchy as it was when the aggregations were made – this would improve the analytical value of the data because business users would be able to see why a Business Unit like Asia was Red in the context of an outdated hierarchy.

The best solution would be to build our own aggregations and work with those, as that would give us the ultimate flexibility to represent the data in exactly the way the business users need it; unfortunately, it is sometimes very hard to convince them that spending twice the funds on the same problem is a good thing.

Filtering Unneeded Dimension Members in PerformancePoint Filters

Published on SQLServerCentral on 2008-11-14:
http://www.sqlservercentral.com/articles/PerformancePoint/64565/
 

Sometimes we need to utilise dynamic dimension security in an Analysis Services solution and also need to dynamically display the allowed members in a PerformancePoint filter.

If our hierarchy is multi-level and we expect to have security applied on multiple levels, PerformancePoint will display the full path up to the root member of the dimension. So, if in a Business Unit hierarchy we have access to some third-level members, in PerformancePoint we will also see all of their parents. For example, if our hierarchy looks like this:

All Business Units
-Europe
--UK
--France
--Bulgaria
-North America
--USA
--Canada

and we give someone access to France, they will in fact see: 

All Business Units
-Europe
--France

Indeed, when they select All Business Units or Europe they will still see only France data but this may be confusing. To eliminate the top levels we need to change the way we create our PerformancePoint filter. 

To achieve this, we first need to create an MDX Query filter. For more information about PerformancePoint filters and dynamic dimension security you can read the following brilliant post on Nick Barclay’s blog: PPS Data Connection Security with CustomData. Nick explains how to set up PerformancePoint to work with Analysis Services dynamic dimension security and related topics. I will concentrate here on actually filtering the members of the already set-up MDX Query filter.

Instead of requesting all dimension members with a simple statement like: 

DESCENDANTS([Business].[Business Hierarchy].Members,, SELF_AND_AFTER) 

we can write some fairly simple MDX which means: 

Get me the descendants of all dimension members whose ascendants (excluding themselves) have no more than one child. 

And the MDX code is: 

DESCENDANTS(
    FILTER([Business].[Business Hierarchy].Members AS a,
        ((FILTER(ASCENDANTS(a.CurrentMember) AS a_asc,
            a_asc.CurrentMember.CHILDREN.Count > 1).Count = 1)
          And
          a.CurrentMember.Children.Count > 1)
        Or
        ((FILTER(ASCENDANTS(a.CurrentMember) AS a_asc,
            a_asc.CurrentMember.CHILDREN.Count > 1).Count = 0)
          And
          a.CurrentMember.Children.Count = 0)),, SELF_AND_AFTER)

The result will be France being displayed as the only available member in the PerformancePoint drop-down. Also, if we had two different allowed members, the code above would show us the top common parent of the two. Therefore, if we had France and Canada as the allowed set of dimension members, the drop-down would consist of the following hierarchy: 

All Business Units
-Europe
--France
-North America
--Canada

thus satisfying our requirements.

Excel and MDX – Report Authoring Tips

Excel is a powerful BI tool which is often dismissed as an inferior alternative to Reporting Services for distributing reports because of its potential for creating a chaotic mess in organisations. As a number of speakers at Australian Tech.Ed 2008 pointed out, replacing Excel is often a goal of organisations implementing BI solutions, and proposing Excel as the main reporting tool can easily be misunderstood by prospective clients. SharePoint Excel Services go a long way towards helping organisations control the distribution of Excel files, and in combination with a centralised data warehouse and Analysis Services cubes on top of it, they will no doubt be a competitive solution framework, especially considering the familiarity of business users with the Microsoft Office suite.

During a recent implementation of a BI solution I was asked to provide certain reports to a small set of business users. As the project budget did not allow for a full Reporting Services solution to be built for them, Excel was chosen as the end-user interface to the OLAP cube. The users were quite impressed by the opportunities that direct OLAP access presents, but quite unimpressed by the performance they were getting when trying to create complex reports.

After investigating the problem I noticed a pattern – the more dimensions they put on a single axis, the slower the reports were generated. Profiling the SSAS server showed that Excel generates quite inefficient MDX queries. When we put one measure on rows and multiple dimensions on columns, the Excel-generated MDX query looks like this:

SELECT {[Measures].[Amount]} ON ROWS,
NONEMPTY(
    [Time].[Month].ALLMEMBERS *
    [Business].[Business Name].ALLMEMBERS *
    [Product].[Product Name].ALLMEMBERS) ON COLUMNS
FROM OLAP_Cube

What this means is that Excel performs a full cross-join of all dimension members and then applies the NONEMPTY function on top of the resulting set. If our dimensions are large, this can cause significant performance issues for the reports.

Unfortunately it is not possible to replace the query in Excel, as it is not exposed to us; and even if we could replace it, it would be pointless, as any change to the user’s selection of dimensions would cause it to fail. There are some Excel add-ons available for changing the query string, but issues such as distributing these add-ons and the inability of business users to edit MDX queries diminish the benefits of using them.

While waiting for an optimised query generator in Excel, we can advise business users of ways to optimise their reports themselves. In my case these were:

  1. Consider using report parameters instead of placing all dimensions on rows or columns in a pivot table. This will place them in the WHERE clause of the query instead of the SELECT clause and will keep them out of the cross-join (see the sketch after this list).
  2. Spread the dimensions as evenly as possible between rows and columns. Having 6 dimensions on one axis is worse than having 3 on rows and 3 on columns, as the cross-joins will generate smaller sets for NONEMPTY to filter, which ultimately improves performance.
  3. Consider using fewer dimensions and, if possible, split the report across multiple sheets. This is not always possible, but it is better for the users to keep their reports simple.
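
As an illustration of point 1 (a sketch only, not the exact query Excel would generate, and with a placeholder product member), moving the Product dimension from an axis into a report filter turns the earlier query into something like this:

SELECT {[Measures].[Amount]} ON ROWS,
NONEMPTY(
    [Time].[Month].ALLMEMBERS *
    [Business].[Business Name].ALLMEMBERS) ON COLUMNS
FROM OLAP_Cube
-- the Product selection now acts as a slicer in the WHERE clause
-- and no longer takes part in the cross-join
WHERE ([Product].[Product Name].[Some Product])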

In Reporting Services we can write our own query, which can be optimised by “cleaning” empty members out of the cross-join set in the following manner:

NONEMPTY(NONEMPTY(DimA * DimB) * DimC)
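
Expanded into a full query against the same hypothetical cube, the nested approach might look roughly like this, with each intermediate cross-join reduced by the Amount measure before the next dimension is joined in:

SELECT {[Measures].[Amount]} ON ROWS,
NONEMPTY(
    NONEMPTY(
        [Time].[Month].ALLMEMBERS *
        [Business].[Business Name].ALLMEMBERS,
        [Measures].[Amount]) *
    [Product].[Product Name].ALLMEMBERS,
    [Measures].[Amount]) ON COLUMNS
FROM OLAP_Cube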

As we cannot do this in Excel, the best way to improve performance is to advise the users against overcomplicating their reports.