PowerPivot Tips and Tricks – First Findings

I just finished a little project with PowerPivot together with James Beresford, and I feel compelled to share some of the lessons I learned during these exciting, if slightly stressful, few days. I have heard a few things in the past about good PowerPivot model design, but I’d like to reiterate some of them and maybe add to what other bloggers have already mentioned. As is usual with my posts, the following is not an exhaustive list, but it may resonate with newbie PowerPivot developers like me (which at this stage is probably 99% of the total).

Integer Keys

I found out the importance of having integer dimension keys. The sample model I was given to work with had floats as primary keys for all dimensions. That has always been a poor choice, but with PowerPivot avoiding it becomes even more important: the keys get loaded into memory, and unless your workstation has heaps of it, keeping the solution small really matters. Therefore, even though PowerPivot should work just fine with other data types, I would strongly recommend choosing integers for your keys.
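
As a rough sketch of what I mean (the table and column names below are made up), the dimension can get an integer surrogate key at the database level, with the original float identifier kept as a normal attribute:

-- Hypothetical dimension: the float "key" stays as an attribute,
-- while an integer surrogate key becomes the actual primary key.
CREATE TABLE dbo.DimProduct
(
    ProductKey  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ProductCode float             NOT NULL,  -- the original float identifier
    ProductName nvarchar(100)     NOT NULL
);

-- The fact table then joins to the dimension on the small integer key.
CREATE TABLE dbo.FactSales
(
    ProductKey int   NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    Amount     money NOT NULL
);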

Previewing and Filtering

Always have a look at the tables you are importing into PowerPivot and make sure you deselect all the columns you do not actually need – they can always be selected later if required. As with SSAS, starting from a small model and extending it later is much better than pulling in every possible column and then removing the excess. This is even more valid with PowerPivot, as retrieving everything from the back-end database will use up a lot of memory, and unless you really don’t care about memory that is not a good thing.
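
If you prefer to do the trimming on the database side as well, a narrow view with only the needed columns achieves the same result – something along these lines, with hypothetical names:

-- Hypothetical view exposing only the columns the model actually needs,
-- so PowerPivot never pulls the wide source table across.
CREATE VIEW dbo.vDimCustomerForImport
AS
SELECT CustomerKey,
       CustomerName,
       Country
FROM   dbo.DimCustomer;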

Co-existence with SQL Server

I know for a fact that PowerPivot and SQL Server can co-exist peacefully. However, since SQL Server was running on the same box, I had to limit its memory usage to 150 MB so that it didn’t cause issues with PowerPivot.
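
If you want to set the limit with T-SQL rather than through the server properties dialog in Management Studio, the relevant setting is max server memory (150 MB is just the value that suited my box):

-- max server memory is an advanced option, so expose it first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Cap SQL Server at 150 MB so PowerPivot has room to breathe.
EXEC sp_configure 'max server memory (MB)', 150;
RECONFIGURE;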

Moving logic to the database

Instead of trying to do more in PowerPivot, if you are designing the solution and/or building the model, make sure that you move join logic and similar transformations to the back-end database. There is no point in importing tables into PowerPivot when the relationships between them are one-to-one. Instead, you can join them in a view and import them as one piece, thus avoiding some relationships in VertiPaq, which will improve performance and make the model easier to work with. For example, I had a one-to-one relationship through a large intermediate table (3+ million rows), which I managed to get rid of completely in the database back-end with a simple join.
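
Here is a simplified sketch of the idea with made-up table names – the one-to-one pair gets joined in a view, and PowerPivot imports a single table with no relationship to maintain:

-- Hypothetical one-to-one pair collapsed into a single importable view.
CREATE VIEW dbo.vProduct
AS
SELECT p.ProductKey,
       p.ProductName,
       d.Category,
       d.Description
FROM   dbo.Product       AS p
JOIN   dbo.ProductDetail AS d
    ON d.ProductKey = p.ProductKey;  -- exactly one row on each side of the join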

Many To Many Relationships

If you can avoid them – do it. However, if you can’t, you can read Marco Russo’s and Alberto Ferrari’s posts, as well as the PowerPivot FAQ, for more information on how to implement them. Unfortunately, we have to create multiple calculated measures in DAX to work around the problem, and if we have many measures this can become a bit tedious.

Saving Regularly

Yes, you should save your work, especially when you notice that memory is getting scarce. Maybe not every 5 minutes, but at least every 30 minutes should be the norm. Once memory gets very low, PowerPivot starts having issues (i.e. becomes a bit unstable), and you can easily lose some work. It does take some time to save a large model to disk, but it may well be worth it. This may sound like a no-brainer, but it is easy to stop saving your work because of the time required to do so. If you don’t, it is possible (as it was in my case) to get a funny message asking you to Continue or Cancel without being too sure which is the better choice after a couple of hours of work…


Excel and MDX – Report Authoring Tips

Excel is a powerful BI tool which is often overlooked as an inferior alternative to Reporting Services for distributing reports because of its potential for creating a chaotic mess in organisations. As a number of speakers at Australian Tech.Ed 2008 pointed out, replacing Excel is often the goal of organisations implementing BI solutions, and using Excel as the main reporting tool can often be misunderstood by prospective clients. SharePoint Excel Services go a long way towards helping organisations control the distribution of Excel files, and in combination with a centralised data warehouse and Analysis Services cubes on top of it, they make a competitive solution framework, especially considering how familiar business users are with the Microsoft Office suite.

During a recent implementation of a BI solution I was asked to provide certain reports to a small set of business users. As the project budget did not allow for a full Reporting Services solution to be built for them, Excel was appointed as the end-user interface to the OLAP cube. The users were quite impressed by the opportunities that direct OLAP access presents, but quite unimpressed by the performance they were getting when trying to create complex reports.

After investigating the problem I noticed a pattern: the more dimensions they put on a single axis, the slower the reports were generated. Profiling the SSAS server showed that Excel generates quite inefficient MDX queries. In the case where we put one measure on rows and multiple dimensions on columns, the Excel-generated MDX query looks like this:

SELECT {[Measures].[Amount]} ON ROWS,
{NONEMPTY(
    [Time].[Month].ALLMEMBERS *
    [Business].[Business Name].ALLMEMBERS *
    [Product].[Product Name].ALLMEMBERS
)} ON COLUMNS
FROM OLAP_Cube

What this means is that Excel performs a full cross-join of all dimension members and then applies the NONEMPTY function on top of the resulting set. If our dimensions are large, that intermediate set can be enormous, which causes significant issues with report performance.

Unfortunately it is not possible to replace the query in Excel, as it is not exposed to us, and even if we could replace it, doing so would be pointless, because changing the user’s selection of dimensions would cause it to fail. There are some Excel add-ons available for changing the query string, but issues such as distributing these add-ons and the inability of business users to edit MDX queries diminish the benefits of using them.

While waiting for an optimised query generator in Excel, we can advise business users on ways to optimise their reports themselves. In my case these were:

  1. Consider using report filters instead of placing all dimensions on rows or columns in a pivot table. This places them in the WHERE clause of the query instead of on the SELECT axes, so they do not burden the cross-join at all.
  2. Spread the dimensions as evenly as possible between rows and columns. Having six dimensions on one axis is worse than having three on rows and three on columns, because the cross-joins will generate smaller sets for NONEMPTY to filter, which ultimately improves performance.
  3. Consider using fewer dimensions – if possible, split the report across multiple sheets. This is not always practical, but it is better for the users to keep their reports simple.

In Reporting Services we can write our own query which can be optimised by “cleaning” the cross-join set of empty members in the following manner:

NONEMPTY(NONEMPTY(DimA * DimB) * DimC)

As we cannot do this in Excel, the best way to improve performance is to advise the users against overcomplicating their reports.