I just finished a little project with PowerPivot together with James Beresford, and I feel compelled to share some of the lessons I learned during these exciting if a little bit stressful few days. There are a few things I have heard in the past in regards to good PowerPivot model design, but I’d like to re-iterate some, and maybe add some more to what other bloggers have already mentioned. As it is usual with my posts – the following is not an exhaustive list, but may resonate well with newbie PowerPivot developers like me (which at this stage is probably 99% of Total).
I found out the importance of having integer dimension keys. The sample model I was given to work with had floats as primary keys for all dimensions. This has always been a pretty bad choice, however, with PowerPivot avoiding this becomes even more important as these get loaded in memory and unless your workstation has heaps of it, keeping the size of the solution smaller is extremely important. Therefore, even though PowerPivot should work just fine with other data types, I would strongly recommend choosing integers for your keys.
Previewing and Filtering
Always have a look at the table you are importing into PowerPivot and make sure you deselect all the columns from your tables which you do not actually need. They can always be selected later on if needed. As it also is with SSAS – starting from a small model and extending it later is much better than starting from getting every possible column in and then removing them. This is even more valid with PowerPivot as you will use up a lot of memory to retrieve everything from the back-end database and unless you really don’t care about memory this is not a good thing.
Co-existence with SQL Server
I know for a fact that PowerPivot and SQL Server can co-exist peacefully. However, since SQL Server was running on the same box, I had to limit its memory usage to 150Mb so it doesn’t cause issues with PowerPivot.
Moving logic to the database
Instead of trying to do more in PowerPivot, if you are designing the solution and/or building the model – make sure that you move join logic, or similar in the back-end database. There is no point in importing tables in PowerPivot when the relationships between them are one-to-one. Instead – you can join them in a view, and then import them as one piece, thus avoiding some relationships in Vertipaq, which will improve performance and will make it easier to work with the model. In example, I had a 1-1 relationship through a large intermediate table (3+mil rows), which I managed to get rid of completely in the database back-end by applying a simple join.
Many To Many Relationships
If you can avoid them – do it. However, if you can’t, you can read Marco Russo’s, Alberto Ferrari’s and this PowerPivot FAQ posts to get more info on how to implement them. Unfortunately, we have to create multiple calculated members in DAX to avoid the problem and if we have many measures, this can become a bit tedious.
Yes, you should save your work, especially when you notice that memory gets scarce. Maybe not every 5 minutes, but at least every 30min should be a norm. Since once memory is very low PowerPivot starts having issues with itself (i.e. becomes a bit unstable), and you can easily lose some work. It does take some time to save a large model on disk, but it may be well worth it. This may sound as a no-brainer, but it is easy to stop saving our work because of the time required to do so. Therefore, it is possible (as it was in my case) to get a funny message asking you to Continue or Cancel and you are not too sure what’s the better choice after a couple of hours of work…