Many-To-Many Relationship Modelling in PowerPivot: Workaround

So far PowerPivot does not support many to many relationships. It is still early days and I am sure that things will change soon. Moreover, there are a few ways to implement many-to-many relationships for measures in DAX. Two notable posts on the subject can be found on Marco Russo’s and Alberto Ferrari’s blogs:

http://sqlblog.com/blogs/marco_russo/archive/2009/12/07/many-to-many-relationships-in-powerpivot.aspx

http://sqlblog.com/blogs/alberto_ferrari/archive/2010/10/19/powerpivot-and-many-to-many-relationships.aspx

Unfortunately, if we have multiple measures, the approach is somewhat cumbersome as we need to build numerous DAX measures. I have been playing around with modelling options and I just came across an idea which, while not perfect, can help us to mitigate to some extent the lack of this crucial bit of functionality. In fact, “not perfect” is quite an understatement – the more appropriate expression would be “a hack”, but it could still be useful in some scenarios – hence this post!

To put some background, let’s assume we have this simplistic model:

An account table

A customer table

And a fact table

Here we have two accounts, A with an Amount of 100 and B with 50.

Now, let’s assume we have the following mapping table between Accounts and Customers:

Account A maps to customers X and Y; Account B to X only.

Now, instead of trying to directly map the relationships here, we add the Amount to the mapping (bridge) table:

And we then create our relationships in PowerPivot. The problem here is that when we slice by Account A (with a key of 1), we would get the 100 Amount doubled. If we had more than two customers mapped to the account, it could have been tripled, or in fact multiplied by N, where N is the number of mapped customers.

The trick I am offering is adding a negative amount to this impromptu fact table against the same account (A) and against a Z customer with a key of -1, which acts as an unknown/adjustment bucket. Therefore, the Customer table would become:

And our fact/bridge table:

Now, if we create relationships between this table, the account and customer tables we would see the following:

And

Well, we do get one extra row with negative amounts, but all else seems just fine. The total is correct and the amounts per account and customer are also correct. If we have more than one measure, we will have to negate each of the additional measures as well if we want to achieve the same behaviour.

I also created a slightly more complicated model, which included years:

Here we have to compensate for two rows with negative amounts but the experience of using this model is not much different than the previous:

Some other common scenarios could also work quite well. I have tried with two M2M relationships and while the number of rows gets progressively larger, the model still seems to work. Another common scenario would be to have more than 2 attributes in either of the tables (whoa!), and that seems to work very well, too. The key is to remember to negate/adjust for each mapping after the first one. If we had a third node in our X,Y table mapped to A, we would have two additional rows in the fact table – one with a positive and one with a negative amount.

Unfortunately I have not come up with a way to simplify the data modelling through DAX, so at this stage I would do it in the source. However, it should not be too difficult to achieve this in SQL, for example.

I would be interested to see if anyone comes up with scenarios where this could not work and cases in DAX where the additional rows are a problem (beyond the “too much data”, of course).

Advertisements

Ad-Hoc Ranges in SSAS

We can easily build ranges in MDX with the range operator “:”. We can also easily create a Range dimension in SSAS and use it to slice our data. This post is not about either of those. I would like to discuss the scenario where we need to restrict an ad-hoc query (e.g. PivotTable in Excel) by a range of values. Usually, we would tell our users to just select the values they need. This works. However, if our users do not want to be selecting/deselecting many values, we can provide an easier way to do this.

Let’s assume we have an Age dimension for a Kindergarten cube. The Age dimension contains the ages of the children, which can be from 1 to 10. Our requirement is to be able to select low and high limits of ages for a Pivot Table in Excel, so that the data in the Pivot Table is sliced for the range of ages between those limits.

To implement this in practise, we can build two extra dimensions – Age – Low Limit and Age – High Limit, which contain the same members as the Age dimension and then use them to slice our cube. Because the data is the same for all three dimensions, we can use a couple of database views on top of the Ade dimension table, thus ensuring that all three are in sync:

After that, we build two bridging tables BridgeLowLimit and BridgeHighLimit between Age and Age – High Limit, as well as between Age – Low Limit:

The data in these Bridging tables maps each Low and High limit to all Age members which are either lower (for High limit) or higher (for Low Limit) than the limit members:

 

Now, we can define many-to-many relationships between the FactAmount (our fact table), through the Age dimension and the Bridging tables to our limit dimensions as follows:

After this, we can hide the two measure groups for the Bridge tables from the users:

Now, we are ready to process our SSAS database. After that, we get the following in Excel:

If we place the Low and High limits in the Report Filter, Age on rows and our Amount measure on columns we can limit the Age members displayed by changing the filter members. Note that only the lowest member in the Age – Low Limit dimension and the highest in the Age – High Limit dimension matter – everything in between those (in case of multi-selects) effectively get ignored.

There are certain problems with this solution. If we place the limit dimensions on rows and we select multiple members from each dimension, we get the following:

This can be confusing for the users if they want to get distinct ranges like 1-3, 3-6, 6-10. Unfortunately, it is not possible to build a cube calculation which hides the irrelevant members as we do not know what the users have selected in Excel. From there, we cannot determine what members are in the query scope, and from there, we can’t pick only the ones we need (e.g. the ones with the lowest distance between the two limit members).

If we place the two dimensions on Rows and Columns, we can get a nice matrix and this makes a bit more sense:

 

Also, for large dimensions (e.g. Date), this can be quite slow, as the number of rows in the Bridge tables will grow. In example, if we have 10 years in our Date dimension, and we map them the way I just showed we will end up with approximately 6-7 million rows in each Bridge table, which can be quite prohibitive from performance point of view. However, for smaller dimensions (in my opinion everything under 1000 members would be ok as it would generate approximately up to 500,000 rows in each Bridge table). Therefore, if our users insist on this functionality – especially when they have a flat list of 100-1000 members, and they frequently select ranges out of this list – we have a way of answering their need.