Avoiding NULLs in SSAS Measures

In a recent discussion I made a statement that many data marts allow and contain NULLs in their fact table measure columns. From the poll responses I am getting here, I can see that more than half of everyone voting does have NULLs in their measures. I thought the ratio would be smaller because in many models we can avoid NULLs and it could be a best modelling practise to attempt to do so. There are a few techniques which lead to a reduction of the need for NULL-able measures and possibly even to their complete elimination. While converting NULLs to zeros has a performance benefit because of certain optimisations, it also loses it in many cases because calculations operate over a larger set of values; in addition the performance hit of retrieving and rendering zeros in reports and PivotTables may be unnecessary. Therefore, it is advisable to test the performance and usability benefits of Preserving or converting NULLs to BlankOrZero. Thomas Ivarsson has written an interesting article about the NullProcessing measure property on his blog.

I am offering two approaches, which should be always considered when dealing with measure columns containing NULLs.

Adding Extra Dimensions

Let’s assume that we have a fact table with a structure similar to this:

Date        Product    Actual Amount    Forecast Amount
201101      Bikes      100              NULL
201101      Bikes      NULL             105

By adding a Version dimension we can achieve a structure like this:

Date        Product    Version        Amount
201101      Bikes      Actual         100
201101      Bikes      Forecast       105

This way we eliminate the NULL measure values and we get maintainability and arguably a usability boost as well.

Splitting Fact Tables

Another way to improve the data mart model is to ensure that we have more fact tables containing a smaller number of measures. Let’s illustrate this concept with a simple example:

Date        Product    Amount    Exception Amount
201101      Bikes      100       NULL
201102      Bikes      120       10
201103      Bikes      110       NULL
201104      Bikes      130       NULL

In this case, Exception Amount could be an extra cost which is relevant in rare cases. Thus, it may be better moved to a separate table which contains less data and may lead to some space savings on large data volumes. Additionally, it would also allow for the same analytics in SSAS if we implement it as a separate Measure Group. In example, we could rebuild out model like this:

Date        Product    Amount
201101      Bikes      100
201102      Bikes      120
201103      Bikes      110
201104      Bikes      130

Date        Product    Exception Amount
201102      Bikes      10

Even though we have two tables and one extra row in this case, depending on how rare the Exception Amount is, the savings of an extra column may be worth it.

Sometimes it is not possible, or correct to apply these modelling transformations. In example, we could have a valid design:

Date        Product    Sales Amount    Returns Count
201101      Bikes      100             10
201102      Bikes      120             30
201103      Bikes      110             NULL
201104      Bikes      130             20

Here adding Measure Type dimension would help us with eliminating the NULL, but would also mean that we would have to store two different data types – dollar amounts and counts in the same measure, which we should definitely avoid. Also, since we may have a very few months with NULLs in the Returns Count column and the ratio of non-NULLs to NULLs may be low, we would actually lose the benefits of the second approach.

From Analysis Services point of view, it is better to use NULL, again depending on the non-NULL-to-NULL ratio we have. The more NULLs we have, the better it is to not convert the NULLs to zeros as calculations which utilise NONEMPTY will have to work over a larger set of tuples. Also, here we may notice the difference between using NONEMPTY and EXISTS – the second one will not filter out tuples which have associated rows in the fact table even when they are NULL. It also depends on the usage scenarios – if a user filters out non-empty cells in Excel, the Returns Count for 201103 will appear as 0 if we store zeros (because of the way NON EMPTY, which Excel uses, works), or if we have set the measure to convert NULLs to zeros. In the opposite case (when we have NULLs and the NullProcessing property is set to Preserve) Excel will filter out the empty tuples.

Regardless of the scenario, it is always a good idea to try to remove NULLs by creating a good dimensional model. If we are in a situation where NULLs are better left in the measure column of the relational fact table, we should consider whether we need the default zeros in our measures, or if we would be better off avoiding them altogether.

I would be interested to see if and why you do allow NULLs in your measure groups if the reason is not in this article. Also, it would be interesting to see if there are other good ways to redesign a model in order to avoid storing NULL measure values.

Advertisements

2 thoughts on “Avoiding NULLs in SSAS Measures”

  1. Boyan – I certainly take the approach that NULL values, where possible, should not be included in a measure. However a couple of projects I have worked on sees the NULL as a valid response. My dataset is based on Student Results for exams … so for this purpose a student can score a 100% or a 0%, but could also be exempt/deferred from sitting the exam for one reason or another, this introduces the NULL. We need to record who should have sat an exam, and we need to know what their score was, so a fact table with a student_id, date_id, exam_id, grade_id and score(%) is the most logical approach for us.

    Like

Comments are closed.