What Exists and What is Empty in MDX

After reading my chapter “Managing Context in MDX” in MVP Deep Dives vol2 I noticed that I should probably have discussed one extra topic – the difference between cells which cannot exist, and those which can exist but are empty.

The basic idea is that in SSAS cubes we have the notion of empty space. However, there is an important difference between empty intersections which are possible but result in nulls when queried and “impossible” intersections between hierarchies in the same dimension.

If we look at the Date dimension in Adventure Works we can see that we have month and year attributes. Months in 2007 appear only with the year 2007 in the dimension. Therefore, the combination of January 2005 and CY 2007 is not possible and consequently it does not and cannot exist in our cube. In contrast, if we query for Clothing products in 2007 and we place the Month attribute on rows, we can see that no Clothing items were sold in the first few months of 2007:

Here we are dealing with possible, but empty cells – the January 2007 to June 2007 rows show empty intersections in the cube.

Why is this important? Well, it means that if we try to get the number of months with Clothing sales in 2007 with a query like:

we wrongly get 12, not 6. We need a function which can “detect” empty cells, rather than one which merely enforces the current context. An excellent function for this purpose is NonEmpty (or Exists):

Here we get the expected number – 6 (since only 6 months in 2007 have Internet Sales Amount against them).

A similar example can be shown the other way around. The following query returns 37, which is the total number of members of the month attribute with data against them:

This is because the NonEmpty function does not enforce the current context on its first argument and it gives us the members with data only (omitting NonEmpty results in 39, because we have two months with no sales whatsoever). Existing does, so if we add Existing to NonEmpty we get the expected count of 12 (as all months have had a sale in 2007 if we take all categories into account):

Here we eliminated the impossible intersections between months not in 2007 and year 2007.
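
As a minimal sketch of the difference (not the exact queries used above), the two counts can be placed side by side. The attribute names [Date].[Month Name] and [Date].[Calendar Year] are assumptions about the Adventure Works Date dimension and may need adjusting for your cube:

```mdx
-- Sketch only: the hierarchy and member names below are assumptions
WITH
-- Months which can exist together with the year in the slicer
MEMBER [Measures].[Existing Months] AS
  Count(EXISTING [Date].[Month Name].[Month Name].Members)
-- Months which exist with the slicer year AND have sales against them
MEMBER [Measures].[Existing Months With Sales] AS
  Count(
    EXISTING
      NonEmpty([Date].[Month Name].[Month Name].Members,
               [Measures].[Internet Sales Amount])
  )
SELECT
  {[Measures].[Existing Months],
   [Measures].[Existing Months With Sales]} ON 0
FROM [Adventure Works]
WHERE ([Date].[Calendar Year].&[2007])
```

The first measure only removes the month members which cannot exist with the year in the slicer. The second one also drops the months which have no Internet Sales Amount at all.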

Two New Books Definitely Worth Buying

I will try to provide a concise and useful insight into two books from the world of SQL Server. The first one is available now, while the second one is coming out very soon (in the next couple of weeks).

MDX with Microsoft SQL Server 2008 R2 Analysis Services Cookbook
by Tomislav Piasevoli

 

I received an invite to review Tomislav’s book a few weeks ago and, despite being quite busy these days, I decided to have a look. I expected it to be very good, as I had heard about it months ago from the author and the reviewers (Darren Gosbell, Chris Webb, Greg Galloway, Marco Russo and Deepak Puri), and I was promised a free copy, so I wouldn’t have to buy the book myself (which I would have done anyway). Tomislav has been one of the most prominent SSAS MVPs, quite active on the MSDN Forums and writing interesting posts on his blog. I was not surprised that he has been able to deliver excellent practical advice for his readers in this cookbook and I expect to be using it often during SSAS implementations. Let’s see what you can expect from this book if you buy it.

Firstly, it is a cookbook – Tomislav has written recipes for a large number of real-world scenarios. I must admit that I did not read absolutely all of them. I went through the ones I think I know well and compared Tomislav’s versions to the ones I would typically use. The topics are very well covered and follow a pattern of: Getting Ready -> How to do it -> How it works -> There’s more -> See also. First, we get an introduction, and then we prepare to get the work done. After that we get a step-by-step walkthrough of the actual solution. I liked the next section, “How it works”. Here we get an explanation of why we get the result, which gives us some more insight rather than having us blindly type code. I find it a very nice touch and I applaud the author for spending the time to include this invaluable bit of information. The “There’s more” section after that expands on the topic a bit, trying different options and showing what happens when we apply them. At the end of each chapter we have a section showing which other topics are related to the one we are currently reading. All in all, I think that Tomislav’s cookbook is very, very well written.

In general, I would say that if you want to learn MDX and how it works, you can start with a textbook-style book, which shows the basics, explains them and then builds on them. However, if you need to work with MDX now, or if you do have some knowledge but you lack practical skills, then this book is a real gem. I would definitely recommend buying Tomislav’s cookbook, and if you are interested in getting more theoretical detail on why things work the way they do, either do a lot of blog reading, or buy another MDX book to serve as a companion to this one.

In brief, this is the best MDX cookbook out there and offers excellent practical advice over a variety of topics.

MVP Deep Dives vol.2

 

The second book I would like to introduce is another “must buy”. Imagine the effort required to assemble a book from 53 different authors who live in different parts of the world and are not all native English speakers (like me). Well, Kalen Delaney has made the effort…twice…and this is one of the reasons for having this book published in the next couple of weeks. Another reason is the motivation which the authors found in donating their work to Operation Smile. We are not profiting from it and we hope that you will also buy the book not only because of the outstanding content, but also because the money you spend will be used to help children with facial deformities all around the world.

The list of authors speaks for itself and I am very flattered to be able to participate with a chapter entitled “Managing Context in MDX”. The book will be out by the beginning of PASS Summit 2011 and there will be a book signing session, where you can get your copies signed by some of the authors. Come, drop by if you are there to say hello and have a chat!

MDX Subselects – Some Insight

Recently I answered a question on StackOverflow about what may be an interesting and common case; understanding it correctly helps us understand subselects better.

Let’s examine the following MDX query:

Here we effectively place the All member of the Customer.Education hierarchy on columns, and an ordered set of the countries in the Customer dimension on rows. We have a subselect and a slicer. The slicer contains the Internet Order Count measure (which places it in context) and the subselect restricts the query to look at customers with Partial High School education only.

When we execute the query we get exactly what we expect.

Now, let’s change this to:

The difference here is the All member in the tuple used as the second argument of the Order function. What we get is a different order (note that Germany and France are in different positions on rows). It seems like the set is ordered by the total amount for all customers without taking their education into consideration – the subselect does not make a difference. However, the measure value is the same as in the first query. So, what causes this?

To answer, we need first to understand what a subselect does. As Mosha posted a while ago, a subselect (which is really the same as a subcube) does implicit exists with the sets or set expressions on each axis and also applies visual totals “even within expressions if there are no coordinate overwrites”.

Rather than repeat what Mosha has already taken the effort to explain, I advise you to read his post thoroughly. You will then understand what the “implicit exists” does. In our case, it is not as important as the visual totals part, so I will concentrate on the latter.

The reason why the set in the first query gets ordered by the orders made by customers with Partial High School education is the visual totals, which take effect within the Order expression. The visual totals also restrict what we see in the query results, as they apply to the measure on the slicer axis as well. However, in the second query the catch is in the “if there are no coordinate overwrites” part of the story. The visual totals in Order do not get applied because we explicitly overwrite the Customer.Education hierarchy member within the tuple in the Order function:
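
To make the overwrite concrete, here is a sketch of what the second query might look like. It is reconstructed from the description above rather than copied from the original question, so the member names (in particular the Partial High School key) and the BDESC sort flag are assumptions:

```mdx
-- Sketch only: member keys and the sort flag are assumptions
SELECT
  {[Customer].[Education].[All Customers]} ON 0,
  Order(
    [Customer].[Country].[Country].Members,
    -- The explicit All member overwrites the Education coordinate,
    -- so the visual totals from the subselect do not apply to the sort
    ([Customer].[Education].[All Customers],
     [Measures].[Internet Order Count]),
    BDESC
  ) ON 1
FROM
(
  SELECT {[Customer].[Education].&[Partial High School]} ON 0
  FROM [Adventure Works]
)
WHERE ([Measures].[Internet Order Count])
```

Replacing the tuple with just [Measures].[Internet Order Count] leaves the Education coordinate alone, which corresponds to the first query as described above.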

Therefore, the set of countries gets ordered by the Internet Order Count for All Customers without taking into account the subselect. However, the measure on the slicer axis still gets the visual total logic applied to it, and this causes the result set to be for those customers who have Partial High School education.
If we re-write the second statement to:

We can see that the NON VISUAL keyword (SSMS can’t recognise this syntax and underlines with a red squiggly line) changes the way the measure is displayed. We see the total amount for the slicer, as well.
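
A sketch of the NON VISUAL form follows, using the same assumed member names as the description of the second query. The keyword goes between FROM and the opening bracket of the subselect:

```mdx
-- Sketch only: member keys and the sort flag are assumptions
SELECT
  {[Customer].[Education].[All Customers]} ON 0,
  Order(
    [Customer].[Country].[Country].Members,
    ([Customer].[Education].[All Customers],
     [Measures].[Internet Order Count]),
    BDESC
  ) ON 1
FROM NON VISUAL
(
  SELECT {[Customer].[Education].&[Partial High School]} ON 0
  FROM [Adventure Works]
)
WHERE ([Measures].[Internet Order Count])
```

With NON VISUAL in place the subselect still performs its implicit exists, but it no longer restricts the totals shown for the All member.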

Similarly, if we re-write the first query with NON VISUAL we get:

Here both visual totals are excluded and both the Order and the result set are done for customers with any education.

Hopefully this will clarify the logic applied when we use subselects to some extent.

Dynamic Groups in SSRS Reports with MDX

Providing a dynamic group in SSRS with SQL is not all that hard. It is also a common business requirement when combining reports. For example, I very often see an old system providing multiple similar reports to its users – that is, reports with exactly the same layout but with different items on rows or columns. I also quite often see SSRS requirements asking me to mimic the old functionality and build multiple almost identical reports. My personal preference is to combine them in one if possible, thus reducing the maintenance and even development effort and cost. In SQL we have the convenience of choosing how to name the output columns. As long as we have a stable dataset – that is, a constant number of columns with the same data types and the same names – SSRS does not differentiate between them and treats them the same way. This allows us to flexibly change their actual contents. The standard approach in MDX would be to create all possible “groups” in a large dataset and then use one or another in the report as required through expressions. This, however, leads to slower execution times, larger chunks of data going around the network and the need to aggregate within SSRS at almost every data cell.

Instead, we can build the same functionality by constructing MDX dynamically through the use of parameters. Let’s examine a practical case to illustrate the concept a bit more clearly. Let’s assume that the Adventure Works executives asked me to create two simple reports showing years across the columns and their Internet Sales Amount in the data area, with product categories on rows in one of them and the countries from which the sales originate in the other. The layout would be the same – a tablix, which has one column group – Year (calendar), Internet Sales Amount in the data cell and one of the two dimension hierarchies (Product Category or Country) on rows. The MDX for the datasets would also be fairly similar:

Year

Country

As we can see, the only difference here is the set on rows. Instead of providing two reports we can combine them in one. The standard approach I see all developers implementing is crossjoining all sets together and then aggregating in SSRS. To switch between the two grouping fields there would be a parameter “Group By”. The sample MDX for this report would be:
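
A sketch of such a crossjoined dataset query is shown here. It is an illustration rather than the original query, and the attribute hierarchy names are assumptions about Adventure Works. The years sit on rows because SSRS expects only measures on the columns axis of an MDX dataset:

```mdx
-- Sketch only: hierarchy names are assumptions
SELECT
  {[Measures].[Internet Sales Amount]} ON COLUMNS,
  NON EMPTY
    [Date].[Calendar Year].[Calendar Year].Members *
    [Product].[Category].[Category].Members *
    [Customer].[Country].[Country].Members ON ROWS
FROM [Adventure Works]
```

Every combination of year, category and country with data comes back to the report, which then has to aggregate away whichever of the two groupings is not selected.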

And the layout in SSRS:

The row group (and the expression within the row cells) here is based on a GroupBy parameter with two values – “product” or “country”. In SSRS the expression for them looks like this:


When the user selects Product as a value for the Group By parameter, our report renders grouped with the Product categories on rows. In this particular case the dataset is relatively small even when we do a full crossjoin between years, product categories and countries. However, in many cases the number of items on rows can be prohibitive. Therefore, it is better if we can avoid doing this by dynamically constructing the MDX expression. We can add another (Internal) parameter – GroupMDX to our report with the following expression as its default value:

Then, we can replace the report dataset with the following MDX:

To make it execute, I added a sample query parameter GroupMDX with a value of [Product].[Product Categories].[Category]*[Customer].[Customer Geography].[Australia]. Note that the sample parameter value should use the same hierarchies as the ones you actually use in your MDX parameter – otherwise you will get empty results (e.g. if I were to use [Customer].[Country].[Australia] as a sample in this case, the actual dataset would not be able to figure out that the hierarchy has been changed and would return an empty set on rows for country).
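
A parameterised dataset query along these lines might look like the sketch below. It is an assumption about the shape of the query, not the original: @GroupMDX is the SSRS query parameter described above, and the query only runs where that parameter is supplied (outside SSRS, replace StrToSet(@GroupMDX) with the set expression itself):

```mdx
-- Sketch only: @GroupMDX is supplied by SSRS as a query parameter
SELECT
  {[Measures].[Internet Sales Amount]} ON COLUMNS,
  NON EMPTY
    [Date].[Calendar Year].[Calendar Year].Members *
    StrToSet(@GroupMDX) ON ROWS
FROM [Adventure Works]
```

The CONSTRAINED flag is deliberately left out of StrToSet here, since the parameter holds a set expression with a crossjoin rather than a plain list of member names.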

Unlike with SQL, though, we have a problem – every time the query executes with a different set as a parameter value it returns a different result set, and the dataset’s fields collection does not map well to it:

Luckily, there is a workaround. We can add a bit of custom code to our report (Report Properties -> Code). For more details you can read the following MSDN article. In a nutshell, we can check in the custom code for missing fields, thus guarding against the error we are receiving. The slightly modified code snippet from the MSDN page is:

Now, we can wrap every field reference which may disappear from our report in a call to the function. In our case, we can replace the row group expression, as well as the row cell expression, with the following expression:

Now we can run our report and get the results we expect. Every time we change the Group By parameter and run the report, SSRS constructs a different MDX query and sends it to SSAS getting only the results it needs. It is a fiddly technique, but in many cases can be a lifesaver – in particular when the large crossjoin method is just too slow.


SSAS SCDs: Showing Active Dimension Members With No Data

Type 2 dimensions play well with SSAS because when we use them SSAS determines which members are relevant to which periods based on the data in the measure group we have. This is always the case – if we slice by a particular date, then all members from a SCD which do not link up to the data in the slice will yield empty values, which we can easily hide in most clients (or with the NON EMPTY keyword in MDX). This works well in most cases. However, there is a problem in a small subset of usage scenarios – that is, when we have no data against a member which is nevertheless relevant for the period. Because it still yields nothing when we go through the data in the cube, it gets hidden, too.

Let’s explore a simple case. If we have a SCD called Employee Type, which has two values: Internal and External for 2010, but three members Internal, Non-Internal and Contractor for 2008 and 2009; and then we have a measure called Employee Count, if we slice by Employee Type we get something like:

Internal          150
External           20
Non-Internal       15
Contractor         10

When we slice by a year we may get something like:

                 2008    2009    2010
Internal           50      50      50
External                           20
Non-Internal               15
Contractor          5       5

Note that if we slice by 2008 and we hide empty cells we would get:

                 2008
Internal           50
Contractor          5

However, what we may want to see on our report may be:

                 2008
Internal           50
Non-Internal
Contractor          5

Un-Hiding empties would actually show:

                 2008
Internal           50
External
Non-Internal
Contractor          5

Well, in SQL we would just use the EffectiveFrom and EffectiveTo dates in the dimension table to determine the correct results. In SSAS we are not that lucky. I would like to show a possible solution.

Firstly, for an SCD we would typically have the dates I just mentioned – EffectiveFrom and EffectiveTo for each row. I would typically exclude them from the dimension in SSAS, but in our specific case we need them. Therefore, we can add them and just hide them instead of excluding them completely. Once we have them set up in this way we could write a bit of MDX, using LinkMember:

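WITH
// Returns 1 when the member's effective date range overlaps the
// current year and NULL otherwise, so that NonEmpty can filter on it
MEMBER [Measures].[ActiveType]
AS
  IIF({{LinkMember([Type].[EffectiveFrom].CurrentMember,
                   [Date].[Date]):
        LinkMember([Type].[EffectiveTo].CurrentMember,
                   [Date].[Date])}*
       [Date].[Year].CurrentMember}.Count = 0,
      NULL,
      1)
SELECT
{
  [Date].[Year].[Year].&[2008]*
  [Measures].[Employee Count]} ON 0,
{
  // The year is repeated here because the set on rows is not
  // evaluated in the context of the columns axis
  NONEMPTY([Type].[Type].[Type],
           {([Date].[Year].[Year].&[2008],
             [Measures].[ActiveType])})
} ON 1
FROM [MyCube]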

Now we get exactly what we want (Non-Internal shown in 2008 but with no data):

                 2008
Internal           50
Non-Internal
Contractor          5

Please note that for large dimensions this is a very bad approach from a performance point of view and should be avoided. Surprisingly enough, users rarely consider performance in their top 10 priorities, while functionality somehow always makes it there, so someone might find this technique useful in extreme use cases.

It would also be interesting to know whether there is another approach to this scenario which I may be unaware of – I am sure developers have hit (or have been hit by) this problem in the past and there must be other solutions as well.

Note: If you try to replicate this in Adventure Works as I did, you will find that the keys in the Date dimension(s) and the keys for the Start and End Date in the Type 2 SCDs are not the same. The Date dimension uses integer keys, while the Start and End Date attributes use datetime. Therefore, LinkMember will not be able to match these cross-dimensional attributes and the above approach will not work. As a lesson from this exercise – Effective From and Effective To columns should be of the same data type as your date attribute key in the Date dimension table.

Default Measure in SSAS Cubes

When writing MDX it is always a good idea to know the context in which the query executes. If we do not explicitly specify a hierarchy member in a tuple, SSAS replaces it with the default one, which far more often than not is the All member for dimensions. Because the All member is the default default member, it is easy to think that the default is always the All. This is untrue – the default can be easily changed through BIDS. A sinister (not really, but the word is cool) consequence of this is very apparent when we consider the Measures dimension.

Measures are in a dimension of sorts. It does not have an All member, so we can say it is non-aggregatable; however it has a default member, which is used whenever we do not explicitly specify a [Measures].[<member>] to be used. The most confusing part is when we omit the Measures member in a function call, and a prime example is the NonEmpty() function call which goes like this:

NonEmpty(<set>,<set>) -> NonEmpty([Customer].[Customer].[Customer], [Date].[Calendar].[CY 2007])

What we get here is a slight problem. NonEmpty still uses a measure – that is the default measure. And if we are not careful, we end up getting incorrect, or even invalid results (e.g. if the default measure is not related to one of the dimensions we may get the set of all members from the first set). However, if we do:

NonEmpty([Customer].[Customer].[Customer], ([Date].[Calendar].[CY 2007], [Measures].[Internet Sales Amount]))

We will get exactly what we want – the function uses the correct, or at least a known measure.
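
To see the effect side by side, the following sketch counts the members returned by both forms. Query-scoped named sets are used so that each NonEmpty call is evaluated once in the query context; the set and measure names introduced here are illustrative only:

```mdx
-- Sketch only: the named sets and calculated measures are illustrative
WITH
-- No measure specified: NonEmpty falls back to the default measure
SET [Customers By Default Measure] AS
  NonEmpty([Customer].[Customer].[Customer].Members,
           {[Date].[Calendar].[Calendar Year].&[2007]})
-- Measure specified explicitly in the tuple
SET [Customers By Internet Sales] AS
  NonEmpty([Customer].[Customer].[Customer].Members,
           {([Date].[Calendar].[Calendar Year].&[2007],
             [Measures].[Internet Sales Amount])})
MEMBER [Measures].[Count By Default Measure] AS
  [Customers By Default Measure].Count
MEMBER [Measures].[Count By Internet Sales] AS
  [Customers By Internet Sales].Count
SELECT
  {[Measures].[Count By Default Measure],
   [Measures].[Count By Internet Sales]} ON 0
FROM [Adventure Works]
```

If the two counts differ, the default measure is not the one we assumed it to be.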

Often we do not know what the default Measure member is. To find out, we can just write this little query:

WITH
MEMBER [Measures].[dMemberName] AS
[Measures].DefaultMember.Member_Name
SELECT
{
[Measures].[dMemberName]
} ON 0
FROM [Adventure Works]

The result is the default Measure member name.

A Guide to Currency Conversions in SSAS

In this post I will try to describe different ways we can do currency conversions in SSAS, which should cover most, if not all, requirements. Hopefully, it will also provide some best practice advice in the process, as well.

As background and reference, I will use some other blog posts, the most notable of which are Christian Wade’s:

http://consultingblogs.emc.com/christianwade/archive/2006/08/24/currency-conversion-in-analysis-services-2005.aspx

http://consultingblogs.emc.com/christianwade/archive/2006/07/25/Measure-expressions_3A00_-how-performant-are-they_3F00_.aspx

The first one deals with currency conversions and compares the Business Intelligence Wizard approach and Measure Expressions, clearly favouring the Measure Expressions (MEs). The second post explores MEs and their strengths as opposed to MDX calculations. Both posts are very interesting and useful. A slight inaccuracy can be found in the summary section of the first post, which describes the MEs as stored on disk, which is untrue. In fact MEs are not stored on disk and are calculated at runtime. Teo Lachev explains their behavior here:

http://prologika.com/CS/forums/p/835/3064.aspx

And another reference to MEs can be found in the SQL Server 2005 Performance Guide:

http://download.microsoft.com/download/8/5/e/85eea4fa-b3bb-4426-97d0-7f7151b2011c/SSAS2005PerfGuide.doc

The last piece of evidence, and possibly the most helpful for me, was the confirmation of their behavior I got from Gerhard Brueckl, Philip Stephenson and Darren Gosbell in this MSDN Forum thread:

http://social.msdn.microsoft.com/Forums/en/sqlanalysisservices/thread/61cc5840-f8f1-45b6-9a9b-f9af4b21513e

Darren Gosbell also emailed me with another little clarification, which could have big impact on your solutions – and that is the fact that no aggregations are used for a measure group in SSAS where at least one measure has a ME defined for it. This could be very important in some cases. Teo Lachev has blogged about this limitation here:

http://prologika.com/CS/blogs/blog/archive/2010/05/22/in-search-of-aggregations.aspx

Since we have some background knowledge of how currency conversions can be done, I will continue with a brief discussion of how currency amounts can be stored in a data mart.

In the vast majority of cases, a business works with a “base currency”, which is the default currency used to conform all currency amounts throughout all transactions. When a currency conversion needs to be made, typically we would have to multiply or divide the “base currency amount” by some “currency rate”, which will give us the amount in a non-base currency. To implement this approach, we could just follow Christian Wade’s ideas of using Measure Expressions, which would give us the best performance (keeping in mind, of course, the limitations of using Measure Expressions).

Another approach is to store both the base currency amount and an amount for each of the most commonly used currencies throughout the organisation as physical measures. As a result we end up with a few measure columns corresponding to the different currencies (e.g. USDAmount, EURAmount, AUDAmount). Then we just add these to our measure group and we can build a SCOPE statement, which gives us the correct measure when using our Currency dimension. If we want to convert to a currency other than the ones we have already converted, we need to resort to the previously mentioned approach, accepting one of these currencies as a base currency. Because we work with physical measures in the majority of cases, this implementation solves some problems with performance. However, it suffers from increased storage space requirements, which could (for a very large implementation) be severe. Also, if we have multiple measures we need to convert, we need to store [Number Of Measures] x [Number of Frequently Used Currencies – 1] more measure columns in our database, and subsequently in our cube. When I say “solves some problems with performance”, in fact our problems are solved only when we use the currencies we have the converted amounts for. In all other cases, we are at the worst possible case – complete conversion calculation of our measures.
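
A SCOPE-based cube script for this second approach could be sketched as follows. This is an illustration under assumptions, not code from a real solution: [Reporting Amount] is a hypothetical calculated measure, the currency member names are assumed, and the Currency dimension is assumed not to be related to the measure group holding the pre-converted measures, so those measures return the same value for every currency member:

```mdx
/* Sketch only: [Reporting Amount] and the currency members are assumed names */
CREATE MEMBER CURRENTCUBE.[Measures].[Reporting Amount]
AS
    NULL,
    VISIBLE = 1;

SCOPE([Measures].[Reporting Amount]);
    /* Pick the pre-converted physical measure for each stored currency */
    SCOPE([Currency].[Currency Code].[USD]);
        THIS = [Measures].[USDAmount];
    END SCOPE;
    SCOPE([Currency].[Currency Code].[EUR]);
        THIS = [Measures].[EURAmount];
    END SCOPE;
    SCOPE([Currency].[Currency Code].[AUD]);
        THIS = [Measures].[AUDAmount];
    END SCOPE;
END SCOPE;
```

Any currency without its own SCOPE block stays NULL in this sketch and would need the conversion-based approach instead.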

There is a third tactic, which I have recently been working on. Together with the previous two it could potentially yield the best possible results. The idea is to store a base currency amount and a local currency amount in another column, as well as adding a Local Currency dimension to the cube. The Local Currency measure contains the base amount converted to the currency which is “local” for the data. For example, if we have a company which has offices in Sydney and London, the local amounts stored against the Australian offices (based on business unit or geography) will be in AUD, while the amounts for the English ones will be in GBP. However, the base currency could be AUD, in which case in our BaseAmount column the amounts will always be in AUD. Once we have set this up we can do the following:

  1. For reports in the base currency: Use the base currency amount only
  2. For reports in non-base currency:
    1. Take the LocalAmount from the fact table, where the local currency is the selected currency
    2. Convert the BaseAmount from the fact table, where the local currency is not the selected currency
    3. Sum the amounts we get from the previous two steps

In my current solution I have the following cube (showing only the Dimension Usage tab, which I believe illustrates the structure best):

 

Note that I have a Currency Conversion measure group, which stores conversion rates from base currency for each date. It has a many-to-many relationship with the other two measure groups in my cube, while these two measure groups have a Local Currency dimension related to them as described above.

Then the MDX in the cube script for my Deal measure (the physical measure is called [Amount - Trade]) is:

/* This calculation will pick the Local Currency Amount from the fact table with no currency
     conversions applied */
CREATE MEMBER CURRENTCUBE.[Measures].[Local Trade Currency Amount]
AS
    (LinkMember([Currency].[Currency Code].CurrentMember,
                [Local Currency].[Currency Code]),
    [Currency].[Currency Code].[AUD],
    [Measures].[Local Trade Amount]),
ASSOCIATED_MEASURE_GROUP = 'Deal',
VISIBLE = 0;

/* Here we pick all other amounts in Base Currency (other than the one selected by the user)
     and then we pull the converted amounts for them (calculation done in a Measure Expression)*/
CREATE MEMBER CURRENTCUBE.[Measures].[Converted Local Trade Currency Amount]
AS
    SUM([Local Currency].[Currency Code].[Currency Code] -
            LinkMember([Currency].[Currency Code].CurrentMember,
                        [Local Currency].[Currency Code]),
        [Measures].[Trade Amount]),
ASSOCIATED_MEASURE_GROUP = 'Deal',
VISIBLE = 0;

/* In this combined measure we combine the previous two to retrieve the overall amount
     in the selected currency */
CREATE MEMBER CURRENTCUBE.[Measures].[Amount - Trade]
AS
    SUM({[Measures].[Local Trade Currency Amount],
         [Measures].[Converted Local Trade Currency Amount]}),
    FORMAT_STRING = "Currency",
    ASSOCIATED_MEASURE_GROUP = 'Deal',
    VISIBLE = 1;

When we generate a report in the base currency, the performance is best-possible as no conversions are made (if we utilize the Direct Slice property as described in Christian Wade’s blog posts above). The same holds true for reports in a non-base currency, where 100% of the data can be retrieved from the LocalAmount measure. The worst-case comes when we request a currency, which is not a Local Currency in any sub-slice to the one requested by our query. In this case, we resort to the first approach, where we convert every amount and then sum it up. If some of the data can be retrieved as a LocalAmount, and some of the data cannot, we are in between the best and worst cases and performance will depend on the amount of conversions which need to be performed.

I consider this a useful solution as in many cases reports will either be specific to an office/country and are generated in a local currency only; or are “global” and are using only the base currency for the organisation. Therefore, using this approach, we get best possible performance with no severe trade-off in storage space.

In the end, it depends on the specific organisation and requirements, but we can always combine the second and the third solutions, storing a few converted amounts as physical measures, including a LocalAmount as another one, and converting to any other currency with Measure Expressions only when absolutely necessary. This way, we essentially take a “best of both worlds” approach and can obtain the best possible results from our solution.