MDX ORDER Riddle

I just found this question on MSDN forums:

I wonder if anyone has run into this before.

I’m trying to produce a report in which all rows are sorted by a specific
column. For example the following works fine:

WITH
SET [RowSet] AS [Product].[Subcategory].Levels(1).Members
SET [ColumnSet] AS [Date].[Calendar].Levels(1).Members
Select
Order([RowSet], [ColumnSet].Item(0), BDESC) ON ROWS,
[ColumnSet] ON COLUMNS
FROM [Adventure Works]

All product sub-categories are sorted by CY 2001.

However when the rows and columns of the report contain related non-leaf
attributes of the same hierarchy then the sort does nothing, as illustrated
in this example:

WITH
SET [RowSet] AS [Product].[Subcategory].Levels(1).Members
SET [ColumnSet] AS [Product].[Category].Levels(1).Members
Select
Order([RowSet], [ColumnSet].Item(0), BDESC) ON ROWS,
[ColumnSet] ON COLUMNS
FROM [Adventure Works]

Is such sort possible at all, may be using different MDX functions, or is
this a known limitation for related attributes?

Interestingly enough it works when the leaf-level attribute
([Product].[Product]) is mixed with a non-leaf level attribute (e.g.
[Product].[Category]).

If you think about it for a while, you’ll see that it is not easy to comprehend what is happening here at a first glance. Some light is shed on it by Deepak Puri on MSDN forums – please mark his answer as useful if you think it is:

http://social.msdn.microsoft.com/Forums/en-US/sqlanalysisservices/thread/cde6f082-6dbd-4368-a20a-e64ea21de2bc

You could also write:

 WITH
SET [RowSet] AS [Product].[Subcategory].Levels(1).Members
SET [ColumnSet] AS [Product].[Category].Levels(1).Members
Select
Order([Product].[Subcategory].Levels(1).Members AS a, ([ColumnSet].Item(0), a.CurrentMember), BDESC) ON ROWS,
[ColumnSet] ON COLUMNS
FROM [Adventure Works]

So, the ordering tuple is defaulting to the All member of the Subcategories – therefore, we are trying to order the RowSet by the same value. If we do that on a non-directly related attribute, the ordering expression is in fact getting evaluated within the current RowSet member..which seems natural. So all in all, it looks like a peculiarity in the ORDER function – most likely because it would make some sense in terms of performance. Please let me know if you know why it happens.

Passing unCONSTRAINED Set and Member parameters between reports in Reporting Services

By default SSRS MDX queries get a StrToMember or StrToSet functions with a CONSTRAINED flag. However, many developers do not quite know why it is there or what it actually does. Books-On-Line contains this statements:

StrToMember

  • When the CONSTRAINED flag is used, the member name must be directly resolvable to a qualified or unqualified member name. This flag is used to reduce the risk of injection attacks via the specified string. If a string is provided that is not directly resolvable to a qualified or unqualified member name, the following error appears: “The restrictions imposed by the CONSTRAINED flag in the STRTOMEMBER function were violated.”
  • When the CONSTRAINED flag is not used, the specified member can resolve either directly to a member name or can resolve to an MDX expression that resolves to a name.
  • StrToSet

  • When the CONSTRAINED flag is used, the set specification must contain qualified or unqualified member names or a set of tuples containing qualified or unqualified member names enclosed by braces {}. This flag is used to reduce the risk of injection attacks via the specified string. If a string is provided that is not directly resolvable to qualified or unqualified member names, the following error appears: “The restrictions imposed by the CONSTRAINED flag in the STRTOSET function were violated.”
  • When the CONSTRAINED flag is not used, the specified set specification can resolve to a valid Multidimensional Expressions (MDX) expression that returns a set.
  • Therefore, if you have a CONSTRAINED flag you have to specify exact members or sets (e.g. [Date].[Year].[2009], or {[Date].[Year].[2009],[Date].[Year].[2010]}). If you omit the flag, you can pass to the StrToMember an expression, which evaluates to a member (e.g. [Date].[Year].[Year].Members.Item(0)), and to StrToSet an expression, which evaluates to a set (e.g. NONEMPTY([Date].[Year].[Year].Members, [Measures].[Amount]).

    The flexibility which removing CONSTRAINED offers can be quite powerful when passing parameters between reports. In example, we may want to pass a parameter to a drill-through report from two different summary reports, where each of those uses a different subset of dimension members, which in turn can be derived by different set expressions.

    Major drawbacks of using this approach is the severe performance hit it leads to, as well as a possible “MDX injection” vulnerability. Since in most cases we would  be using the passed parameters in a subcube expression or on the slicer axis (WHERE clause), this should not lead to as bad performance as we would get if we would use it inside a calculation. However, when we need to use a parameter directly in a calculated measure, we would be better off avoiding an unCONSTRAINED function. 

     Therefore, we may instead use SetToStr on the summary reports and pass a String parameter to a CONSTRAINED StrToSet function in the drill-through report. This way we are first resolving the set expression once and passing it on to the underlying report as a string. We could do that in a calculated measure returning a String, then passed on as a Field to the drill-through parameter. However, in the rare case where we have many rows travelling from the SSAS server to the SSRS server, this could be slow.

    So, whether we use a StrToSet without CONSTRAINED flag, or a String parameter constructed by a SetToStr function, is dependent on the actual scenario, but it is good to have both options in our arsenal of tools and techniques when we need to implement some non-quite-standard piece of functionality.

    When Not To Write MDX and When Not To Use Analysis Services

    MDX is a great way to achieve some objectives easily and efficiently. However, there are some things better done in other ways. I will first discuss three distinct mistakes, which designers and developers tend to make when working on a typical BI solution.

    1. Leaf-Level Calculations

    This is by far the most common one. Both on the MSDN SQL Server forums, and in practice – developers try building calculations in MDX on the leaf level of some dimensions and usually hit severe performance problems. While it is usually possible to build an MDX expression to achieve the goal, it is usually much simpler and way better for performance to just do the calculation either in the ETL, or in the DSV (either as a Named Calculation, or as a part of a Named Query). This avoids the need for the query engine to perform a large number of calculations every time we request an aggregate.

    2. Mocking Joins

    I have discussed this in a previous post, where I am explaining how we can access a measure group through a dimension, which is not directly related to it (but rather related to it through another measure group and another dimension). Well, instead of doing this, we can simply build a many-to-many relationship between the two and avoid the whole MDX bit.

    3. Business Logic over a large dimension

    MDX is brilliant for implementing business logic. Unless it needs to operate over millions of dimension members every time a calculation is being requested. In example, recently I tried building a bit of business logic, which needed to order a dimension over a measure, and get the member with a largest value for each member of another dimension with hundreds of thousands of members. On top of it there were other calculations doing similar logic and the end result was not quite what was expected. Even though the MDX was really neat and achieved the purpose in 3-4 lines, I moved the logic back to the ETL (which was quite a bit more complex) because of performance. So, in general, I would not advise in favour of using MDX when to retrieve the result, the query engine needs to go through a lot of cells (in my case quite a few million), especially when ordering is required.

    A more fundamental mistake is using Analysis Services in a solution that does not really need it. Two severe and common, in my opinion mistakes are:

    1. Data Dumps

    Why build a cube when the sole purpose of the project is to allow users to access the underlying data? Unfortunately, sometimes Analysis Services is seen as a silver bullet for everything. If the end report contains a massive amount of data and a key requirement is for it to export nicely to CSV, why bother – just export the data to CSV, zip it up and let the users download it. As far as I know, this can be achieved very easily in a number of other ways. Especially considering the amount of effort and skills needed to build and maintain a SSAS solution.

    2. No Aggregations

    Another way SSAS gets misused is when a lot of textual data gets stored in a large number of big dimensions, and those get linked in a “fact table”. I have previously worked on a solution where there were no measure columns in the fact table at all and the cube was used to retrieve information about dimension members of the largest dimension called “Member”, containing 4-5 million customers. The rest were dimensions like “Sign Up Date”, “Date Suspended”, “Country of Birth”, “Age Band”, etc. In the end, the main report consisted of the information about the members. No data was aggregated apart from a simple count. The entire OLAP solution could have been replaced by a SQL query with a WHERE clause and an index.

    I am sure that there are many other cases when SSAS gets misused. A solution utilising SSAS and MDX properly can be very powerful and elegant. However, sometimes because of poor design choices it gets discarded as inadequate. Don’t blame the tools and the technology if your cube is slow – it is most likely a problem with either your design or the way you have developed your solution.

    All Member Properties – Name, Key and Level

    I just tried to find some more information about the All Member in SSAS dimension hierarchies and since it was not readily available, I had to experiment a bit, so I thought I may as well share my findings. For some these may be obvious, but for some they could as well be interesting.

    So in brief, I will explore the Name, Key and Level of an All member in a dimension hierarchy. The one of choice was the Customer dimension and Customer Geography hierarchy in Adventure Works. There is an All member, called All Customers. As expected, .PROPERTIES(“MEMBER_NAME”) gives us “All Customers”:

    WITH
    MEMBER [Measures].[test] AS
      [Customer].[Customer Geography].CurrentMember.PROPERTIES(“MEMBER_NAME”)
    SELECT
    {
      [Measures].[test]
    } ON 0,
    {
      [Customer].[Customer Geography].Members
    } ON 1
    FROM [Adventure Works]

    The first row shows us: All Customers.

    Now, let’s see what its key is:

    WITH
    MEMBER [Measures].[test] AS
      [Customer].[Customer Geography].CurrentMember.PROPERTIES(“KEY”)
    SELECT
    {
      [Measures].[test]
    } ON 0,
    {
      [Customer].[Customer Geography].Members
    } ON 1
    FROM [Adventure Works]

    This gives us 0.

    And its level:

    WITH
    MEMBER [Measures].[test] AS
      [Customer].[Customer Geography].CurrentMember.Level.Name
    SELECT
    {
      [Measures].[test]
    } ON 0,
    {
      [Customer].[Customer Geography].Members
    } ON 1
    FROM [Adventure Works]

    The result this time is: (All).

    So far so good. Now let’s try using these to get only the All member:

    SELECT
    {
      [Customer].[Customer Geography].[All Customers]
    } ON 0
    FROM [Adventure Works]

    This works. Now if we try the Key:

    SELECT
    {
      [Customer].[Customer Geography].&[0]
    } ON 0
    FROM [Adventure Works]

    Interestingly, since the All member is a calculated member and has no physical key, if we try to use the one that SSAS gave us does not actually work – we get nothing on Axis 0.

    Using the level works:

    SELECT
    {
      [Customer].[Customer Geography].[(All)].Item(0)
    } ON 0
    FROM [Adventure Works]

    Also, after experimenting a bit further:

    SELECT
    {
      [Customer].[Customer Geography].[All]
    } ON 0
    FROM [Adventure Works]

    This query also works even though the All member name is [All Customers], not just [All]. However, Analysis Services does recognise [All].

    In summary, the most robust options for referencing the All member in a dimension hierarchy that I have found are:

    1. [Dimension].[Hierarchy].[(All)].Item(0)
    2. [Dimension].[Hierarchy].[All]

    These will always work – regardless of the dimension and hierarchy names.

    Another option is using [Dimension].[Hierarchy].[] – e.g. [Customer].[Customer Hierarchy].[All Customers]

    And, one that does not work – referencing through its alleged key: [Customer].[Customer Hierarchy].&[0]

    Please let me know if there are any better alternatives, or why it would give me a key of 0 for the All member and would not work when actually using this key.

    Average Aggregation in Analysis Services

    In SSAS we do not have a measure Average aggregation type. We do have AverageOfChildren (or Average over time), however it is semi-additive and works only along a Time dimension. Fortunately, we have Sum and Count, and since Average = Sum / Count, we can build our own Average aggregation when we need one. 

    To do that: 

    1. Create a measure using the Sum aggregation type (which is also the default). In our example, let’s call it Amount.
    2. Create a Count of Non-Empty Values (or Count of Rows) measure. In example – [Measure Count].
    3. Create the actual calculation – [Measures].[Amount]/[Measures].[Measure Count]

    We can either create a calculated measure, which performs the MDX calculation above: 

    CREATE MEMBER CURRENTCUBE.[Measures].[Average Amount]
    AS
    [Measures].[Amount]/[Measures].[Measure Count]
    ,NON_EMPTY_BEHAVIOR = {[Measures].[Measure Count]}
    ,VISIBLE=1; 

    , or if we really do not need the Sum base measure, we can set it to be replaced by the calculation with a SCOPE statement: 

    SCOPE([Measures].[Amount]);
      This = [Measures].[Amount]/[Measures].[Measure Count];
    NON_EMPTY_BEHAVIOR(This) = [Measures].[Measure Count];
    END SCOPE; 

    Voila! We have created a measure simulating an Average aggregation type. Then, we can hide the Count helper measure and from user point of view there is no evidence of our effort. 

    Since the count will never be 0, we do not have to say “If not 0, divide, else – do not” and the NON_EMPTY_BEHAVIOR query hint may in fact improve performance, since the calculation will not be performed when the Count measure is NULL (instead of resulting in NULL/NULL=NULL). 

    Mosha has previously blogged about NON_EMPTY_BEHAVIOR and division by zero and I strongly recommend reading his post. 

    Another important consideration, which depends on the business scenario is the type of the Count aggregate. It could be Count of rows (Row Bound) or Count of non-empty values (Column Bound). The difference is whether we want to include or exclude the empty values from our aggregate. Either way, the described technique will work equally well. 

    I realise that this is a well-known approach, but since it is hard to find the solution online I thought it may be interesting for some less-experienced developers.

    Filtering measures by indirectly related dimensions in MDX

    I have lately started visiting the SQL Server MSDN Forums and trying to answer some questions about Analysis Services and Reporting Services. One of the questions about MDX queries seems to get repeated quite often and I will try to address it in this post, so hopefully more people will get to read this rather than ask about it on MSDN.

    The actual question takes the form of:
    “I have Dimension A and Dimension B, related to Measure 1. Dimension B is also related to Measure 2. How can I (is it possible to) get the values for Measure 1 filtered/sliced by Dimension A. I know it is easy to achieve with a join in SQL, but I do not know how to do it with MDX.

    This suggest the following dimension model:

    One solution would be creating a many-to-many relationship between Dimension A and Measure Group 2. However, we may want to avoid that for some reason and answer the problem with a query.

    We can achieve the desired result in a number of ways but I will discuss the one using NONEMPTY. Others would be using FILTER and EXISTS.

    A sample script is:

    SELECT
    {
    [Measures].[Measure 2]
    } ON 0,
    NON EMPTY
    {
    NONEMPTY( [Dimension B].[Dimension B Hierarchy].Members,
    ([Measures].[Measure 1], [Dimension A].[Dimension A Hierarchy].&[Member_Key]))
    } ON 1
    FROM [Cube]

    What this script does:

    1. Gets all Dimension B members, which have associated cells for Measure 1 and the specific Dimension A member (which we are filtering/slicing by)
    2. Gets the Measure 2 cells for the set of members retrieved in Step 1
    3. Removes members from Step 1, for which cells from Step 2 are empty

    An AdventureWorks example is:

    SELECT
    {
    [Measures].[Internet Order Count]
    } ON 0,
    NON EMPTY
    {
    NONEMPTY( [Product].[Product].Members,
    ([Measures].[Reseller Order Count], [Reseller].[Reseller].&[238]))
    } ON 1
    FROM [Adventure Works]

    Problems with FORMAT_STRING, VBA functions and NON_EMPTY_BEHAVIOR

    Consider the following requirement: “We want our measure to be truncated to 4 decimal places without any trailing 0s after the decimal point and a comma as a thousands separator.”

    First, let’s focus on the truncation part. If we want that to happen for a measure, we can do it through the following formula, as I have described previously:

    SCOPE([Measures].[Our Measure]);
    This = Fix([Measures].[Our Measure]*10^4)/10^4;
    END SCOPE;

    This takes care of our truncation. So far so good.

    Now let’s have a look at the formatting. If we want to apply custom formatting through FORMAT_STRING for a number such as 12345.1234, which states: “#,0.####”, in order to obtain 12,345.1234 we run into a problem. The same FORMAT_STRING expression applied to 12345 gives us 12,345. – including a trailing decimal point (in our case a trailing period). There is no way to get rid of it through FORMAT_STRING. Even in the specifications for FORMAT_STRING it is pointed out that:

    If the format expression contains only number sign (#) characters to the left of the period (.), numbers smaller than 1 start with a decimal separator.

    This holds true for #s to the right of the period, as well.

    What we can do in this case is either conditionally format the number, with some sort of a rule, which checks if the number is whole or not (I will avoid that), or we can use the VBA Format function like this:

    SCOPE([Measures].[Our Measure]);
    Format([Measures].[Our Measure], “#,0.####”);
    END SCOPE;

    This yields the correct result, so in the end, we can do:

    SCOPE([Measures].[Our Measure]);
    This = Format(Fix([Measures].[Our Measure]*10^4)/10^4, “#,0.####”);
    END SCOPE;

    That may be achieveing the result, but let’s look at perfromance of a calculated measure utilising this approach. The first thing I noticed when I implemented this is the huge increase in query processing time and the large number of 0 valued cells. It turned out that the VBA functions in MDX do not skip empty (NULL) cells. If you try Fix(NULL), you’ll get 0. So, after Fix-ing our measure, we get 0s for every empty cell in our cube. The same is valid for Format.

    Next step was trying to find a way to skip these empties. I tried:

    SCOPE([Measures].[Our Measure]);
    This = Format(Fix([Measures].[Our Measure]*10^4)/10^4, “#,0.####”);
    NON_EMPTY_BEHAVIOR(This) = [Measures].[Our Measure];
    END SCOPE;

    but it did not work. I still got the 0s in my result set. I suppose that it got ignored by SSAS. Because of this issue I decided to write an IIF statement like this:

    SCOPE([Measures].[Our Measure]);
    This = IIF([Measures].[Our Measure] = 0, NULL, Format(Fix([Measures].[Our Measure]*10^4)/10^4, “#,0.####”));
    END SCOPE;

    This also worked. However, now we have an IIF statement, which serves no purpose other than filtering our empty cells, because of the VBA functions’ behavior. It would be interesting if there is any other way of implementing this, avoiding the problem. It would be nice if:

    1. VBA functions can skip empty cells
    2. FORMAT_STRING behaves just like Format
    3. NON_EMPTY_BEHAVIOR actually works with SCOPE (in queries)

    Please comment if you have a solution for the problems above and let me know if you would like to see the above issues fixed (I am considering raising a feedback/recommendation issue on Connect).

    Custom Rounding and Truncation of Numbers in MDX

    Article published in SQL Server Central on 2009/03/26

    In some scenarios we need to be able to round or truncate decimals to achieve correct calculation results. In SQL we have ROUND, which can do either of these. It rounds like we are used to – 0.5 rounds up to 1, can round up or down and we rarely get a project where as a part of the requirements we are implementing our own rounding or truncation algorithm.

    However, in MDX we have Round() which performs a “strange” operation – bankers’ rounding, which our business users have usually not been exposed to, and if we decide to truncate to an integer number through casting with Int or cInt, we also get some strange results. To illustrate the problem with MDX please consider the value of these expressions:

    Round(2.15, 1) = 2.2
    Round(2.25, 1) = 2.2
    Round(2.35, 1) = 2.4
    Round(2.45, 1) = 2.4

    Int(1.9) = 1
    Int(-1.9) = -2
    cInt(1.9) = 1
    cInt(-1.9)= -2

    These are usually considered wrong, because they are not the obvious results. Even though they are mathematically well founded, if we round 2.25 to 2.2, our users will come back at us with wrong numbers on their reports. Same goes for “trimming” -1.9 to -2.

    To resolve the first problem with rounding, we can use our own math formula:

    Fix([Measures].[???] * Factor + 0.5 * Sgn([Measures].[???])) / Factor

    Where Factor is the rounding factor – 1 for 0 decimal places, 10 for 1 and so on (defined by 1/Factor). Of course, Factor of 0 will give us Div by 0 error. (Reference: http://support.microsoft.com/kb/196652)

    If we have the Excel function libraries intalled on our server, we can also simply use Excel!Round() as Chris Webb advises.

    As for the Int and cInt in MDX, we can use the Fix() VBA function to remove decimal places:

    Fix(1.9) = 1
    Fix(-1.9) = -1

    Also, for truncation of values to a certain decimal point in MDX, we can use the following formula:

    Fix(<value>*10^1)/10^1

    All we need to adjust in order to change the decimal places is to replace 10^1 with another power of 10 – in example, to truncate 3.156 to 3.15 we can use: Fix(3.156*10^2)/10^2. To make things simpler, in all our formulas the power of 10 is what determines how many decimal paces we need to round to; negative powers will give us rounding to tens, thousands and so on. If we use ^0 we will round to whole numbers.

    Using these we can avoid bankers’ rounding and some strange results with converting to integers in MDX.

     

    Check your spelling in SSAS 2008 MDX queries

    I just found out that because of wrong/different spelling of a dimension name/attribute in SSAS 2008 the server may return the following message:
    The set must have a single hierarchy to be used with the complement operator.

    To fix the issue, we must ensure that all the dimensions in our queries are spelled exactly the way they are named in the cube (surprise?)…

    I encountered this message by first building a Reporting Services report with data coming from SSAS. Then, a colleague misspelt a dimension name and instead of Natural Code he typed Natrual Code. Next time I needed to run the query in Query Designer I got the message. The query indeed used a complement operator on that dimension, exculding a particular account by its Natural Code. The query was something like:

    SELECT {
    [Measures].[Amount]
    } ON COLULMNS,
    {
    ([Account].[Major Account].ALLMEMBERS *
    [Employee].[Employee Name].ALLMEMBERS)
    } ON ROWS
    FROM (
    SELECT (-{[Account].[Natural Code].&[990]}) ON COLUMNS
    FROM [Finance]
    )

    Filtering Unneeded Dimension Members in PerformancePoint Filters

    Published on SQLServerCentral on the 2008-11-14:
    http://www.sqlservercentral.com/articles/PerformancePoint/64565/
     

    Sometimes we need to utilise dynamic dimension security in an Analysis Services solution and we also need to display dynamically the allowed members in a PerformancePoint filter. 

    In case our hierarchy is multi-level and in case we can expect to have security on multiple levels, PerformancePoint will display the full path upwards to the root member of the dimension. So, in the case where in a Business Unit hierarchy we have access to some third level members, in PerformancePoint we will see all their parents. In example if our hierarchy looks like this: 

    All Business Units
    -Europe
    –UK
    —-France
    —-Bulgaria
    -North America
    —-USA
    —-Canada 

    and we give someone access to France, they will in fact see: 

    All Business Units
    -Europe
    —-France 

    Indeed, when they select All Business Units or Europe they will still see only France data but this may be confusing. To eliminate the top levels we need to change the way we create our PerformancePoint filter. 

    To achieve this, first we need to create a MDX Query filter. For more information about PerformancePoint filters and dynamic dimension security you can read the following brilliant post on Nick Barclay’s blog: PPS Data Connection Security with CustomData. Nick explains how to set up PerformancePoint to work with Analysis Services dynamic dimension security and related topics. I will now concentrate on actually filtering the members of the already set-up MDX Query filter. 

    Instead of requesting all dimension members with a simple statement like: 

    DESCENDANTS([Business].[Business Hierarchy].Members,, SELF_AND_AFTER) 

    we can write some fairly simple MDX which means: 

    Get me the descendants of all dimension members whose ascendants (excluding themselves) have no more than one child. 

    And the MDX code is: 

     DESCENDANTS( FILTER([Business].[Business Hierarchy].Members AS a,   ((FILTER(ASCENDANTS(a.CurrentMember) AS a_asc,     a_asc.CurrentMember.CHILDREN.Count > 1).Count = 1)     And     a.CurrentMember.Children.Count > 1)   Or   ((FILTER(ASCENDANTS(a.CurrentMember) AS a_asc,     a_asc.CurrentMember.CHILDREN.Count > 1).Count = 0)     And     a.CurrentMember.Children.Count = 0)),,SELF_AND_AFTER) 

    The result will be France being displayed as the only available member in the PerformancePoint drop-down. Also, if we had two different allowed members, the code above would show us the top common parent of the two. Therefore, if we had France and Canada as the allowed set of dimension members, the drop-down would consist of the following hierarchy: 

    All Business Units
    -Europe
    —-France
    -North America
    —-Canada 

    thus satisfying our requirements.