How to use ITEM and when ITEM(0).ITEM(0) is redundant

In MDX we have the Item function which can be used in a number of ways. It is important to understand how it works and how it can be used to our advantage.

As a start, we can call Item over a set or over a tuple:

{set}.Item(0) or (tuple).Item(0)

It may be important to note that when we call Item over a set, we get a tuple out of it (sets are collections of tuples), while if we call it over a tuple we get a member.

If we use Item with a tuple, we must specify as an argument the integer position of the member within the tuple which we want. However, when we work with sets, we can either do the same, or specify a number of strings which identify a specific tuple. Some examples:

Item with a tuple:

(a,b).Item(0) = a

SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 ([Product].[Category].&[4], [Date].[Calendar].[Calendar Year].&[2008]).Item(0)
} ON 1
FROM [Adventure Works]

(a,b).Item(1) = b

SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 ([Product].[Category].&[4], [Date].[Calendar].[Calendar Year].&[2008]).Item(1)
} ON 1
FROM [Adventure Works]

Item with a set:

{a,b,c}.Item(0) = a

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item(0)
} ON 1
FROM [Adventure Works]

{a,b,c}.Item("a") = a

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item("([Product].[Category].&[4],
             [Date].[Calendar].[Calendar Year].&[2008])")
} ON 1
FROM [Adventure Works]

{(a1,b1),(a2,b2),(a3,b3)}.Item("a1","b1") = (a1,b1)

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item("[Product].[Category].&[4]",
           "[Date].[Calendar].[Calendar Year].&[2008]")
} ON 1
FROM [Adventure Works]

When we specify a number of strings as arguments, we get the tuple which is defined by these strings/coordinates.

Now, let’s see what happens when we have a set of tuples and we use Item on it with a single argument:

{(a1,b1),(a2,b2),(a3,b3)}.Item(0) = (a1,b1)

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item(0)
} ON 1
FROM [Adventure Works]

We get a tuple back. Therefore, if we use a second Item function over the first one, we will get the member on that position from the tuple:

{(a1,b1),(a2,b2),(a3,b3)}.Item(0).Item(0) = (a1,b1).Item(0) = a1

To illustrate the concept:

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item(0).Item(0)
} ON 1
FROM [Adventure Works]

This gives us the whole amount for Accessories, while:

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item(0).Item(1)
} ON 1
FROM [Adventure Works]

gives us the total amount for 2008.

Even if we do:

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item(0).Item(0).Item(0).Item(0).Item(0).Item(0).Item(0).Item(0)
} ON 1
FROM [Adventure Works]

we still get the amount for Accessories.

What happens here is:

  • With the first call of Item(0) over SET1, we get the first tuple from the set (in our case ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008])).
  • With the second call, we get the first member of this tuple – [Product].[Category].&[4].
  • With the third call of Item(0), the member from the previous step is implicitly converted to a tuple, and we pull out its first member – again [Product].[Category].&[4].
  • From here onwards, the result flips between a tuple and a member every time we call Item(0).

But if we do:

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item(0).Item(1).Item(1).Item(0).Item(0).Item(0).Item(0).Item(0)
} ON 1
FROM [Adventure Works]

we get nothing back. This is because the second call of Item(1) asks for a member on the second position/coordinate of the single-member tuple ([Date].[Calendar].[Calendar Year].&[2008]), and no such element exists.

Therefore, calling Item(0) after another Item(0) is usually redundant and should be done only when we actually need it, because otherwise we could either get wrong results or hurt our query performance.


Set Operations in MDX – UNION and EXCEPT

I just read an article MDX: Except written by Vincent Rainardi.

It shows set subtraction by using the EXCEPT function (as you could derive from the title, no doubt). I have always been a fan of using the “-” and “+” operators instead of EXCEPT and UNION where possible, because in my opinion they give us better visibility of the intention behind our MDX expressions. However, EXCEPT and UNION have an advantage over “-” and “+” – the optional third parameter, ALL.

In BOL we can see that both of these functions can be used like this: UNION/EXCEPT(set1, set2, ALL). If we skip the ALL keyword, we would get exactly what we would get with +/-. Some examples:

SET1: {a, b, c}
SET2: {c}

SET3 = UNION(SET1, SET2) = SET1+SET2 = {a, b, c}+{c} = {a, b, c}

But if we use ALL, we would get duplicates in our result set:

SET4 = UNION(SET1, SET2, ALL) = UNION({a, b, c}, {c}, ALL) = {a, b, c, c}

The difference here is the duplicates, which get preserved in SET4 because of ALL. And this is also where EXCEPT is different to “-”:

SET5 = EXCEPT(SET4, {a}) = SET4 - {a} = {a, b, c, c} - {a} = {b, c}

while

SET6 = EXCEPT(SET4, {a}, ALL) = {a, b, c, c} - {a} = {b, c, c}

As you can see, unlike in set math, where a set cannot have duplicates, in MDX sets can contain them. Therefore, if we are in a situation where we need to preserve duplicates, we have the option of using the UNION and EXCEPT functions with the ALL parameter.
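
A quick way to see the effect of ALL is the sketch below, which reuses the Adventure Works Category keys from the first section (&[4] is Accessories, &[1] is Bikes); Bikes should appear twice on rows:

SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
UNION(
 {[Product].[Category].&[4], [Product].[Category].&[1]},
 {[Product].[Category].&[1]},
 ALL
) ON 1
FROM [Adventure Works]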

I am using these concepts on an everyday basis and I have found that mastering them gives me a very powerful way of solving many MDX problems. I hope that the examples are suitable and easy to understand – reading the article just after writing it, I sound a bit like my math lecturer from uni (I wish he knew and taught MDX), who was a good guy, so I guess there is nothing wrong with that 🙂

Using the DSV to its Full Potential

The Data Source View in Analysis Services is a very powerful abstraction of the data source and it can help us handle some scenarios in an easy and clean way. Many times we look for MDX or programmatic solutions to problems which are best tackled in our data. While for complex tasks we would be better off extending the ETL process, some simple ones can and should be implemented in the DSV.

As an introduction to the topic, I would like to explain briefly what the DSV actually is. It can be conceptualised as a database view on top of the data source. By default, all tables which we need for building the Analysis Services database (typically dimensions and facts) appear in the DSV as table bindings (exactly as if we do a SELECT * FROM Table). If we have no foreign keys defined in our database, SSAS will not show us the relationships in the DSV. However, we can define logical relationships in the DSV, thus connecting the tables on related columns, which are then used for automatically determining dimension relationships to the measure groups.

There are two important ways to modify the DSV, which allow us to add more columns to the existing tables and to modify the way the existing columns are shown:

Named Queries

If we right-click on a table in the DSV, we can select to replace the table with a Named Query. A Named Query is essentially a T-SQL statement, equivalent to a database view definition. By utilising Named Queries we can alter the way we see the tables and their columns in SSAS. For example, we could concatenate columns, implement CASE logic, etc.
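
As a minimal sketch, a Named Query could look like the following (the table and column names here are hypothetical):

SELECT CustomerKey
,      FirstName + ' ' + LastName AS FullName
,      CASE WHEN YearlyIncome >= 100000 THEN 'High' ELSE 'Standard' END AS IncomeBand
FROM   DimCustomer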

Named Calculations

A Named Calculation is a T-SQL expression which adds a column to a table without modifying the table binding. It gives us an easy way to define a new column without replacing the whole query. The expression defining the column behaves the same way as a computed column in a Named Query (or a SELECT statement). If we just want to add one more column (e.g. Display Order, Code+Description concatenation, etc.), we can simply define a Named Calculation. Also, Named Calculations are commonly used for defining a leaf-level calculation without moving a large fact table's SELECT statement into a Named Query.
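
For example, a Named Calculation called SalesAmount on a fact table could be defined with an expression like this (the column names are hypothetical):

OrderQuantity * UnitPrice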

The column we define here appears in both the DSV table and in the Dimension Designer window.

These two DSV features can be used in many scenarios. Most importantly, there are a few in which they yield better performance, faster development and easier maintenance:

Leaf-level calculations

If we have the common requirement to perform leaf-level calculations and then aggregate the results up the hierarchy, as opposed to aggregating and then calculating, the best way to do this is in a SQL statement on the fact table. Alternatively, we can do this in an MDX statement:

 SUM(DESCENDANTS(Dim.CurrentMember,,LEAVES), MeasureCalc)

However, it comes at a price. Since SSAS would have to do the calculation for each leaf and then sum the results up the hierarchy, this could take a long time to perform. Also, SSAS would not be able to use pre-processed aggregations and the calculations would be done at execution time. To avoid this, we could add a new column to the fact table and do the calculation there (in SQL), using the column as a new measure in the cube, which can then be aggregated by SSAS like any other measure. The performance gain is usually substantial, and using a Named Query or a Named Calculation should always be the preferred option.

Description Attributes

Often we need to perform a concatenation between different dimension attributes, which we can use as a Description attribute while slicing the cube, or when providing reports from the SSAS database. A very easy way to achieve such a requirement is to use our DSV and concatenate the columns we need into a new column in the dimension table, which we can expose as a new attribute of the dimension. A task such as concatenating an Account Code and an Account Description into an Account Long Description (i.e. [Account Code] + '-' + [Account Description]) becomes very easy to implement within the DSV without modifying the ETL or any tables.

Composite Keys

Sometimes we need to build unique keys for attribute columns in a dimension. A good example is a Date dimension, which does not have unique keys for non-leaf levels such as Month. Often developers have a Month key of 1 to 12. This does not make a good Month key in SSAS, as it is not unique across higher levels such as Year, Quarter, etc. There are a number of ways to tackle this common scenario. While the recommended approach would be to build a concatenation of Year-Quarter-Month as a Month key in the dimension table, we can also achieve this by selecting all of the relevant columns as key columns for the attribute in the dimension attribute properties. However, this gives us a composite key in MDX, which can sometimes be undesirable. A simpler and cleaner solution is to concatenate the relevant columns in the DSV by using a Named Query. Instead of the typical

SELECT col1, col2, ..., MonthKey, colx, coly, colz FROM DimDate

we can write

SELECT col1, col2, ..., YearKey+QuarterKey+MonthKey AS MonthKey, colx, coly, colz FROM DimDate

This way we can use the MonthKey column directly as a key for our Month attribute. (If the key columns are numeric, we need to cast them to character types first, so that “+” concatenates the values rather than adding them.)

While this is useful for a Date dimension, it can also be useful for any other composite key definition in our dimensions.

Other possible applications of DSV Named Queries and Named Calculations are the implementation of

  • Sort Order attribute, in cases when we need custom sort of the dimension attributes
  • Restricting the data which comes into the cube dynamically based on a certain condition (think of a Date dimension, which includes only relevant periods)
  • Combining tables – by a SQL join
  • Replacing 0s with NULLs (the opposite can be done automatically in SSAS) for our measures

Basically, in a DSV we can “correct” our data to make it suitable for our cube without changing the ETL.

Last but not least, we can also transform tables to conform to a star-schema-like design. If we want to show a proof of concept on top of a normalised OLTP database, we could avoid the ETL complexities, as well as building a data mart, and use SQL to join/split tables into dimension and fact tables suitable for cube development. While this could also work in post-POC scenarios, it is better to take a cautious approach, as there are many scenarios in which it would either not work or be too slow.

And a word of warning – your DSV could become slow because of over-use of complex Named Queries. This could be painful when minimising cube processing time is crucial, or when the DSV starts timing out and queries take hours to execute. Luckily, in most cases we can simply move these large queries forward – to the ETL where we have more time and better tools (e.g. SSIS).

Passing unCONSTRAINED Set and Member parameters between reports in Reporting Services

By default, SSRS MDX queries get StrToMember and StrToSet functions with a CONSTRAINED flag. However, many developers do not quite know why it is there or what it actually does. Books Online contains these statements:

StrToMember

  • When the CONSTRAINED flag is used, the member name must be directly resolvable to a qualified or unqualified member name. This flag is used to reduce the risk of injection attacks via the specified string. If a string is provided that is not directly resolvable to a qualified or unqualified member name, the following error appears: “The restrictions imposed by the CONSTRAINED flag in the STRTOMEMBER function were violated.”
  • When the CONSTRAINED flag is not used, the specified member can resolve either directly to a member name or can resolve to an MDX expression that resolves to a name.
StrToSet

  • When the CONSTRAINED flag is used, the set specification must contain qualified or unqualified member names or a set of tuples containing qualified or unqualified member names enclosed by braces {}. This flag is used to reduce the risk of injection attacks via the specified string. If a string is provided that is not directly resolvable to qualified or unqualified member names, the following error appears: “The restrictions imposed by the CONSTRAINED flag in the STRTOSET function were violated.”
  • When the CONSTRAINED flag is not used, the specified set specification can resolve to a valid Multidimensional Expressions (MDX) expression that returns a set.
Therefore, if you have a CONSTRAINED flag, you have to specify exact members or sets (e.g. [Date].[Year].[2009], or {[Date].[Year].[2009],[Date].[Year].[2010]}). If you omit the flag, you can pass to StrToMember an expression which evaluates to a member (e.g. [Date].[Year].[Year].Members.Item(0)), and to StrToSet an expression which evaluates to a set (e.g. NONEMPTY([Date].[Year].[Year].Members, [Measures].[Amount])).
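
As an illustration, a minimal sketch of an unCONSTRAINED StrToSet against Adventure Works could look like this (in a report, the string would normally come from a parameter):

SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 StrToSet("NONEMPTY([Date].[Calendar Year].[Calendar Year].Members, [Measures].[Internet Sales Amount])")
} ON 1
FROM [Adventure Works]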

    The flexibility which removing CONSTRAINED offers can be quite powerful when passing parameters between reports. For example, we may want to pass a parameter to a drill-through report from two different summary reports, where each of those uses a different subset of dimension members, which in turn can be derived by different set expressions.

    Major drawbacks of this approach are the severe performance hit it can lead to, as well as a possible “MDX injection” vulnerability. Since in most cases we would be using the passed parameters in a subcube expression or on the slicer axis (WHERE clause), the performance hit should not be as bad as it would be if we used them inside a calculation. However, when we need to use a parameter directly in a calculated measure, we are better off avoiding an unCONSTRAINED function.

    Therefore, we may instead use SetToStr on the summary reports and pass a String parameter to a CONSTRAINED StrToSet function in the drill-through report. This way we resolve the set expression once and pass it on to the underlying report as a string. We could do that in a calculated measure returning a String, which is then passed on as a Field to the drill-through parameter. However, in the rare case where many rows travel from the SSAS server to the SSRS server, this could be slow.
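
    A minimal sketch of the summary-report side could look like this (the hierarchy and measure are only an example):

    WITH
    MEMBER [Measures].[Param Set] AS
      SetToStr(NONEMPTY([Date].[Calendar Year].[Calendar Year].Members,
                        [Measures].[Internet Sales Amount]))
    SELECT
    {
      [Measures].[Param Set]
    } ON 0
    FROM [Adventure Works]

    The resulting string (e.g. {[Date].[Calendar Year].&[2007],[Date].[Calendar Year].&[2008]}) can then be passed safely to a CONSTRAINED StrToSet in the drill-through report.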

    So, whether we use StrToSet without the CONSTRAINED flag, or a String parameter constructed by a SetToStr function, depends on the actual scenario, but it is good to have both options in our arsenal of tools and techniques when we need to implement some not-quite-standard piece of functionality.

    When Not To Write MDX and When Not To Use Analysis Services

    MDX is a great way to achieve some objectives easily and efficiently. However, there are some things better done in other ways. I will first discuss three distinct mistakes, which designers and developers tend to make when working on a typical BI solution.

    1. Leaf-Level Calculations

    This is by far the most common one. Both on the MSDN SQL Server forums, and in practice – developers try building calculations in MDX on the leaf level of some dimensions and usually hit severe performance problems. While it is usually possible to build an MDX expression to achieve the goal, it is usually much simpler and way better for performance to just do the calculation either in the ETL, or in the DSV (either as a Named Calculation, or as a part of a Named Query). This avoids the need for the query engine to perform a large number of calculations every time we request an aggregate.

    2. Mocking Joins

    I have discussed this in a previous post, where I am explaining how we can access a measure group through a dimension, which is not directly related to it (but rather related to it through another measure group and another dimension). Well, instead of doing this, we can simply build a many-to-many relationship between the two and avoid the whole MDX bit.

    3. Business Logic over a large dimension

    MDX is brilliant for implementing business logic. Unless it needs to operate over millions of dimension members every time a calculation is requested. For example, I recently tried building a bit of business logic which needed to order a dimension over a measure and get the member with the largest value for each member of another dimension with hundreds of thousands of members. On top of it there were other calculations doing similar logic, and the end result was not quite what was expected. Even though the MDX was really neat and achieved the purpose in 3-4 lines, I moved the logic back to the ETL (which was quite a bit more complex) because of performance. So, in general, I would advise against using MDX when the query engine needs to go through a lot of cells to retrieve the result (in my case quite a few million), especially when ordering is required.
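
    A sketch of the kind of calculation I mean, using Adventure Works names purely for illustration (for every customer on rows, the engine has to order the whole Product attribute hierarchy):

    WITH
    MEMBER [Measures].[Top Product] AS
      TopCount([Product].[Product].[Product].Members,
               1,
               [Measures].[Internet Sales Amount]).Item(0).Item(0).Name
    SELECT
    {
      [Measures].[Top Product]
    } ON 0,
    {
      [Customer].[Customer].[Customer].Members
    } ON 1
    FROM [Adventure Works]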

    A more fundamental mistake is using Analysis Services in a solution that does not really need it. Two severe and, in my opinion, common mistakes are:

    1. Data Dumps

    Why build a cube when the sole purpose of the project is to allow users to access the underlying data? Unfortunately, sometimes Analysis Services is seen as a silver bullet for everything. If the end report contains a massive amount of data and a key requirement is for it to export nicely to CSV, why bother – just export the data to CSV, zip it up and let the users download it. As far as I know, this can be achieved very easily in a number of other ways. Especially considering the amount of effort and skills needed to build and maintain a SSAS solution.

    2. No Aggregations

    Another way SSAS gets misused is when a lot of textual data gets stored in a large number of big dimensions, and those get linked in a “fact table”. I have previously worked on a solution where there were no measure columns in the fact table at all and the cube was used to retrieve information about dimension members of the largest dimension called “Member”, containing 4-5 million customers. The rest were dimensions like “Sign Up Date”, “Date Suspended”, “Country of Birth”, “Age Band”, etc. In the end, the main report consisted of the information about the members. No data was aggregated apart from a simple count. The entire OLAP solution could have been replaced by a SQL query with a WHERE clause and an index.

    I am sure that there are many other cases when SSAS gets misused. A solution utilising SSAS and MDX properly can be very powerful and elegant. However, sometimes because of poor design choices it gets discarded as inadequate. Don’t blame the tools and the technology if your cube is slow – it is most likely a problem with either your design or the way you have developed your solution.

    All Member Properties – Name, Key and Level

    I just tried to find some more information about the All member in SSAS dimension hierarchies and, since it was not readily available, I had to experiment a bit, so I thought I may as well share my findings. For some these may be obvious, but for others they could be interesting.

    So, in brief, I will explore the Name, Key and Level of the All member in a dimension hierarchy. The dimension of choice was the Customer dimension and its Customer Geography hierarchy in Adventure Works. There is an All member, called All Customers. As expected, .PROPERTIES("MEMBER_NAME") gives us "All Customers":

    WITH
    MEMBER [Measures].[test] AS
      [Customer].[Customer Geography].CurrentMember.PROPERTIES("MEMBER_NAME")
    SELECT
    {
      [Measures].[test]
    } ON 0,
    {
      [Customer].[Customer Geography].Members
    } ON 1
    FROM [Adventure Works]

    The first row shows us: All Customers.

    Now, let’s see what its key is:

    WITH
    MEMBER [Measures].[test] AS
      [Customer].[Customer Geography].CurrentMember.PROPERTIES("KEY")
    SELECT
    {
      [Measures].[test]
    } ON 0,
    {
      [Customer].[Customer Geography].Members
    } ON 1
    FROM [Adventure Works]

    This gives us 0.

    And its level:

    WITH
    MEMBER [Measures].[test] AS
      [Customer].[Customer Geography].CurrentMember.Level.Name
    SELECT
    {
      [Measures].[test]
    } ON 0,
    {
      [Customer].[Customer Geography].Members
    } ON 1
    FROM [Adventure Works]

    The result this time is: (All).

    So far so good. Now let’s try using these to get only the All member:

    SELECT
    {
      [Customer].[Customer Geography].[All Customers]
    } ON 0
    FROM [Adventure Works]

    This works. Now if we try the Key:

    SELECT
    {
      [Customer].[Customer Geography].&[0]
    } ON 0
    FROM [Adventure Works]

    Interestingly, since the All member is generated by the system and has no physical key, trying to use the key that SSAS gave us does not actually work – we get nothing on Axis 0.

    Using the level works:

    SELECT
    {
      [Customer].[Customer Geography].[(All)].Item(0)
    } ON 0
    FROM [Adventure Works]

    Also, after experimenting a bit further:

    SELECT
    {
      [Customer].[Customer Geography].[All]
    } ON 0
    FROM [Adventure Works]

    This query also works even though the All member name is [All Customers], not just [All]. However, Analysis Services does recognise [All].

    In summary, the most robust options for referencing the All member in a dimension hierarchy that I have found are:

    1. [Dimension].[Hierarchy].[(All)].Item(0)
    2. [Dimension].[Hierarchy].[All]

    These will always work – regardless of the dimension and hierarchy names.

    Another option is using [Dimension].[Hierarchy].[<All Member Name>] – e.g. [Customer].[Customer Hierarchy].[All Customers]

    And, one that does not work – referencing through its alleged key: [Customer].[Customer Hierarchy].&[0]

    Please let me know if there are any better alternatives, or why it would give me a key of 0 for the All member and would not work when actually using this key.

    Average Aggregation in Analysis Services

    In SSAS we do not have an Average aggregation type for measures. We do have AverageOfChildren (or Average over time); however, it is semi-additive and works only along a Time dimension. Fortunately, we have Sum and Count, and since Average = Sum / Count, we can build our own Average aggregation when we need one.

    To do that: 

    1. Create a measure using the Sum aggregation type (which is also the default). In our example, let’s call it Amount.
    2. Create a Count of Non-Empty Values (or Count of Rows) measure – for example, [Measure Count].
    3. Create the actual calculation – [Measures].[Amount]/[Measures].[Measure Count]

    We can either create a calculated measure, which performs the MDX calculation above: 

    CREATE MEMBER CURRENTCUBE.[Measures].[Average Amount]
    AS
    [Measures].[Amount]/[Measures].[Measure Count]
    ,NON_EMPTY_BEHAVIOR = {[Measures].[Measure Count]}
    ,VISIBLE=1; 

    Or, if we really do not need the Sum base measure, we can set it to be replaced by the calculation with a SCOPE statement:

    SCOPE([Measures].[Amount]);
      This = [Measures].[Amount]/[Measures].[Measure Count];
    NON_EMPTY_BEHAVIOR(This) = [Measures].[Measure Count];
    END SCOPE; 

    Voila! We have created a measure simulating an Average aggregation type. Then, we can hide the Count helper measure, and from a user's point of view there is no evidence of our effort.

    Since the count will never be 0, we do not have to say “If not 0, divide, else – do not” and the NON_EMPTY_BEHAVIOR query hint may in fact improve performance, since the calculation will not be performed when the Count measure is NULL (instead of resulting in NULL/NULL=NULL). 

    Mosha has previously blogged about NON_EMPTY_BEHAVIOR and division by zero and I strongly recommend reading his post. 

    Another important consideration, which depends on the business scenario is the type of the Count aggregate. It could be Count of rows (Row Bound) or Count of non-empty values (Column Bound). The difference is whether we want to include or exclude the empty values from our aggregate. Either way, the described technique will work equally well. 

    I realise that this is a well-known approach, but since it is hard to find the solution online I thought it may be interesting for some less-experienced developers.

    7 Ways to Process Analysis Services Objects

    Being asked a bit too often how we can process Analysis Services databases (or cubes and dimensions), here is a list of 7 different methods:

    1. Through the GUI

    This one is obvious. We can do it through both SSMS and BIDS.

    2. XMLA Script

    To generate the script we can use the hefty Script button in SSMS. Simply configure the processing settings and then, instead of clicking the all too usual OK, click the little Script button in the top left corner of the Process window.

    Then, we can just execute the generated query.
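
    The generated script is an XMLA command similar to this minimal sketch (the object IDs depend on your database; this one assumes Adventure Works DW 2008):

    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <Process>
        <Object>
          <DatabaseID>Adventure Works DW 2008</DatabaseID>
        </Object>
        <Type>ProcessFull</Type>
      </Process>
    </Batch>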

    3. SSIS Analysis Services Processing Task

    This Control Flow task allows us to configure any settings and then add it to our ETL process. Quite handy.

    4. SQL Server Agent Job

    This one is really an automation of Method #2 – XMLA Script. We can encapsulate the script in a job step of the SQL Server Analysis Services Command type.

    5. .NET Code

    This allows us to process cubes as a part of an application. Nice if we want to let our users process our cubes on-demand. Of course, better left to application developers, but still a good trick to know. Especially if we want to seem all-knowing when it comes to databases of any type. To achieve this objective, we use AMO (Analysis Management Objects). The API reference can be found here:

    http://technet.microsoft.com/en-us/library/microsoft.analysisservices(SQL.90).aspx

    6. Command Line – ascmd

    The command line utility can do a lot – including processing SSAS objects. For a full readme you can go here:

    http://msdn.microsoft.com/en-us/library/ms365187.aspx

    7. Command Line – PowerShell

    This PowerShell script will perform a Full Process of Adventure Works DW 2008 on localhost:

    # Load AMO and connect to the server
    [Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices")
    $servername = New-Object Microsoft.AnalysisServices.Server
    $servername.Connect("localhost")
    # Get the database and process it fully
    $databasename = $servername.Databases.GetByName("Adventure Works DW 2008")
    $databasename.Process("ProcessFull")

    Using AMO we can do any maintenance tasks through PowerShell, including an object process.

    Probably not a fully exhaustive list, but I hope it helps with giving developers some options when it comes to this trivial and crucial part of the development and deployment process.

    Filtering measures by indirectly related dimensions in MDX

    I have lately started visiting the SQL Server MSDN Forums and trying to answer some questions about Analysis Services and Reporting Services. One of the questions about MDX queries seems to get repeated quite often and I will try to address it in this post, so hopefully more people will get to read this rather than ask about it on MSDN.

    The actual question takes the form of:
    “I have Dimension A and Dimension B, related to Measure 1. Dimension B is also related to Measure 2. How can I (is it possible to) get the values for Measure 2 filtered/sliced by Dimension A? I know it is easy to achieve with a join in SQL, but I do not know how to do it with MDX.”

    This suggests a dimensional model where Measure Group 1 is related to both Dimension A and Dimension B, while Measure Group 2 is related only to Dimension B.

    One solution would be creating a many-to-many relationship between Dimension A and Measure Group 2. However, we may want to avoid that for some reason and answer the problem with a query.

    We can achieve the desired result in a number of ways but I will discuss the one using NONEMPTY. Others would be using FILTER and EXISTS.

    A sample script is:

    SELECT
    {
    [Measures].[Measure 2]
    } ON 0,
    NON EMPTY
    {
    NONEMPTY( [Dimension B].[Dimension B Hierarchy].Members,
    ([Measures].[Measure 1], [Dimension A].[Dimension A Hierarchy].&[Member_Key]))
    } ON 1
    FROM [Cube]

    What this script does:

    1. Gets all Dimension B members, which have associated cells for Measure 1 and the specific Dimension A member (which we are filtering/slicing by)
    2. Gets the Measure 2 cells for the set of members retrieved in Step 1
    3. Removes members from Step 1, for which cells from Step 2 are empty

    An AdventureWorks example is:

    SELECT
    {
    [Measures].[Internet Order Count]
    } ON 0,
    NON EMPTY
    {
    NONEMPTY( [Product].[Product].Members,
    ([Measures].[Reseller Order Count], [Reseller].[Reseller].&[238]))
    } ON 1
    FROM [Adventure Works]

    Combining Slowly Changing Dimensions and Current Dimension Versions

    When we need to see historical changes of a dimension in our OLAP cube, the common practice is to implement it as a SCD – a Slowly Changing Dimension. There are a few ways to do this, and a really good definition of the different types of SCDs can be found in Wikipedia: Slowly Changing Dimension. Also, there are quite a few articles on implementing SCD ETLs in SSIS, two of which are:
    • SCD Wizard Demo – SSIS Junkie blog example of a package using the Slowly Changing Dimension transformation in SSIS
    • MSDN Article on the Slowly Changing Dimension transformation in SSIS

    Since SQL Server Integration Services 2005 and 2008 include a SCD transformation, it is not too hard to implement such dimensions.

    Here I am discussing a typical requirement – to be able to have a SCD and a Current version of the dimension.

    First, it is important to notice that a SCD should have two dimension keys: a unique surrogate key identifying every version of the dimension members, and a non-unique code, which is common to all versions of a dimension member. This is also very important if we want to be able to determine the current version of a dimension member.
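
    A very simple dimension table utilising this design could look like the following sketch (the dates are made up for illustration):

    SKey  Code  Description   FromDate    ToDate
    1     1     Member1Ver1   2005-01-01  2006-12-31
    2     1     Member1Ver2   2007-01-01  9999-12-31
    3     2     Member2Ver1   2005-01-01  2005-12-31
    4     2     Member2Ver2   2006-01-01  2006-12-31
    5     2     Member2Ver3   2007-01-01  9999-12-31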

    Here we have two distinct dimension members with Codes of 1 and 2. Member1 has two versions and Member2 has three. The SKeys (surrogate keys) for these versions are unique, but the Code stays the same for each member. Also, notice the From and To dates, which allow us to distinguish the periods for the member versions. We can have an IsActive or IsCurrent bit column, which shows us the latest version of a member, but we can also just filter on To dates of 9999-12-31, which gives us the same result.

    Assuming the described design I will move on to discuss the ways to build a dimension in SSAS.

    First, the standard way to link the dimension table to our fact table is through the surrogate key. We can have a regular relationship between the two tables. As the fact data is usually also linked to a Time dimension, fact records falling between the From and To dates of a version of our SCD member will be linked to that version's SKey.
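
    An example of a fact table with a few rows which can be linked to the dimension table above might look like this (TimeKeys and Amounts are, again, made up):

    FactKey  DimSKey  TimeKey   Amount
    1        1        20060115  100
    2        2        20070115  150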

    The row with a FactKey of 1 will be linked against Member1Ver1, while FactKey 2 will go against Member1Ver2. Therefore, when we slice our cube by Time and our dimension, each amount shows up against the version of Member1 which was valid at the time.

    This is the standard way to implement our SCD, and these are the results we would expect. Now, we get a new requirement. We want to be able to see both this and an aggregation against the current version of our dimension. We have a few ways to implement it. One obvious way is to create another dimension containing only the current dimension members, which can be easily achieved by adding a Named Query in our DSV:

    SELECT SKey
    , Code
    , Description
    FROM DimTable
    WHERE ToDate = '9999-12-31'

    The result will be a dimension table containing only Member1Ver2 and Member2Ver3.

    Then we need to replace our fact table with a Named Query, which shows the DimSKeys for current version dimension members:

    SELECT ft.FactSkey
    , dt_current.DimSKey
    , ft.TimeKey
    , ft.Amount
    FROM FactTable ft
    INNER JOIN DimTable dt
    ON ft.DimSKey = dt.SKey
    INNER JOIN DimTable dt_current
    ON dt.Code = dt_current.Code
    WHERE dt_current.ToDate = '9999-12-31'

    This will give us a fact table in which both rows carry the DimSKey of Member1Ver2 – the current version.

    When we slice our cube, all records for Member1 will be aggregated against the latest version.

    Implementing this, we can have two dimensions in our cube, so our users can use the one that makes more sense for their needs:

    • Dimension, and
    • Dimension (Historical) – where the Historical designation stands, in technical terms, for a SCD

    However, we can also implement this in a different way, which allows us to avoid building such logic in a view or in our DSV. The trade-off is some space on our disks and one more column in our fact table. Instead of adding a new column through writing SQL, we can simply add the dimension Code to the fact table. Then, we can build our dimension again by getting the latest versions, but instead of having the SKey as a dimension key, we can use the Code. It is of course unique across all dimension members, as long as we filter out the non-current versions. The query for doing this is exactly the same as the one we used before. However, we need to change our fact table design and add a DimCode column.
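
    A sketch of the modified fact table (values made up as before):

    FactKey  DimSKey  DimCode  TimeKey   Amount
    1        1        1        20060115  100
    2        2        1        20070115  150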

    Then, we create two dimensions again, but we link the Historical dimension with the DimSKey column and the Current one with the DimCode column. The result of slicing the cube by the current version is exactly the same as before. The trade-off is space vs. processing time and CPU usage. It is up to the developer to choose the more appropriate way to build the solution.

    So far I discussed two ways of having our SCD and Current Version dimension as different dimensions in our cubes. There is, however, a way to combine both in the same dimension. To do this, we need to have two levels in the dimension: a parent level, which contains the current version of the dimension members, and a child level, which contains the historical versions. For example:

    Member1Ver2
      Member1Ver1
      Member1Ver2
    Member2Ver3
      Member2Ver1
      Member2Ver2
      Member2Ver3

    This way the historical versions aggregate up to the current version and we can use either level, depending on what we want to achieve. To build this, we can use our current dimension table and add a parent level through SQL. This way, we do not need to update all records when a new version comes:

    SELECT dt.SKey
    , dt.Code
    , dt.Description
    , dt_p.SKey AS ParentSKey
    FROM DimTable dt
    INNER JOIN DimTable dt_p
    ON dt.Code = dt_p.Code
    WHERE dt_p.ToDate = '9999-12-31'

    The result is a table in which every version row carries the SKey of its current version as a ParentSKey.
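
    Based on the illustrative dimension table above, it would look like this:

    SKey  Code  Description   ParentSKey
    1     1     Member1Ver1   2
    2     1     Member1Ver2   2
    3     2     Member2Ver1   5
    4     2     Member2Ver2   5
    5     2     Member2Ver3   5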

    Then, we can build our Parent-Child dimension, and we can use the Parent level if we want to have current versions and the Child level for the historical ones.

    This approach allows us to combine the two dimensions into one. It is also possible to implement it in a non-parent child fashion because the hierarchy is not ragged.

    It is always advisable to make sure we actually need a SCD and to avoid it whenever possible, because it is not always intuitive for users to use one. Splitting our fact data into multiple rows can be surprising for users, and understanding how the historical dimension works and the multiple nodes it consists of can be a problem. However, it lets us satisfy a common requirement, and therefore it is quite important to know how to build one.