7 Ways to Process Analysis Services Objects

Being asked a bit too often how we can process Analysis Services databases (or cubes and dimensions) here is a list of 7 different methods:

1. Through the GUI

This one is obvious. We can do it through both SSMS and BIDS.

2. XMLA Script

To generate the script we can use the hefty Script button in SSMS. Simply configuring the processing settings and then instead of clicking the all too usual OK, we can as well click on the little button in the top left corner of the Process window:

xmla_script

Then, we can just execute the generated query.

3. SSIS Analysis Services Processing Task

This Control Flow task allows us to configure any settings and then add it to our ETL process. Quite handy.

image

4. SQL Server Agent Job

This one is really an automation of Method #2 – XMLA Script. We can encapsulate it into a job of SQL Server Analysis Services Command type:

image

5. .NET Code

This allows us to process cubes as a part of an application. Nice if we want to let our users process our cubes on-demand. Of course, better left to application developers, but still a good trick to know. Especially if we want to seem all-knowing when it comes to databases of any type. To achieve this objective, we use AMO (Analysis Management Objects). An API can be found here:

http://technet.microsoft.com/en-us/library/microsoft.analysisservices(SQL.90).aspx

6. Command Line – ascmd

The command line utility can do a lot – including processing SSAS objects. For a full readme you can go here:

http://msdn.microsoft.com/en-us/library/ms365187.aspx

7. Command Line – PowerShell

This PowerShell script will perform a Full Process of Adventure Works DW 2008 on localhost:

[Reflection.Assembly]::LoadWithPartialName(“Microsoft.AnalysisServices”)
$servername=New-Object Microsoft.AnalysisServices.Server
$servername.connect(“localhost”)
$databasename=New-Object Microsoft.AnalysisServices.Database
$databasename=$servername.Databases.GetByName(“Adventure Works DW 2008”)
$databasename.Process(“ProcessFull”)

Using AMO we can do any maintenance tasks through PowerShell, including an object process.

Probably not a fully exhaustive list, but I hope it helps with giving developers some options when it comes to this trivial and crucial part of the development and deployment process.

Advertisements

Filtering measures by indirectly related dimensions in MDX

I have lately started visiting the SQL Server MSDN Forums and trying to answer some questions about Analysis Services and Reporting Services. One of the questions about MDX queries seems to get repeated quite often and I will try to address it in this post, so hopefully more people will get to read this rather than ask about it on MSDN.

The actual question takes the form of:
“I have Dimension A and Dimension B, related to Measure 1. Dimension B is also related to Measure 2. How can I (is it possible to) get the values for Measure 1 filtered/sliced by Dimension A. I know it is easy to achieve with a join in SQL, but I do not know how to do it with MDX.

This suggest the following dimension model:

One solution would be creating a many-to-many relationship between Dimension A and Measure Group 2. However, we may want to avoid that for some reason and answer the problem with a query.

We can achieve the desired result in a number of ways but I will discuss the one using NONEMPTY. Others would be using FILTER and EXISTS.

A sample script is:

SELECT
{
[Measures].[Measure 2]
} ON 0,
NON EMPTY
{
NONEMPTY( [Dimension B].[Dimension B Hierarchy].Members,
([Measures].[Measure 1], [Dimension A].[Dimension A Hierarchy].&[Member_Key]))
} ON 1
FROM [Cube]

What this script does:

1. Gets all Dimension B members, which have associated cells for Measure 1 and the specific Dimension A member (which we are filtering/slicing by)
2. Gets the Measure 2 cells for the set of members retrieved in Step 1
3. Removes members from Step 1, for which cells from Step 2 are empty

An AdventureWorks example is:

SELECT
{
[Measures].[Internet Order Count]
} ON 0,
NON EMPTY
{
NONEMPTY( [Product].[Product].Members,
([Measures].[Reseller Order Count], [Reseller].[Reseller].&[238]))
} ON 1
FROM [Adventure Works]

Combining Slowly Changing Dimensions and Current Dimension Versions

When we need to see historical changes of a dimension in our OLAP cube the common practice is to implement it as a SCD – or a Slowly Changing Dimension. There are a few ways to do this and a really good definition of the different types of SCDs can be found in Wikipedia: Slowly Changing Dimension. Also, there are quite a few articles on Implementing SCD ETLs in SSIS, two of which are:
  • SCD Wizard Demo – SSIS Junkie blog example of a package using the Slowly Changing Dimension transformation in SSIS
  • MSDN Article on the Slowly Changing Dimension transformation in SSIS

Since SQL Server Integration Services 2005 and 2008 include a SCD transformation it is not too hard to implement such dimensions.

Here I am discussing a typical requirement – to be able to have a SCD and a Current version of the dimension.

First, it is important to notice that a SCD should have two dimension keys: a unique surrogate key identifying every version of the dimension members and a non-unique code, which is common for all versions for a dimension member. This is also very important if we want to be able to determine the current version of a dimension member. An example of a very simple dimension table utilising this design is:

Here we have two distinct dimension members with Code of 1 and 2. Member1 has two versions and Member2 has three. The SKeys (surrogate keys) for these versions are unique but the codes stay the same for each member. Also, notice the From and To dates which allow us to distinguish the periods for the member versions. We can have an IsActive or IsCurrent bit column, which shows us the latest version of a node, but we can also just filter on dates which are 9999-12-31, which will give us the same result.

Assuming the described design I will move on to discuss the ways to build a dimension in SSAS.

First, the standard way to link the dimension table to our fact table is through the surrogate key. We can have a regular relationship between the two tables. As the fact data is usually also linked to a Time dimension, fact records linked against the periods between the From and To dates of our SCD will be linked to that versions SKey. An example of a fact table with a few rows, which can be linked to the dimension table above is:

The row with a FactKey of 1 will be linked against Member1Ver1, while FactKey 2 will go against Member1Ver2. Therefore, when we slice our cube by Time and our dimension we will see:

This is the standard way to implement our SCD and these are the results we would expect. Now, we get a new requirement. We want to be able to see both this and an aggregation against the current version of our dimension. We have a few ways to implement it. One obvious way is to create another dimension containing only the current dimension members. This can be easily achieved if we add a Named Query in our DSV, which shows only the current dimension members:

SELECT SKey
, Code
, Description
FROM DimTable
WHERE ToDate = ‘9999-12-31’

The result will be:

Then we need to replace our fact table with a Named Query, which shows the DimSKeys for current version dimension members:

SELECT ft.FactSkey
, dt_current.DimSKey
, ft.TimeKey
, ft.Amount
FROM FactTable ft
INNER JOIN DimTable dt
ON ft.DimSKey = dt.SKey
INNER JOIN DimTable dt_current
ON dt.Code = dt_current.Code
WHERE dt_current.ToDate = ‘9999-12-31’

This will give us the following result:

When we slice our cube, all records for Member1 will be against the latest version:

Implementing this, we can have two dimensions in our cube, so our users can use the one that makes more sense for their needs:

  • Dimension and
  • Dimension (Historical), and the Historical designation stands for, in technical terms, a SCD

However, we can also implement this in a different way, which allows us to avoid building such logic in a view or our DSV. The trade-off is some space on our disks and one more column in our fact table. Instead of adding a new column through writing SQL, we can simply add the dimension Code in the fact table. Then, we can build our dimension again by getting the latest versions, but instead of having the SKey as a dimension key, we can use the Code. It is of course unique across all dimension members, as long as we filter our the non-current versions. The query for doing this is exactly the same as the one we used before. However, we need to change our fact table design and add a DimCode column:

Then, we create two dimensions again, but we link the Historical dimension with the DimSKey column and the Current one with the DimCode column. The result of slicing the cube by the current version is exactly the same as before. The trade-off is space vs. processing time and CPU usage. It is up to the developer to choose the more appropriate way to build the solution.

So far I discussed two ways of having our SCD and Current Version dimension in different dimensions in our cubes. There is, however a way to combine both in the same dimension. To do this, we need to have two levels in the dimension: a parent level, which contains the current version of the dimension members, and a child level, which contains the historical versions. In example:

Member1Ver2
Member1Ver1
Member1Ver2
Member2Ver3
Member2Ver1
Member2Ver2
Member2Ver3

This way the historical versions aggregate up to the current version and we can use either level, depending on what we want to achieve. To build this, we can use our current dimension table and add a parent level through SQL. This way, we do not need to update all records when a new version comes:

SELECT dt.SKey
, dt.Code
, dt.Description
, dt_p.SKey AS ParentSKey
FROM DimTable dt
INNER JOIN DimTable dt_p
ON dt.Code = dt_p.Code
WHERE dt_p.ToDate = ‘9999-12-31’

The result is:

Then, we can build our Parent-Child dimension and we can use the Parent level is we want to have current versions and the Child level for the historical ones.

This approach allows us to combine the two dimensions into one. It is also possible to implement it in a non-parent child fashion because the hierarchy is not ragged.

It is always advisable to make sure we actually need a SCD and avoid it whenever possible because it is not always intuitive for users to use one. Splitting our fact data on multiple rows can be surprising for users and understanding how the historical dimension works and the multiple nodes it consists of can be a problem. However, it lets us satisfy a common requirement and therefore it is quite important to know how to build.

Moving writeback data in the Fact tables and avoiding problems with changing column names

While writeback functionality in SQL Server Analysis Services 2008 has changed significantly and writeback values are stored in the OLAP cubes, in SSAS 2005 the writeback values are stored in a relational table on the same server with the fact tables. When the writeback functionality is enabled for a partition, a new table is automatically created which bears a prefix of WriteTable. Its structure is fairly simple: it contains a column for each dimension and two audit fields.

The ROLAP nature of the writeback table makes it inefficient for storage of a large number of writeback records, and it is sometimes required to consolidate the data it contains with the fact table.

Normally we can write a stored procedure, which can do this for us. Because the values in the WriteTable are deltas there is a new row for each user change. In example, if we change 0 to 5, there will be one row in the writeback table, which shows 5 as a measure value. If then we change the new value of 5 to 2, there will be a new row with a measure value of -3. Therefore, it could be more efficient to perform a quick aggregation of the values in the WriteTable while moving them in the fact table. This could also be contrary to our requirements if we want to be able to trace all data changes.

In either case, we end up with a number of new rows and we can insert these into our fact table, after which we can truncate our WriteTable and process our cube. There is a potential pitfall here. If we do not set up properly the processing settings, we could destroy our WriteTable and have it re-created, which in turn introduces another pitfall – SSAS may change our column suffixes. In example, if we have a fact table with the following definition:

CREATE TABLE [Fact_IndicatorAmount](
[Fact_IndicatorAmount_Id] [int],
[ETL_Date] [timestamp],
[Indicator_Id] [int],
[Region_Id] [int],
[Scenario_Id] [int],
[Date_Id] [datetime],
[High] [float],
[Low] [float],
[Amount] [float]
)

The WriteTable may be created like this:

CREATE TABLE [WriteTable_Indicator Amount](
[High_0] [float],
[Low_1] [float],
[Amount_2] [float],
[Indicator_Id_3] [int],
[Region_Id_4] [int],
[Scenario_Id_5] [int],
[Date_Id_6] [datetime],
[MS_AUDIT_TIME_8] [datetime],
[MS_AUDIT_USER_9] [nvarchar](255)
)

Note how the column names are the same as the fact table column names, but are suffixed with _1, _2, etc. Unfortunately, these may change with the re-creation of the WriteTable. SSAS tends to assign the suffixes randomly. If that happens, our consolidation stored procedures will break.

The obvious step to avoid this is to set up our cube processing correctly, making sure that the WriteTable does not get re-created. To do this, we can select Use Existing writeback table in the Change Settings… dialog, which allows us to change cube processing settings:

image

We can also script this action and use it in our automated cube processing SQL Server job.

Even though this is a relatively intuitive and simple solution, I have always had problems with it because of manual cube processing performed by power users, which do destroy the writeback data together with the WriteTable structure and following from that, the code in my stored procedures.

Through the utilisation of some dynamic SQL and SQL Server system tables information, we can write a stored procedure which does not depend on the suffixes of the column names in the writeback table:

CREATE PROCEDURE [usp_Consolidate_WriteBack_to_Facts]
AS
BEGIN
SET NOCOUNT ON;

DECLARE @Column_High nvarchar(50),
@Column_Low nvarchar(50),
@Column_Amount nvarchar(50),
@Column_Indicator nvarchar(50),
@Column_Region nvarchar(50),
@Column_Scenario nvarchar(50),
@Column_Time nvarchar(50)

SET @Column_High = (
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
ON sysobjects.id = syscolumns.id
WHERE sysobjects.xtype=’U’
AND sysobjects.name like ‘Write%’
AND syscolumns.name like ‘High%’
)

SET @Column_Low = (
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
ON sysobjects.id = syscolumns.id
WHERE sysobjects.xtype=’U’
AND sysobjects.name like ‘Write%’
AND syscolumns.name like ‘Low%’
)

SET @Column_Amount = (
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
ON sysobjects.id = syscolumns.id
WHERE sysobjects.xtype=’U’
AND sysobjects.name like ‘Write%’
AND syscolumns.name like ‘Amount%’
)

SET @Column_Indicator = (
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
ON sysobjects.id = syscolumns.id
WHERE sysobjects.xtype=’U’
AND sysobjects.name like ‘Write%’
AND syscolumns.name like ‘Indicator%’
)

SET @Column_Region = (
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
ON sysobjects.id = syscolumns.id
WHERE sysobjects.xtype=’U’
AND sysobjects.name like ‘Write%’
AND syscolumns.name like ‘Region%’
)

SET @Column_Scenario = (
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
ON sysobjects.id = syscolumns.id
INNER JOIN systypes
ON syscolumns.xtype=systypes.xtype
WHERE sysobjects.xtype=’U’
AND sysobjects.name like ‘Write%’
AND syscolumns.name like ‘Scenario%’
)

SET @Column_Time = (
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
ON sysobjects.id = syscolumns.id
WHERE sysobjects.xtype=’U’
AND sysobjects.name like ‘Write%’
AND syscolumns.name like ‘Date%’
)

DECLARE @SQL_Command nvarchar(4000)
SET @SQL_Command = (‘
INSERT INTO [Fact_IndicatorAmount]
([High]
,[Low]
,[Amount]
,[Indicator_Id]
,[Region_Id]
,[Scenario_Id]
,[Date_Id])
SELECT ‘+ @Column_High +’
,’+ @Column_Low +’
,’+ @Column_Amount +’
,’+ @Column_Indicator +’
,’+ @Column_Region +’
,’+ @Column_Scenario +’
,’+ @Column_Time +’
FROM [WriteTable_Indicator Amount]’)

EXEC (@SQL_Command)

TRUNCATE TABLE [WriteTable_Indicator Amount]
END

What we are effectively doing here is getting the column names from the WriteTable and then constructing an INSERT statement based on these. It is dangerous to further automate this by a while loop, as the actual column names in the WriteTable can differ from the ones in the fact table. This could happen if the dimension table key names are different to the fact table key names.

Moving writeback rows through this stored procedure ensures that even if the WriteTable for a partition is re-created for some reason our code can handle it.

Check your spelling in SSAS 2008 MDX queries

I just found out that because of wrong/different spelling of a dimension name/attribute in SSAS 2008 the server may return the following message:
The set must have a single hierarchy to be used with the complement operator.

To fix the issue, we must ensure that all the dimensions in our queries are spelled exactly the way they are named in the cube (surprise?)…

I encountered this message by first building a Reporting Services report with data coming from SSAS. Then, a colleague misspelt a dimension name and instead of Natural Code he typed Natrual Code. Next time I needed to run the query in Query Designer I got the message. The query indeed used a complement operator on that dimension, exculding a particular account by its Natural Code. The query was something like:

SELECT {
[Measures].[Amount]
} ON COLULMNS,
{
([Account].[Major Account].ALLMEMBERS *
[Employee].[Employee Name].ALLMEMBERS)
} ON ROWS
FROM (
SELECT (-{[Account].[Natural Code].&[990]}) ON COLUMNS
FROM [Finance]
)