Post-mortem thoughts on PerformancePoint Planning and the future of Microsoft BI planning and forecasting

As everyone must be well aware by now, Microsoft has discontinued PerformancePoint Planning and has decided to merge Monitoring & Analytics into SharePoint as PerformancePoint Services. Chris Webb and Nick Barclay have already blogged about this and given us some valuable thoughts and explanations on the subject.

In addition to what has already been said, I would like to add that maybe dumping Planning will not be such a great loss to anyone. Its current market penetration is marginal and the successful implementations are not that many anyway. I have seen companies consider PP Planning and then abandon it because of the complexities involved, which translate directly into a high implementation cost, comparable to that of a custom .NET implementation.

From a simplistic technological point of view, planning and forecasting means allowing users to input data, manipulating that data according to some business rules, and then adding it to the BI system so that the past can be analysed and compared with the future. Currently, we can do this either by building a custom application which handles all of this, or by using a third-party application which handles it for us. I have had the chance to be involved in each scenario (once with a team of .NET developers and a few times with Calumo, which allows cube write-back or stored procedure write-back from Excel). The difficulties always come from the fact that the knowledge needed to accurately gather requirements is obscured by layers of business logic, and from the completely different nature of planning and forecasting in comparison with analytics.

Analytics, or core BI, is based on the presumption that we already have reliable data which our business clients want to make sense of, thus gaining insight into their business. Planning and forecasting, in contrast, also involves allowing these same users to record their thoughts about the business and their projections about the future, and then to analyse those just like their historical data. Therefore, planning and forecasting is inherently more complex than pure analytics.

There is no tool in the Microsoft BI stack which can completely cover the requirements of a typical business planning scenario. PerformancePoint Planning tried to encapsulate the planning logic into a set of models and rules, but it was too complex for both users and developers to understand, implement and then maintain. I have seen a number of successful planning and forecasting implementations with a third-party tool – Calumo. It is a fairly simple application (at least in comparison to PP Planning) which, apart from some quite handy improvements over Excel for report authoring, has the very powerful ability to let users input data straight back to their OLAP source (cube write-back) or to their data warehouse (stored procedure write-back). That is all that is needed for any planning and forecasting project, and Microsoft should really have a look at what their partners are offering as a solution to planning instead of developing monstrosities like PerformancePoint Planning.

Building on top of SQL Server stored procedures and Analysis Services write-back, Microsoft can easily enhance their BI offering. All we need is a way to access this functionality through a front-end tool like Excel, SharePoint or Reporting Services.
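To give an idea of what the stored procedure side of this could look like, here is a minimal, purely hypothetical sketch of a write-back procedure a front-end tool could call to record a planning figure directly in the warehouse (the procedure, table and column names are all made up for illustration):

-- Hypothetical sketch only: a simple "stored procedure write-back" entry point
-- that a front-end tool (Excel, SharePoint, Reporting Services) could call
CREATE PROCEDURE usp_WriteForecastAmount
    @Scenario_Id int,
    @Indicator_Id int,
    @Date_Id datetime,
    @Amount float
AS
BEGIN
    SET NOCOUNT ON;

    -- Record the user's input as a new fact row; the cube picks it up
    -- the next time it is processed
    INSERT INTO Fact_Forecast (Scenario_Id, Indicator_Id, Date_Id, Amount)
    VALUES (@Scenario_Id, @Indicator_Id, @Date_Id, @Amount);
END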

Note: A couple of techniques which may be useful for planning and forecasting implementations are discussed in these posts:
Spreading Non-Transactional Data Over Time
Moving writeback data in the Fact tables and avoiding problems with changing column names

Speed up MDX query preparation in Reporting Services

We often need to build some advanced functionality into our reports, and since the Reporting Services query designer does not provide a very advanced way to create MDX statements, our only option is to get our hands dirty and write some code ourselves. One of the things we must consider is the time the query designer needs to validate our syntax, execute the query and determine what fields to create based on the data set it receives from Analysis Services.

When we specify a number of query parameters, we have the option to give them default values:

[Image: the Query Parameters dialog, with a default value specified for each parameter]

If we do not do this, the query will be executed with the default dimension member instead of a specific parameter value, which can be slow and unnecessary in the usual case where the default member is the (All) member. Instead, we should always specify a value to be used for limiting the sample data set. Ideally, it should be small but not empty: if the parameter is too limiting and the MDX query does not return any values, the data fields will not be populated.

Specifying default values speeds up development time, especially when working with complex cubes, where Reporting Services queries can be slow because the dimension cross joins they generate can be very big.

Chris Webb has also blogged about this issue. I strongly recommend you read his post.

Moving writeback data in the Fact tables and avoiding problems with changing column names

While writeback functionality has changed significantly in SQL Server Analysis Services 2008, where writeback values can also be stored in the OLAP cube itself, in SSAS 2005 the writeback values are stored in a relational table on the same server as the fact tables. When writeback is enabled for a partition, a new table is automatically created with a name prefixed with WriteTable. Its structure is fairly simple: it contains a column for each measure and each dimension key, plus two audit fields.

The ROLAP nature of the writeback table makes it inefficient for storing a large number of writeback records, so it is sometimes necessary to consolidate the data it contains into the fact table.

Normally, we can write a stored procedure which does this for us. Because the values in the WriteTable are deltas, there is a new row for each user change. For example, if we change 0 to 5, there will be one row in the writeback table showing 5 as the measure value. If we then change the new value of 5 to 2, there will be another row with a measure value of -3. Therefore, it can be more efficient to perform a quick aggregation of the values in the WriteTable while moving them into the fact table. This could, however, be contrary to our requirements if we want to be able to trace all data changes.
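As a rough sketch of the aggregated variant, using the fact and writeback tables defined below and assuming for simplicity that the writeback columns keep their original names (in reality they carry numeric suffixes, as discussed shortly):

-- Sketch: consolidate the writeback deltas into the fact table,
-- collapsing multiple changes to the same cell into a single row
INSERT INTO [Fact_IndicatorAmount]
    ([High], [Low], [Amount], [Indicator_Id], [Region_Id], [Scenario_Id], [Date_Id])
SELECT SUM([High]), SUM([Low]), SUM([Amount]),
    [Indicator_Id], [Region_Id], [Scenario_Id], [Date_Id]
FROM [WriteTable_Indicator Amount]
GROUP BY [Indicator_Id], [Region_Id], [Scenario_Id], [Date_Id]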

In either case, we end up with a number of new rows which we can insert into our fact table, after which we can truncate our WriteTable and process our cube. There is a potential pitfall here. If we do not set up the processing settings properly, we could destroy our WriteTable and have it re-created, which in turn introduces another pitfall – SSAS may change our column suffixes. For example, if we have a fact table with the following definition:

CREATE TABLE [Fact_IndicatorAmount](
    [Fact_IndicatorAmount_Id] [int],
    [ETL_Date] [timestamp],
    [Indicator_Id] [int],
    [Region_Id] [int],
    [Scenario_Id] [int],
    [Date_Id] [datetime],
    [High] [float],
    [Low] [float],
    [Amount] [float]
)

The WriteTable may be created like this:

CREATE TABLE [WriteTable_Indicator Amount](
    [High_0] [float],
    [Low_1] [float],
    [Amount_2] [float],
    [Indicator_Id_3] [int],
    [Region_Id_4] [int],
    [Scenario_Id_5] [int],
    [Date_Id_6] [datetime],
    [MS_AUDIT_TIME_8] [datetime],
    [MS_AUDIT_USER_9] [nvarchar](255)
)

Note how the column names are the same as the fact table column names, but are suffixed with _0, _1, and so on. Unfortunately, these suffixes may change when the WriteTable is re-created, as SSAS tends to assign them arbitrarily. If that happens, our consolidation stored procedures will break.
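A quick way to check which suffixes the current WriteTable actually has is to query the system tables, for example (a sketch using the legacy sysobjects/syscolumns tables):

-- List the current column names of the writeback table
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
    ON sysobjects.id = syscolumns.id
WHERE sysobjects.xtype = 'U'
    AND sysobjects.name LIKE 'WriteTable%'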

The obvious step to avoid this is to set up our cube processing correctly, making sure that the WriteTable does not get re-created. To do this, we can select Use Existing writeback table in the Change Settings… dialog, which allows us to change cube processing settings:

[Image: the Change Settings dialog, with the Use existing writeback table option selected]

We can also script this action and use it in our automated cube processing SQL Server job.
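As a sketch of what such a job step could look like (all job, database, cube, measure group, partition and server names below are placeholders), the scripted XMLA process command, with the writeback option set to use the existing table, can be registered through sp_add_jobstep:

-- Sketch only: add the scripted process command as a SQL Server Agent job step,
-- assuming the job itself already exists; all names are placeholders
EXEC msdb.dbo.sp_add_jobstep
    @job_name = N'Nightly cube processing',
    @step_name = N'Process Indicator Amount partition',
    @subsystem = N'ANALYSISCOMMAND',
    @server = N'MySSASServer',
    @command = N'
        <Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
            <Object>
                <DatabaseID>Finance</DatabaseID>
                <CubeID>Finance</CubeID>
                <MeasureGroupID>Indicator Amount</MeasureGroupID>
                <PartitionID>Indicator Amount</PartitionID>
            </Object>
            <Type>ProcessFull</Type>
            <WritebackTableCreation>UseExisting</WritebackTableCreation>
        </Process>'

The XMLA command itself can be obtained by scripting the processing dialog once the writeback option has been set.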

Even though this is a relatively intuitive and simple solution, I have always had problems with it because of manual cube processing performed by power users, which destroys the writeback data together with the WriteTable structure and, following from that, breaks the code in my stored procedures.

Using some dynamic SQL and the SQL Server system tables, we can write a stored procedure which does not depend on the suffixes of the column names in the writeback table:

CREATE PROCEDURE [usp_Consolidate_WriteBack_to_Facts]
AS
BEGIN
    SET NOCOUNT ON;

    -- Look up the current (suffixed) column names in the writeback table
    DECLARE @Column_High nvarchar(50),
            @Column_Low nvarchar(50),
            @Column_Amount nvarchar(50),
            @Column_Indicator nvarchar(50),
            @Column_Region nvarchar(50),
            @Column_Scenario nvarchar(50),
            @Column_Time nvarchar(50)

    SET @Column_High = (
        SELECT syscolumns.name
        FROM sysobjects
        INNER JOIN syscolumns
            ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name LIKE 'Write%'
            AND syscolumns.name LIKE 'High%'
    )

    SET @Column_Low = (
        SELECT syscolumns.name
        FROM sysobjects
        INNER JOIN syscolumns
            ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name LIKE 'Write%'
            AND syscolumns.name LIKE 'Low%'
    )

    SET @Column_Amount = (
        SELECT syscolumns.name
        FROM sysobjects
        INNER JOIN syscolumns
            ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name LIKE 'Write%'
            AND syscolumns.name LIKE 'Amount%'
    )

    SET @Column_Indicator = (
        SELECT syscolumns.name
        FROM sysobjects
        INNER JOIN syscolumns
            ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name LIKE 'Write%'
            AND syscolumns.name LIKE 'Indicator%'
    )

    SET @Column_Region = (
        SELECT syscolumns.name
        FROM sysobjects
        INNER JOIN syscolumns
            ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name LIKE 'Write%'
            AND syscolumns.name LIKE 'Region%'
    )

    SET @Column_Scenario = (
        SELECT syscolumns.name
        FROM sysobjects
        INNER JOIN syscolumns
            ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name LIKE 'Write%'
            AND syscolumns.name LIKE 'Scenario%'
    )

    SET @Column_Time = (
        SELECT syscolumns.name
        FROM sysobjects
        INNER JOIN syscolumns
            ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name LIKE 'Write%'
            AND syscolumns.name LIKE 'Date%'
    )

    -- Build an INSERT statement with the resolved column names and execute it
    DECLARE @SQL_Command nvarchar(4000)
    SET @SQL_Command = ('
        INSERT INTO [Fact_IndicatorAmount]
            ([High]
            ,[Low]
            ,[Amount]
            ,[Indicator_Id]
            ,[Region_Id]
            ,[Scenario_Id]
            ,[Date_Id])
        SELECT ' + @Column_High + '
            ,' + @Column_Low + '
            ,' + @Column_Amount + '
            ,' + @Column_Indicator + '
            ,' + @Column_Region + '
            ,' + @Column_Scenario + '
            ,' + @Column_Time + '
        FROM [WriteTable_Indicator Amount]')

    EXEC (@SQL_Command)

    -- Clear the writeback table now that its rows are in the fact table
    TRUNCATE TABLE [WriteTable_Indicator Amount]
END

What we are effectively doing here is getting the column names from the WriteTable and then constructing an INSERT statement based on them. It would be dangerous to automate this further with a while loop, as the actual column names in the WriteTable can differ from the ones in the fact table. This can happen if the dimension table key names are different from the fact table key names.

Moving writeback rows through this stored procedure ensures that even if the WriteTable for a partition is re-created for some reason, our code can handle it.

Check your spelling in SSAS 2008 MDX queries

I just found out that if a dimension or attribute name is misspelled in an SSAS 2008 MDX query, the server may return the following message:
The set must have a single hierarchy to be used with the complement operator.

To fix the issue, we must ensure that all the dimensions in our queries are spelled exactly the way they are named in the cube (surprise?)…

I encountered this message after first building a Reporting Services report with data coming from SSAS. A colleague then misspelt a dimension attribute name: instead of Natural Code he typed Natrual Code. The next time I ran the query in the Query Designer I got the message. The query indeed used a complement operator on that attribute, excluding a particular account by its Natural Code. The query was something like:

SELECT {
    [Measures].[Amount]
} ON COLUMNS,
{
    ([Account].[Major Account].ALLMEMBERS *
     [Employee].[Employee Name].ALLMEMBERS)
} ON ROWS
FROM (
    SELECT (-{[Account].[Natural Code].&[990]}) ON COLUMNS
    FROM [Finance]
)