Sneak Peek at my session at Microsoft Tech Ed 2010 Australia

Without revealing too much, if you turn up for the “Budgeting, Planning and Performance Management with the Microsoft BI Stack” session at this year’s Tech Ed on the Gold Coast, you can expect to see the following:

  1. SharePoint 2010 + PerformancePoint Services + Excel Services in action
  2. Excel 2010 (What-If Analysis in Pivot Tables sourced from write-enabled SSAS cubes)
  3. An enterprise model showing some concepts and ideas
  4. New and enhanced features of the 2010 releases of SharePoint and Excel

Some background knowledge of the Microsoft BI Stack is very much preferable, as it can get a bit complex at times; however, if you are wondering how Microsoft Business Intelligence can deliver on the keywords in the presentation title, you will definitely pick up some ideas.

  • Seasoned BI professionals will see some new features
  • Inexperienced BI professionals will see a lot of new features and an overall solution architecture which may help them to get further in the area
  • Other IT professionals will see how Microsoft BI is on the way to making their daily jobs obsolete in some areas
  • Business people (especially those involved in planning and performance management in their organisation) will see how Microsoft can enhance, simplify and even add some excitement to their everyday lives
  • An especially strong warning for senior executives: You may feel like you are currently missing out and experience a sudden urge to allocate more funds to your IT department

Post-mortem thoughts on PerformancePoint Planning and the future of Microsoft BI planning and forecasting

As everyone must be well aware by now, Microsoft has discontinued PerformancePoint Planning and has decided to merge Monitoring & Analytics into SharePoint as PerformancePoint Services. Chris Webb and Nick Barclay have already blogged about this and gave us some valuable thoughts and explanations on the subject.

In addition to what has already been said, I would like to add that maybe dumping Planning will not be such a great loss to anyone. The current market penetration is marginal and the successful implementations are not that many anyway. I have seen companies consider PP Planning and then abandon it because of the complexities involved, which translate directly into an implementation cost comparable to that of a custom .NET build.

From a simplistic technological point of view, planning and forecasting means allowing users to input data, manipulating that data according to some business rules, and then adding it to the BI systems so that the past can be analysed and compared with the future. Currently, we can do this either by building a custom application which handles all of this, or by using a third-party application which handles it for us. I have had the chance to be involved in each scenario (once with a team of .NET developers and a few times with Calumo, which allows cube write-back or stored procedure write-back from Excel). The difficulties always come from the knowledge needed to accurately gather requirements, which is often obscured by layers of business logic, and from the completely different nature of planning and forecasting in comparison with analytics.

Analytics, or core BI, is based on the presumption that we already have reliable data, which our business clients want to make sense of, thus gaining insight into their business. Planning and forecasting, in contrast, also involves allowing these same users to record their thoughts about the business and their projections about the future, and then analyse those just like their historical data. Therefore, planning and forecasting is more complex than pure analytics.

There is no tool in the Microsoft BI stack which can completely cover the requirements for a typical business planning scenario. PerformancePoint Planning tried to encapsulate the planning logic into a set of models and rules, but it was too complex for both users and developers to understand, implement and then maintain. I have seen a number of successful planning and forecasting implementations with a third-party tool, Calumo. It is a fairly simple application (at least in comparison to PP Planning) which, apart from some quite handy improvements over Excel for report authoring, has the very powerful functionality of letting users input data straight back to their OLAP source (cube write-back) or to their data warehouse (stored procedure write-back). That is all that is needed for any planning and forecasting project, and Microsoft should really have a look at what their partners are offering as a solution to planning instead of developing monstrosities like PerformancePoint Planning.

Building on top of SQL Server stored procedures and Analysis Services write-back, Microsoft can easily enhance their BI offering. All we need is a way to access this functionality through a front-end tool like Excel, SharePoint or Reporting Services.
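
To make the stored procedure write-back idea a bit more concrete, here is a minimal sketch of the kind of procedure a front-end tool could call; the table, procedure and parameter names are purely illustrative assumptions and not part of any existing product:

-- A minimal sketch only: the staging table and all names below are
-- hypothetical. A front-end tool (for example an Excel add-in) would call
-- this procedure with the coordinates and the value the user has entered.
CREATE PROCEDURE [usp_WriteBack_PlanAmount]
      @Account_Id    int
    , @Department_Id int
    , @Scenario_Id   int
    , @Date_Id       datetime
    , @Amount        float
AS
BEGIN
    SET NOCOUNT ON;

    -- Store the user input in a staging fact table; a later ETL step or
    -- cube process can pick it up for analysis alongside the actuals
    INSERT INTO [Fact_PlanAmount_Staging]
        ([Account_Id], [Department_Id], [Scenario_Id], [Date_Id], [Amount])
    VALUES
        (@Account_Id, @Department_Id, @Scenario_Id, @Date_Id, @Amount);
END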

Note: A couple of techniques which may be useful for planning and forecasting implementations are discussed in these posts:
Spreading Non-Transactional Data Along Time
Moving writeback data in the Fact tables and avoiding problems with changing column names

Moving writeback data in the Fact tables and avoiding problems with changing column names

While the writeback functionality in SQL Server Analysis Services 2008 has changed significantly and writeback values can be stored in the OLAP cube itself, in SSAS 2005 the writeback values are stored in a relational table on the same server as the fact tables. When writeback is enabled for a partition, a new table is automatically created with a name prefixed by WriteTable. Its structure is fairly simple: it contains a column for each measure, a column for each dimension, and two audit fields.

The ROLAP nature of the writeback table makes it inefficient for storing a large number of writeback records, and it is sometimes necessary to consolidate its data into the fact table.

Normally we can write a stored procedure which does this for us. Because the values in the WriteTable are deltas, there is a new row for each user change. For example, if we change 0 to 5, there will be one row in the writeback table showing 5 as the measure value. If we then change the new value of 5 to 2, there will be a new row with a measure value of -3. Therefore, it could be more efficient to perform a quick aggregation of the values in the WriteTable while moving them into the fact table, as sketched below. This could, however, run contrary to our requirements if we want to be able to trace all data changes.
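
If we do go for the aggregation, a minimal sketch of the idea could look like the query below; it uses the un-suffixed column names of the example tables that follow purely for readability (the stored procedure later in this post deals with the real, suffixed names):

-- Sketch only: collapse the delta rows to one row per dimension combination
-- before moving them into the fact table. Column names are simplified here;
-- the actual WriteTable columns carry numeric suffixes, as shown below.
SELECT    [Indicator_Id]
        , [Region_Id]
        , [Scenario_Id]
        , [Date_Id]
        , SUM([Amount]) AS [Amount]
FROM [WriteTable_Indicator Amount]
GROUP BY  [Indicator_Id]
        , [Region_Id]
        , [Scenario_Id]
        , [Date_Id]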

In either case, we end up with a number of new rows which we can insert into our fact table, after which we can truncate our WriteTable and process our cube. There is a potential pitfall here. If we do not set the processing settings up properly, we can destroy our WriteTable and have it re-created, which in turn introduces another pitfall: SSAS may change our column suffixes. For example, if we have a fact table with the following definition:

CREATE TABLE [Fact_IndicatorAmount](
    [Fact_IndicatorAmount_Id] [int],
    [ETL_Date] [timestamp],
    [Indicator_Id] [int],
    [Region_Id] [int],
    [Scenario_Id] [int],
    [Date_Id] [datetime],
    [High] [float],
    [Low] [float],
    [Amount] [float]
)

The WriteTable may be created like this:

CREATE TABLE [WriteTable_Indicator Amount](
    [High_0] [float],
    [Low_1] [float],
    [Amount_2] [float],
    [Indicator_Id_3] [int],
    [Region_Id_4] [int],
    [Scenario_Id_5] [int],
    [Date_Id_6] [datetime],
    [MS_AUDIT_TIME_8] [datetime],
    [MS_AUDIT_USER_9] [nvarchar](255)
)

Note how the column names are the same as the fact table column names, but are suffixed with _0, _1, _2, and so on. Unfortunately, these suffixes may change when the WriteTable is re-created, as SSAS tends to assign them arbitrarily. If that happens, our consolidation stored procedures will break.

The obvious step to avoid this is to set up our cube processing correctly, making sure that the WriteTable does not get re-created. To do this, we can select Use Existing writeback table in the Change Settings… dialog, which allows us to change cube processing settings:

[Screenshot: the Change Settings… dialog with the Use existing writeback table option selected]

We can also script this action and use it in our automated cube processing SQL Server job.

Even though this is a relatively intuitive and simple solution, I have always had problems with it because of manual cube processing performed by power users, which destroys the writeback data together with the WriteTable structure and, consequently, breaks the code in my stored procedures.

By using some dynamic SQL and the information in the SQL Server system tables, we can write a stored procedure which does not depend on the suffixes of the column names in the writeback table:

CREATE PROCEDURE [usp_Consolidate_WriteBack_to_Facts]
AS
BEGIN
    SET NOCOUNT ON;

    -- Variables holding the current (suffixed) column names of the WriteTable
    DECLARE @Column_High nvarchar(50),
            @Column_Low nvarchar(50),
            @Column_Amount nvarchar(50),
            @Column_Indicator nvarchar(50),
            @Column_Region nvarchar(50),
            @Column_Scenario nvarchar(50),
            @Column_Time nvarchar(50)

    -- Resolve each column name from the system tables by matching on its
    -- un-suffixed prefix (High%, Low%, Amount%, and so on)
    SET @Column_High = (
        SELECT syscolumns.name
        FROM sysobjects
        INNER JOIN syscolumns
            ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name like 'Write%'
            AND syscolumns.name like 'High%'
    )

    SET @Column_Low = (
        SELECT syscolumns.name
        FROM sysobjects
        INNER JOIN syscolumns
            ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name like 'Write%'
            AND syscolumns.name like 'Low%'
    )

    SET @Column_Amount = (
        SELECT syscolumns.name
        FROM sysobjects
        INNER JOIN syscolumns
            ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name like 'Write%'
            AND syscolumns.name like 'Amount%'
    )

    SET @Column_Indicator = (
        SELECT syscolumns.name
        FROM sysobjects
        INNER JOIN syscolumns
            ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name like 'Write%'
            AND syscolumns.name like 'Indicator%'
    )

    SET @Column_Region = (
        SELECT syscolumns.name
        FROM sysobjects
        INNER JOIN syscolumns
            ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name like 'Write%'
            AND syscolumns.name like 'Region%'
    )

    SET @Column_Scenario = (
        SELECT syscolumns.name
        FROM sysobjects
        INNER JOIN syscolumns
            ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name like 'Write%'
            AND syscolumns.name like 'Scenario%'
    )

    SET @Column_Time = (
        SELECT syscolumns.name
        FROM sysobjects
        INNER JOIN syscolumns
            ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name like 'Write%'
            AND syscolumns.name like 'Date%'
    )

    -- Build and execute a dynamic INSERT statement using the resolved names
    DECLARE @SQL_Command nvarchar(4000)
    SET @SQL_Command = ('
    INSERT INTO [Fact_IndicatorAmount]
        ([High]
        ,[Low]
        ,[Amount]
        ,[Indicator_Id]
        ,[Region_Id]
        ,[Scenario_Id]
        ,[Date_Id])
    SELECT ' + @Column_High + '
        ,' + @Column_Low + '
        ,' + @Column_Amount + '
        ,' + @Column_Indicator + '
        ,' + @Column_Region + '
        ,' + @Column_Scenario + '
        ,' + @Column_Time + '
    FROM [WriteTable_Indicator Amount]')

    EXEC (@SQL_Command)

    -- Clear the WriteTable once its rows have been moved to the fact table
    TRUNCATE TABLE [WriteTable_Indicator Amount]
END

What we are effectively doing here is retrieving the current column names from the WriteTable and then constructing an INSERT statement based on them. It is dangerous to automate this further with a loop over all columns, as the actual column names in the WriteTable can differ from the ones in the fact table. This can happen when the dimension table key names differ from the fact table key names.

Moving writeback rows through this stored procedure ensures that even if the WriteTable for a partition is re-created for some reason, our code can handle it.
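
As a usage note, the procedure would typically run as the first step of the automated processing job mentioned above; a minimal sketch of such a step:

-- Sketch of a SQL Server Agent job step: consolidate the writeback rows
-- first, then process the cube in a following step with the
-- "Use existing writeback table" option so the (now empty) WriteTable
-- is preserved.
EXEC [usp_Consolidate_WriteBack_to_Facts];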

Spreading Non-Transactional Data Along Time

In some cases we need to be able to analyse non-transactional data for discrete periods along a time dimension. An example of such a case is a collection of invoices, which have start and end dates for a period, but are not otherwise connected to a time axis. We may have such invoices with these properties:

Invoice Id
Start Date
End Date
Amount

One of the invoices may be:

Invoice Id: 34821432
Start Date: 2008-10-15
End Date: 2009-03-14
Amount: 15,000.00

and another one:

Invoice Id: 34934221
Start Date: 2008-12-01
End Date: 2009-05-30
Amount: 6,500.00
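
For illustration, the two invoices above could be stored in a simple relational table; the Invoice_Source name matches the CROSS APPLY query further below, while the exact schema here is only an assumption:

-- An assumed shape for the invoice source table, loaded with the sample data
CREATE TABLE [Invoice_Source](
    [Invoice_Id] [int],
    [Start_Date] [datetime],
    [End_Date] [datetime],
    [Amount] [money]
)

INSERT INTO [Invoice_Source] ([Invoice_Id], [Start_Date], [End_Date], [Amount])
VALUES (34821432, '2008-10-15', '2009-03-14', 15000.00)

INSERT INTO [Invoice_Source] ([Invoice_Id], [Start_Date], [End_Date], [Amount])
VALUES (34934221, '2008-12-01', '2009-05-30', 6500.00)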

If the company we are building this for deducts a daily fee for its services (e.g. funds management, software maintenance, etc.), we may need to spread the amount over smaller periods, such as months or days, and then aggregate the smaller amounts along a time dimension.

To do this we have to first store the data in a relational table and then write some SQL to do the trick for us.

First, we create a table-valued function which returns all the dates between the Start and End dates at the chosen granularity, together with the count of all the periods in between (in our case, a count of days):

CREATE FUNCTION udf_Create_Daily_Date_Spread
(
      @Start_Date datetime
    , @End_Date datetime
)
RETURNS @Daily_Spread TABLE (
      Date_Id datetime
    , Count_Of_Days int
)
AS
BEGIN
    DECLARE @Count int
    SET @Count = 0

    -- Return an empty table for invalid periods
    IF @Start_Date >= @End_Date
        RETURN

    -- Insert one row per day, inclusive of both the start and the end date
    WHILE @Start_Date <= @End_Date
    BEGIN
        INSERT INTO @Daily_Spread(Date_Id)
        SELECT @Start_Date

        SET @Start_Date = DATEADD(d, 1, @Start_Date)
        SET @Count = @Count + 1
    END

    -- Stamp every row with the total number of days in the period
    UPDATE @Daily_Spread
    SET   Count_Of_Days = @Count

    RETURN
END

After creating this function, we can use the CROSS APPLY operator to create the even spread:

SELECT    inv.Invoice_Id
        , inv.Start_Date
        , inv.End_Date
        , cdds.Date_Id
        , inv.Amount / cdds.Count_Of_Days AS Daily_Amount
FROM Invoice_Source inv
CROSS APPLY udf_Create_Daily_Date_Spread(inv.Start_Date, inv.End_Date) cdds

After running the sample data through this code, we will get an even spread for both invoices and we will be able to attach a time dimension to them.
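
To put some numbers on this: since the function treats both the start and the end date as inclusive, the first invoice (2008-10-15 to 2009-03-14) is spread over 151 days at 15,000.00 / 151 ≈ 99.34 per day, while the second (2008-12-01 to 2009-05-30) is spread over 181 days at 6,500.00 / 181 ≈ 35.91 per day.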

Even though the data size may explode after such a manipulation, Analysis Services handles even the largest data sets very well. If storage is a problem, we can always choose to break our data down into fewer, coarser periods: weeks or months instead of days.