December 2008 – Boyan Penev on Microsoft BI

Report Designer in BIDS 2008 Column Visibility Bug

A small issue with BIDS that I just found out about: if you want to modify the visibility expression of a column in Business Intelligence Development Studio 2008, you may find that the Fields list is empty and that Report Designer cannot see a Dataset binding for the column. Even so, if you type in a Fields!<field_name> expression, you will get a red squiggly underline indicating that something is wrong with it, but you will still be able to deploy the report, and the visibility of the column will be toggled by the expression. So it actually works, but it can be annoying.

Note (2009-02-07): This appears if you have a tablix within a tablix and the outer tablix does not have a dataset associated with it.

Changes in SQL Server 2008 sysadmin group

There are some noteworthy changes in the way SQL Server 2008 handles security, apart from the single major improvement: the replacement of the Surface Area Configuration tool by Policy-Based Management.

One thing that surprised me today was that even though I was a Domain Admin and a member of the local Administrators group, SQL Server 2008 refused to let me log in. A login had to be explicitly created before I could access the instance. After some research, I found out that in SQL Server 2008 the local Windows administrators are no longer mapped to the sysadmin role by default. It is therefore possible to get locked out of a server instance if there are no sysadmins on it. This is by design: it separates SQL Server admins from Windows admins more clearly.
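For reference, explicitly granting a Windows account access might look like the sketch below. The account name CONTOSO\jane.doe is made up for illustration, and the statements have to be run by someone who is already a sysadmin on the instance.

```sql
-- Hypothetical Windows account; replace it with a real domain login.
CREATE LOGIN [CONTOSO\jane.doe] FROM WINDOWS;

-- Add the login to the sysadmin fixed server role
-- (system stored procedure available in SQL Server 2005 and 2008).
EXEC sp_addsrvrolemember
      @loginame = N'CONTOSO\jane.doe'
    , @rolename = N'sysadmin';

-- List the current members of sysadmin to confirm the change.
EXEC sp_helpsrvrolemember @srvrolename = N'sysadmin';
```

Whether a given account should be a full sysadmin, rather than a member of a narrower role, is a separate decision that depends on the environment.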

A further note on this topic: I would not have lost a small SQL Server war on a Dev environment recently if we had been using SQL Server 2008 instead of SQL Server 2005. Now, being a Domain Admin does not necessarily win the battle for SQL Server permissions.

There is a TechNet page describing SQL Server 2008 Security Changes for further reference.

There is another one that helps if all system administrators are locked out.

Spreading Non-Transactional Data Along Time

In some cases we need to be able to analyse non-transactional data for discrete periods along a time dimension. An example of such a case is a collection of invoices, which have start and end dates for a period, but are not otherwise connected to a time axis. We may have such invoices with these properties:

Invoice Id
Start Date
End Date
Amount

One of the invoices may be:

Invoice Id: 34821432
Start Date: 2008-10-15
End Date: 2009-03-14
Amount: 15,000.00

and another one:

Invoice Id: 34934221
Start Date: 2008-12-01
End Date: 2009-05-30
Amount: 6,500.00

If the company we are building this for deducts a daily fee for its services (e.g. funds management, software maintenance, etc.), we may have to spread the amount over smaller periods, such as months or days, and then aggregate the smaller amounts along a time dimension.

To do this we have to first store the data in a relational table and then write some SQL to do the trick for us.

First, we should create a table-valued function which returns all the dates at a specified granularity, such as days, from the Start to the End date, together with the count of all the periods in between (in our case it is a count of days):

CREATE FUNCTION udf_Create_Daily_Date_Spread
(
      @Start_Date datetime
    , @End_Date datetime
)
RETURNS @Daily_Spread TABLE (
      Date_Id datetime
    , Count_Of_Days int
)
AS
BEGIN
    DECLARE @Count int
    SET @Count = 0

    -- An inverted range returns no rows; a one-day range (start = end) returns one row
    IF @Start_Date > @End_Date
        RETURN

    -- Insert one row per day, with the start and end dates both included
    WHILE @Start_Date <= @End_Date
    BEGIN
        INSERT INTO @Daily_Spread(Date_Id)
        SELECT @Start_Date

        SET @Start_Date = DATEADD(day, 1, @Start_Date)
        SET @Count = @Count + 1
    END

    -- Stamp every row with the total number of days in the range
    UPDATE @Daily_Spread
    SET   Count_Of_Days = @Count

    RETURN
END

After having created this function, we can use the CROSS APPLY operator to create the even spread:

SELECT    inv.Invoice_Id
        , inv.Start_Date
        , inv.End_Date
        , cdds.Date_Id
        , inv.Amount / cdds.Count_Of_Days AS Daily_Amount
FROM Invoice_Source inv
CROSS APPLY udf_Create_Daily_Date_Spread(inv.Start_Date, inv.End_Date) cdds

After running the sample data through this code, we will get an even spread for both invoices and we will be able to attach a time dimension to them.
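As a quick sanity check on the two sample invoices, the day counts and daily amounts can also be worked out directly with DATEDIFF, without calling the function. This is only a sketch: the derived table v and the decimal(18,2) type for Amount are assumptions made for illustration.

```sql
-- Inclusive day count = DATEDIFF in days + 1 (both the start and end dates count).
SELECT    v.Invoice_Id
        , DATEDIFF(day, v.Start_Date, v.End_Date) + 1 AS Count_Of_Days
        , v.Amount / (DATEDIFF(day, v.Start_Date, v.End_Date) + 1) AS Daily_Amount
FROM (
    -- The two sample invoices from the text
    SELECT    34821432 AS Invoice_Id
            , CAST('20081015' AS datetime) AS Start_Date
            , CAST('20090314' AS datetime) AS End_Date
            , CAST(15000.00 AS decimal(18,2)) AS Amount
    UNION ALL
    SELECT    34934221
            , CAST('20081201' AS datetime)
            , CAST('20090530' AS datetime)
            , CAST(6500.00 AS decimal(18,2))
) v;
```

The first invoice covers 151 days, which gives roughly 99.34 per day, and the second covers 181 days, which gives roughly 35.91 per day. Because the daily amounts are not round numbers, any rounding applied to them should be checked against the original invoice totals.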

Even though the data size may explode after such a manipulation, Analysis Services copes well with even very large sets of data. If storage is a problem, we can always choose to break our data down into fewer periods: weeks or months instead of days.
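One possible way to store fewer rows while keeping the day-based proportions is to roll the daily spread up to months before loading it. The sketch below reuses Invoice_Source and udf_Create_Daily_Date_Spread from above and assumes Amount is a decimal or money column; the column names Month_Start and Monthly_Amount are made up for illustration.

```sql
-- One row per invoice per calendar month, weighted by the number of
-- invoice days that fall in that month.
SELECT    inv.Invoice_Id
          -- First day of the month that the spread date falls in
        , DATEADD(month, DATEDIFF(month, 0, cdds.Date_Id), 0) AS Month_Start
        , SUM(inv.Amount / cdds.Count_Of_Days) AS Monthly_Amount
FROM Invoice_Source inv
CROSS APPLY udf_Create_Daily_Date_Spread(inv.Start_Date, inv.End_Date) cdds
GROUP BY  inv.Invoice_Id
        , DATEADD(month, DATEDIFF(month, 0, cdds.Date_Id), 0);
```

The function still generates the daily rows while the query runs, so this reduces what gets stored rather than the work done during the load.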

Pre-Aggregated Data and Changing Dimensions

Normally when working with data we assume that we have it at the lowest possible grain and we are challenged by the need to aggregate it along various hierarchies. By the nature of BI, we get to work with large sets of detailed data collected by an existing system.

But what happens when we are given the aggregations and need to work with these instead? Recently Nick Barclay and I were involved in such an implementation. Nick designed a Health and Safety Dashboard, and I then built it with Reporting Services, PerformancePoint, Analysis Services and SQL Server 2005 relational databases. We were told at the beginning of the project that the aggregation business rules were so complicated that rebuilding them for the purposes of the dashboard was way out of scope. I had no experience with pre-aggregated data and did not foresee a major problem, which became apparent after a few months of development when the Business Unit hierarchy changed significantly.

Aggregation Rules

In contrast with typical business scenarios, where an SSAS function like Sum and some custom roll-ups work perfectly well, the Health and Safety data needed to be aggregated in a complex way, and the organisation we built the dashboard for had already invested in a system managing these aggregations. For example, a simple rule would be: if a Business Unit has more than one major incident, it gets an Amber score for Safety Compliance; if it has more than four, it becomes Red. The same applies to its parent: if the sum of all major incidents for its descendants is greater than one, it becomes Amber, and with more than four, Red. There were also quite a few dependencies between various dimensions, and we reluctantly agreed to work with the data the way it was. The following diagram shows the way an aggregate for Major Incidents works:

Japan doing badly on Major Incidents also makes Asia and Global look bad.
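To make the simple rule concrete, here is a rough T-SQL sketch of it. It is not the client's actual aggregation logic, which was far more involved. The table variable @Unit, the incident counts and the Green default are all assumptions made for illustration, and each unit's own incidents are counted together with those of its descendants.

```sql
-- Hypothetical hierarchy and made-up incident counts.
DECLARE @Unit TABLE (
      Unit_Name varchar(20) PRIMARY KEY
    , Parent_Name varchar(20) NULL
    , Major_Incidents int NOT NULL
);

INSERT INTO @Unit VALUES ('Global', NULL, 0);
INSERT INTO @Unit VALUES ('Asia', 'Global', 0);
INSERT INTO @Unit VALUES ('Japan', 'Asia', 5);
INSERT INTO @Unit VALUES ('China', 'Asia', 0);

WITH Unit_Tree AS (
    -- Anchor: every unit is paired with itself
    SELECT Unit_Name AS Ancestor, Unit_Name AS Descendant
    FROM @Unit
    UNION ALL
    -- Recursive step: walk one level further down the hierarchy
    SELECT t.Ancestor, u.Unit_Name
    FROM Unit_Tree t
    JOIN @Unit u ON u.Parent_Name = t.Descendant
)
SELECT    t.Ancestor AS Unit_Name
        , SUM(u.Major_Incidents) AS Total_Major_Incidents
        , CASE
              WHEN SUM(u.Major_Incidents) > 4 THEN 'Red'
              WHEN SUM(u.Major_Incidents) > 1 THEN 'Amber'
              ELSE 'Green'   -- assumed default; the text only names Amber and Red
          END AS Safety_Compliance
FROM Unit_Tree t
JOIN @Unit u ON u.Unit_Name = t.Descendant
GROUP BY t.Ancestor;
```

With these made-up counts, Japan, Asia and Global all come out Red, while China comes out Green.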

The problem

The actual problem comes from the fact that our dimension hierarchy can change and if it does the aggregations do not make sense historically. We could have a change in our hierarchy:

and we may need to generate the following simple report for before and after the change:

The historical data suggests that something was wrong with China as the only child of Asia before the change in hierarchy, while in fact, the culprit then (and now) was Japan. As we would not be able to see how the hierarchy looked before, we would not be able to analyse the data we have accordingly. Also, we cannot compare data for Asia after the change to data for Asia from before the change along the new hierarchy.

Possible solutions

In my project, the client actually agreed that it was all right to have this handicap in the system, and a possible solution was a complete historical rebuild of the aggregations followed by a complete reload of the system data after every hierarchy change. Luckily for us, the data set was not too big and was not expected to grow too much.

Another solution would be to use a Slowly Changing Dimension, so we can show the hierarchy as it was when the aggregations were made – this would improve the analytical value of the data because business users would be able to see why a Business Unit like Asia was Red in the context of an outdated hierarchy.
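A minimal sketch of what such a dimension could look like, using validity dates on each row (a Type 2 approach), is shown below. The table variable, the column names, the dates and the move of Japan from Asia to directly under Global are all hypothetical and are not taken from the actual project.

```sql
-- Each row describes a unit's parent for a period of validity.
DECLARE @Dim_Business_Unit TABLE (
      Business_Unit_Key int IDENTITY(1,1) PRIMARY KEY   -- surrogate key
    , Business_Unit_Name varchar(50) NOT NULL
    , Parent_Unit_Name varchar(50) NULL
    , Valid_From datetime NOT NULL
    , Valid_To datetime NULL                            -- NULL marks the current row
);

INSERT INTO @Dim_Business_Unit (Business_Unit_Name, Parent_Unit_Name, Valid_From, Valid_To)
VALUES ('Global', NULL, '20080101', NULL);
INSERT INTO @Dim_Business_Unit (Business_Unit_Name, Parent_Unit_Name, Valid_From, Valid_To)
VALUES ('Asia', 'Global', '20080101', NULL);
INSERT INTO @Dim_Business_Unit (Business_Unit_Name, Parent_Unit_Name, Valid_From, Valid_To)
VALUES ('China', 'Asia', '20080101', NULL);
-- Japan before the (hypothetical) hierarchy change ...
INSERT INTO @Dim_Business_Unit (Business_Unit_Name, Parent_Unit_Name, Valid_From, Valid_To)
VALUES ('Japan', 'Asia', '20080101', '20081101');
-- ... and after it
INSERT INTO @Dim_Business_Unit (Business_Unit_Name, Parent_Unit_Name, Valid_From, Valid_To)
VALUES ('Japan', 'Global', '20081101', NULL);

-- Show the hierarchy as it was on a given date.
DECLARE @As_Of datetime;
SET @As_Of = '20080601';

SELECT    Business_Unit_Name
        , Parent_Unit_Name
FROM @Dim_Business_Unit
WHERE Valid_From <= @As_Of
  AND (Valid_To IS NULL OR @As_Of < Valid_To);
```

For a date before the change, Japan shows up under Asia, which is the context in which the stored aggregates for Asia were calculated. Pre-aggregated facts would then join to the surrogate key of the row that was valid when they were produced.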

The best solution would be to build our own aggregations and work with those, since that gives us the flexibility to represent the data in exactly the way the business users need it. Unfortunately, it is sometimes very hard to convince them that spending twice the funds on the same problem is a good thing.