Reporting Designer in BIDS 2008 Column Visibility Bug

A small issue with BIDS I just found out about: if you want to modify the visibility expression of a column in Business Intelligence Development Studio 2008, you may find that the Fields list is empty and that Report Designer cannot see a Dataset binding for the column. Regardless, if you type in a Fields!<field_name> expression, even though a red squiggly line indicates something is wrong with it, you will still be able to deploy the report and the visibility of the column will be toggled by the expression. So it actually works, but it can be annoying.

Note (2009-02-07): This appears if you have a tablix within a tablix and the outer tablix does not have a dataset associated with it.

Changes in SQL Server 2008 sysadmin group

There are some noteworthy changes in the way SQL Server 2008 handles security, apart from the single biggest improvement: the replacement of the Surface Area Configuration tool with Policy-Based Management.

One thing that surprised me today was that even though I was a Domain Admin and a member of the local Administrators group, SQL Server 2008 refused to let me log in. A login had to be explicitly created before I could access the instance. After some research, I found out that in SQL Server 2008 the local Windows administrators do not get mapped to the sysadmin role. Therefore, it is possible to get locked out of a server instance if there are no sysadmins on it. This is a deliberate feature, which separates SQL Server administrators from Windows administrators more clearly.
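Creating such a login looks like this, run by someone who can still connect (or after starting the instance in single-user mode with the -m switch if everyone is locked out); the account name below is just a placeholder:

```sql
-- Placeholder Windows account; replace with the real one
CREATE LOGIN [DOMAIN\SqlAdmin] FROM WINDOWS;

-- Add the new login to the sysadmin fixed server role
EXEC sp_addsrvrolemember @loginame = 'DOMAIN\SqlAdmin', @rolename = 'sysadmin';
```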

A further note on this topic: I would not have lost a small SQL Server war on a Dev environment recently if we had been using SQL Server 2008 instead of SQL Server 2005. Now, being a Domain Admin does not necessarily win the battle for SQL Server permissions.

There is a TechNet page describing SQL Server 2008 Security Changes for further reference.

And another one, helping in case all system administrators are locked out.

Spreading Non-Transactional Data Along Time

In some cases we need to be able to analyse non-transactional data for discrete periods along a time dimension. An example of such a case is a collection of invoices, which have start and end dates for a period but are not otherwise connected to a time axis. We may have invoices with these properties:

Invoice Id
Start Date
End Date
Amount

One of the invoices may be:

Invoice Id: 34821432
Start Date: 2008-10-15
End Date: 2009-03-14
Amount: 15,000.00

and another one:

Invoice Id: 34934221
Start Date: 2008-12-01
End Date: 2009-05-30
Amount: 6,500.00

If the company we are building this for deducts a daily fee for its services (e.g. funds management, software maintenance, etc.), we may need to be able to spread the amount over smaller periods, like months or days, and then aggregate the smaller amounts along a time dimension.

To do this we have to first store the data in a relational table and then write some SQL to do the trick for us.
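As a sketch, assuming a table named Invoice_Source (the name and data types are my choice, not prescribed), the sample invoices could be stored like this:

```sql
CREATE TABLE Invoice_Source (
      Invoice_Id int NOT NULL
    , Start_Date datetime NOT NULL
    , End_Date datetime NOT NULL
    , Amount money NOT NULL
)

INSERT INTO Invoice_Source (Invoice_Id, Start_Date, End_Date, Amount)
SELECT 34821432, '2008-10-15', '2009-03-14', 15000.00
UNION ALL
SELECT 34934221, '2008-12-01', '2009-05-30', 6500.00
```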

First, we should create a table-valued function which returns all the dates at a specified granularity, such as days, from the Start to the End date, together with the count of all the periods in between (in our case it is a count of days):

CREATE FUNCTION dbo.udf_Create_Daily_Date_Spread (
      @Start_Date datetime
    , @End_Date datetime
)
RETURNS @Daily_Spread TABLE (
      Date_Id datetime
    , Count_Of_Days int
)
AS
BEGIN
    DECLARE @Count int
    SET @Count = 0

    IF @Start_Date > @End_Date
        RETURN

    WHILE @Start_Date <= @End_Date
    BEGIN
        INSERT INTO @Daily_Spread (Date_Id)
        SELECT @Start_Date

        SET @Start_Date = DATEADD(d, 1, @Start_Date)
        SET @Count = @Count + 1
    END

    UPDATE @Daily_Spread
    SET Count_Of_Days = @Count

    RETURN
END

After having created this function, we can use the CROSS APPLY operator to create the even spread, dividing each invoice amount by the count of days in its period:

SELECT  inv.Invoice_Id
      , cdds.Date_Id
      , inv.Amount / cdds.Count_Of_Days AS Daily_Amount
FROM Invoice_Source inv
CROSS APPLY dbo.udf_Create_Daily_Date_Spread(inv.Start_Date, inv.End_Date) cdds

After running the sample data through this code, we get an even spread for both invoices, and we are able to attach a time dimension to them.
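Rolling the daily amounts back up to, say, calendar months then becomes a simple GROUP BY. A sketch, assuming an Invoice_Source table with an Amount column and the function above (the per-day division and the month-truncation expression are my own illustration):

```sql
SELECT  inv.Invoice_Id
        -- truncate each date to the first day of its month
      , DATEADD(m, DATEDIFF(m, 0, cdds.Date_Id), 0) AS Month_Start
      , SUM(inv.Amount / cdds.Count_Of_Days) AS Monthly_Amount
FROM Invoice_Source inv
CROSS APPLY dbo.udf_Create_Daily_Date_Spread(inv.Start_Date, inv.End_Date) cdds
GROUP BY inv.Invoice_Id
       , DATEADD(m, DATEDIFF(m, 0, cdds.Date_Id), 0)
```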

Even though the data size may explode after such a manipulation, Analysis Services provides an excellent way of handling even the largest sets of data. If storage is a problem, we can always choose to break our data down into fewer, coarser periods – weeks or months instead of days.

Pre-Aggregated Data and Changing Dimensions

Normally when working with data we assume that we have it on the lowest possible grain and we are challenged by the need to aggregate it along various hierarchies. By the nature of BI, we get to work with large sets of detailed data collected by an existing system.

But what happens when we are given the aggregations and we need to work with those instead? Recently Nick Barclay and I were involved in such an implementation. Nick designed a Health and Safety Dashboard, and I then built it with Reporting Services, PerformancePoint, Analysis Services and SQL Server 2005 relational databases. We were told at the beginning of the project that the aggregation business rules were so complicated that rebuilding them for the purposes of the dashboard was way out of scope. I had not had experience with pre-aggregated data, and I did not foresee a major problem – one which became apparent after a few months of development when the Business Unit hierarchy changed significantly.

Aggregation Rules

In contrast with typical business scenarios, where an SSAS aggregation function like Sum together with some custom roll-ups works perfectly well, the Health and Safety data needed to be aggregated in a complex way, and the organisation we built the dashboard for had already invested in a system managing these aggregations. For example, a simple rule would be: if a Business Unit has more than one major incident, it gets an Amber score for Safety Compliance; if it has more than four, it becomes Red. The same applies in turn to its parent – if the sum of all major incidents for its descendants is greater than one, it becomes Amber, and with more than four – Red. There were also quite a few dependencies between various dimensions, and reluctantly we agreed to work with the data the way it was. The following diagram shows the way an aggregate for Major Incidents works:

Major Incidents Aggregation 

Japan doing badly on Major Incidents also makes Asia and Global look bad.
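Although rebuilding these rules was out of scope for us, for illustration a single threshold rule like the one above could be sketched in T-SQL as follows (the table and column names are hypothetical):

```sql
SELECT  Business_Unit
      , CASE
            WHEN SUM(Major_Incidents) > 4 THEN 'Red'    -- more than four: Red
            WHEN SUM(Major_Incidents) > 1 THEN 'Amber'  -- more than one: Amber
            ELSE 'Green'
        END AS Safety_Compliance
FROM Incident_Facts
GROUP BY Business_Unit
```

In reality the rules were far more complex than this, with cross-dimensional dependencies, which is precisely why the aggregations arrived pre-computed.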

The problem

The actual problem comes from the fact that our dimension hierarchy can change, and if it does, the aggregations no longer make sense historically. We could have a change in our hierarchy like this:


and we may need to generate the following simple report for before and after the change:


The historical data suggests that something was wrong with China as the only child of Asia before the change in hierarchy, while in fact, the culprit then (and now) was Japan. As we would not be able to see how the hierarchy looked before, we would not be able to analyse the data we have accordingly. Also, we cannot compare data for Asia after the change to data for Asia from before the change along the new hierarchy.

Possible solutions

In my project, the client actually agreed that it was alright to have this handicap in the system, and a possible solution was a complete historical rebuild of the aggregations, followed by a complete reload of the system data, after every hierarchy change. Luckily for us, the data set was not too big and was not expected to grow much.

Another solution would be to use a Slowly Changing Dimension, so we can show the hierarchy as it was when the aggregations were made – this would improve the analytical value of the data because business users would be able to see why a Business Unit like Asia was Red in the context of an outdated hierarchy.

The best solution would be to build our own aggregations and work with those, as then we are gaining the ultimate flexibility to represent data in the exact way the business users need it, but unfortunately sometimes it is very hard to convince them that spending twice the funds on the same problem is a good thing.

Ordering Dimensions: Recursion vs Loops in T-SQL

Recently I had to build a function which returns dimension members from a parent-child relational table. I needed to be able to order the members hierarchically, and I decided to use a recursive CTE.
Usually, we have a parent-child dimension table with a structure similar to this:
Business_Skey int
Parent_Business_Skey int
with a root node having a Business_Skey of -3 and a Parent_Business_Skey of -3.
Let’s assume we have a hierarchy like this:
All (Id = -3, P_ID = -3)
-GLOBAL (Id = 1, P_ID = -3)
–Europe (Id = 2, P_ID = 1)
—UK (Id = 3, P_ID = 2)
—France (Id = 4, P_ID = 2)
—Spain (Id = 5, P_ID = 2)
–North America (Id = 6, P_ID = 1)
—USA (Id = 7, P_ID = 6)
—Canada (Id = 8, P_ID = 6)
—Mexico (Id = 9, P_ID = 6)
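For reference, this example hierarchy can be set up with a script like the following (Dim_Business has the structure described above; the Name column is added purely for readability):

```sql
CREATE TABLE Dim_Business (
      Business_Skey int NOT NULL
    , Parent_Business_Skey int NOT NULL
    , Name nvarchar(50) NOT NULL
)

INSERT INTO Dim_Business (Business_Skey, Parent_Business_Skey, Name)
SELECT -3, -3, 'All'           UNION ALL
SELECT  1, -3, 'GLOBAL'        UNION ALL
SELECT  2,  1, 'Europe'        UNION ALL
SELECT  3,  2, 'UK'            UNION ALL
SELECT  4,  2, 'France'        UNION ALL
SELECT  5,  2, 'Spain'         UNION ALL
SELECT  6,  1, 'North America' UNION ALL
SELECT  7,  6, 'USA'           UNION ALL
SELECT  8,  6, 'Canada'        UNION ALL
SELECT  9,  6, 'Mexico'
```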
If we do something like:
SELECT Business_Skey
FROM Dim_Business
ORDER BY Parent_Business_Skey ASC, Business_Skey ASC
We will get:
-3 (All)
1 (GLOBAL)
2 (Europe)
6 (North America)
3 (UK)
4 (France)
5 (Spain)
7 (USA)
8 (Canada)
9 (Mexico)
Obviously, this ordering is incorrect, because we want to see the leaf nodes under their respective parents.
We can recursively build an order key, which concatenates the parent ids:
WITH b_ord (bid, bord) AS (
    SELECT  Business_Skey AS bid
          , CONVERT(nvarchar(1000), Business_Skey) AS bord
    FROM Dim_Business
    WHERE Business_Skey = -3

    UNION ALL

    SELECT  db.Business_Skey AS bid
          , CONVERT(nvarchar(1000), bo.bord + '|' + CONVERT(nvarchar, db.Business_Skey)) AS bord
    FROM Dim_Business db
        INNER JOIN b_ord bo
            ON db.Parent_Business_Skey = bo.bid
    WHERE db.Business_Skey <> db.Parent_Business_Skey
)
SELECT  bid
      , bord
FROM b_ord
ORDER BY bord
The result of the CTE query is:
-3 -3 (All)
1 -3|1 (GLOBAL)
2 -3|1|2 (Europe)
3 -3|1|2|3 (UK)
4 -3|1|2|4 (France)
5 -3|1|2|5 (Spain)
6 -3|1|6 (North America)
7 -3|1|6|7 (USA)
8 -3|1|6|8 (Canada)
9 -3|1|6|9 (Mexico)
and the order is correct.
Because the code needed to go in a function invoked by a number of stored procedures, a .NET application and various reports, I needed the code to be quick and light. As some dimensions had a large number of members (50,000+), which could grow with time, the code needed to be implemented carefully. So, I decided to compare the recursive CTE function to a WHILE loop with a temporary table implementation:
DECLARE @c int
DECLARE @num_of_nodes int

SET @num_of_nodes = (SELECT COUNT(*) FROM Dim_Business)

CREATE TABLE #order (
      skey int
    , ord nvarchar(1000)
    , lvl int
)

INSERT INTO #order (skey, ord, lvl)
SELECT  Business_Skey
      , CONVERT(nvarchar(1000), Business_Skey)
      , 1
FROM Dim_Business
WHERE Business_Skey = -3

SET @c = 2

WHILE @c > 0
BEGIN
    INSERT INTO #order (skey, ord, lvl)
    SELECT  Business_Skey
          , CONVERT(nvarchar(1000), o.ord + '|' + CONVERT(nvarchar, db.Business_Skey))
          , @c
    FROM Dim_Business db
        INNER JOIN #order o
            ON db.Parent_Business_Skey = o.skey
            AND o.lvl = @c - 1
    WHERE db.Business_Skey <> o.skey

    SET @c = @c + 1
    IF (SELECT COUNT(*) FROM #order) = @num_of_nodes
        SET @c = 0
END

SELECT  skey AS bid
      , ord AS bord
FROM #order
ORDER BY ord

After comparing the results in Client Statistics and the IO reads, the first, recursive query performs more than 20% worse than the WHILE loop query. It also trails the non-recursive query in the count of logical reads, read-ahead reads and scan counts.
It seems that in SQL Server 2005 recursive CTEs do not perform as well as the equivalent set-based operations batched through WHILE loops.

Fun with T-SQL

As a tribute to some fairly restrictive .NET design I once fell victim to, I wrote this piece of code:

Use [Database]
Select [Select]
From [From]
Where [Where] = 'Where'
And [nvarchar(50)] = 'nvarchar(50)'
And [int] = 1
The CREATE and INSERT statements for the above command:
CREATE TABLE [dbo].[From](
 [Id] [int] NOT NULL,
 [Where] [nvarchar](50) NOT NULL,
 [On] [nvarchar](50) NOT NULL,
 [Select] [nvarchar](50) NOT NULL,
 [From] [nvarchar](50) NOT NULL,
 [int] [int] NOT NULL,
 [nvarchar(50)] [nvarchar](50) NOT NULL
)

INSERT INTO [From] (  [Id]
, [Where]
, [On]
, [Select]
, [From]
, [int]
, [nvarchar(50)])
SELECT 1, 'Where', 'On', ':-)', 'From', 1, 'nvarchar(50)'



EXECUTE AS someone else – impersonation in stored procedures

Recently, I had to implement a stored procedure which is executed under a service account context. It had to be invoked by an Informatica workflow, and its purpose was to clean a cache database table and to update all statistics on a database by invoking sp_updatestats. As the Informatica user did not have enough permissions to perform these tasks, I had to use EXECUTE AS in the stored procedure. The EXECUTE AS statement can also effectively be used to limit the execution scope, by allowing only limited permissions to the user account executing the stored procedure:
CREATE PROCEDURE [dbo].[usp_Update_Statistics]
WITH EXECUTE AS 'service_account'  -- placeholder principal name
AS
BEGIN
    -- clear cache (details omitted)
    EXEC sp_updatestats
END
Immediately I encountered various problems with the execution of the stored procedure. After some research it turned out that a few preconditions need to be satisfied before we can execute it:
1. The TRUSTWORTHY setting on the database needs to be turned on:
           ALTER DATABASE database SET TRUSTWORTHY ON
2. The owner of both the context database and the master database must be the same. To set this property we need to:
   2.1 Find who is the owner of master:
           exec sp_helpdb
   2.2 Set the context database owner to the same owner:
           ALTER AUTHORIZATION ON DATABASE::database TO user
3. The account under which we EXECUTE AS needs to be a database/server principal. The database and server principals can be found in the sys.database_principals and sys.server_principals catalog views.
4. The account executing the stored procedure needs to be granted impersonate permissions on the EXECUTE AS account:
      GRANT IMPERSONATE ON USER::impersonated_user TO executing_user
5. If a server login is specified (instead of a database user), it needs to be mapped to a database user.
Also, if SQL Server is running under a local account/service, it is not possible to use the EXECUTE AS statement.
The same statement can be used to execute SQL Server Agent jobs under a different context – something particularly useful when trying to run them through Integration Services.
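As a sketch, in a T-SQL Agent job step (or an Execute SQL task in Integration Services) the context switch looks like this; the login and procedure names are placeholders:

```sql
EXECUTE AS LOGIN = 'DOMAIN\ServiceAccount';  -- placeholder login

EXEC dbo.usp_Update_Statistics;

REVERT;  -- switch back to the original security context
```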