On the search for the perfect OLAP browser

Browsing Analysis Services cubes is typically done by power users in SQL Server Management Studio or Excel. However, because of a requirement stating that both intranet and non-intranet users need to be able to access and browse an OLAP cube, I had to go out and try to find the best OLAP browser on the market. The client had been using Dundas OLAP Services with mixed success, and had purchased PerformancePoint with ProClarity but never used it because of infrastructure issues. That is why they used Dundas as an interim solution. Dundas’s OLAP Services is a user-friendly tool, but its performance is not ideal when it comes to large grids, so my client wanted something combining Dundas’s usability with Excel’s performance, which could be used both internally and externally.

I contacted Microsoft with a question: “What tools can I use?” They came back to me with the following suggestions:

  • IntelliMax Solutions OLAP Browser (an Australian partner)
  • BI Companion
  • Panorama
  • Dundas
  • ProClarity

I added PerformancePoint to the list, as well as Radar-Soft’s OLAP Browser, so my final list was:

  1. IntelliMax
  2. BI Companion
  3. Panorama
  4. Dundas
  5. ProClarity
  6. PerformancePoint
  7. Radar-Soft

IntelliMax Solutions

A sales guy told me that they would organise an online demo for me, and then, if I was interested, they would organise an evaluation installation of their product on our servers. That happened just half an hour ago and it immediately became apparent that the product lacks one critical piece of functionality – it does not support attribute hierarchies. So, I cannot use anything but user-defined custom hierarchies, because according to the support person, including attribute hierarchies makes the grids and reports “too big”. I, however, definitely need these big grids/reports. Otherwise, the tool is nice and simple with very obvious support for Custom Data – you can define this property from the UI, which makes it easier to work in non-Kerberos enabled environments. It should also integrate with PerformancePoint and Reporting Services, but I did not test those parts because of the aforementioned problem with attribute hierarchies.

BI Companion

They were very friendly and responsive. I got an evaluation install, tried it out and it worked fine, apart from a small bug with non-aggregatable dimension hierarchies, which they fixed immediately. I was quite impressed with the product. It seemed to perform slightly better than Dundas, but it was also slightly less user-friendly. The interface imitates the Cube Browser and is very feature-rich. Unfortunately, the HTML version did not work with Firefox, so they suggested using a SilverLight version, which was not supported by my client. As one of my requirements was cross-browser operability (IE and Firefox at least), BI Companion became a no-go for me.

Panorama

Panorama’s NovaView product competed with PerformancePoint as a dashboard creation tool and I have no idea why Microsoft recommended a competitor. I contacted their reseller in Australia and the sales person organised an online demo. I could play with their grid analytics components and they seemed OK. NovaView comes in two versions – a legacy Java version and a new Flash version still in development. The Flash version is quite nice, but still harder to use than BI Companion or Dundas. As a big miss in the current version, Panorama NovaView does not let the user add or remove dimension hierarchies on its grid columns. It can be done on rows, but not on columns, which is frustrating and ultimately a critical lack of functionality in my case. The Panorama support person told me that “they did not want to release all features yet” and that they have it working in their lab. He also advised me to use the Java version until the Flash version gets updated. I did, and found that the Java version is indeed fast, but not as fast as Excel and ProClarity. Furthermore, Panorama’s product is just too big for our needs, and all of its administrative functionality, which includes user management, security and what-not, is simply not necessary. Otherwise it looks nice.

Dundas

The Dundas OLAP Services grid is very usable – all the users need to do is drag and drop dimension hierarchies to browse the cubes they have access to. Unfortunately, it is a bit buggy and very slow when the number of cells to be displayed goes above the 10 000 range. After performance testing, we found that about 50-80% of the processing time takes place on the client machines, where the browser renders a huge HTML table. It turns out that Dundas is not alone with this issue – every HTML OLAP tool I tested suffers from exactly the same problem, including Dundas, BI Companion and Radar-Soft, and there is nothing we can do about it. Dundas supports paging, so the results can be split into a number of pages. It performs better with that feature turned on, but exporting to Excel then requires hacking, because in order to export the whole grid the user needs to export all the pages individually and then manually concatenate them in Excel. Since the HTML table problem cannot be rectified by Dundas or by us, Dundas’s OLAP Services remain what they were intended to be – an interim solution.

ProClarity

Dated, but FAST. ProClarity still beats all other solutions with its performance. It was as fast as Excel and much, much faster than anything else. Although it lacks the apparent user-friendliness of some of the other solutions, it does allow users to conveniently browse OLAP cubes. Furthermore, it integrates well with PerformancePoint. Unfortunately, 6.2 is its last ever version. Whether its functionality gets implemented as a part of SharePoint 2010 is still to be announced by Microsoft. The way things currently look, ProClarity is still the best solution for ad-hoc OLAP browsing. It is stable and, even though its installation is not quite smooth (e.g. it requires IIS to be running in 32-bit mode), it offers what most users would feel comfortable with. Its drawbacks are: a tabbed interface, no drag and drop, and outdated graphics. Not perfect, but good enough. Oh, and it works with Firefox.

PerformancePoint

The Microsoft dashboarding tool offers an Analytics Grid part, which can be used for ad-hoc analysis. Unfortunately, it does not allow the users to add more dimension hierarchies on rows or columns. What it does well is drilling up or down the hierarchies and slicing by other hierarchies. Unfortunately, this is hardly enough when it comes to full-featured ad-hoc reporting.

Radar-Soft

The small OLAP browser Radar-Soft is offering seems fast at first glance. It would have been a nice small tool if it did not have one major problem – paging. It pages everything; it even creates a number of pages within cells. This improves performance but makes it impossible to use if we want to export a whole grid to Excel. I guess that some of the paging can be disabled, but I am quite sure that, since it is HTML, it will have the aforementioned performance issues when it comes to displaying a relatively large HTML table.

Conclusion

The three HTML solutions – Radar-Soft, Dundas and BI Companion – all had the same issue with performance. My guess is that there is no HTML solution which is faster, because in the end it comes down to browser performance rather than the MDX or .NET implementation.

Panorama, with its Java and Flash versions, may be a good solution for full-featured dashboarding and reporting, but it is a massive overkill for our circumstances. Also, it is not very user-friendly and not very fast.

PerformancePoint and ProClarity are my favourites and I believe that Microsoft’s offering provides the best capabilities for our users, at least until something better comes out of the Microsoft workshop. These integrate nicely with SharePoint and perform very well. While PerformancePoint is very user-friendly, I would not say that about ProClarity. Hopefully in the future we will see nicer OLAP browsing from Microsoft based on these two tools.

A word about SilverLight and Excel

While SilverLight OLAP browsers are available from various software houses (most notably BI Companion has one in beta), it is a relatively new technology and was not an option for me as it was not in my client’s SOE. Also, from my testing it does not seem to be much faster than HTML, although it definitely looks nicer.

Excel is, in my opinion, the best tool for browsing OLAP cubes inside the organisation. It is fast and provides a rich set of analytics capabilities. There are a lot of discussions about Excel as a report authoring tool on various blogs, but I believe that for ad-hoc reporting there is nothing out there which beats it. Despite this, I could not recommend it to my client, as the solution is exposed to external users and letting them connect to our cubes directly through Excel is just impossible considering our security infrastructure. Also, Excel Services does not provide enough ad-hoc functionality, as it does not allow users to modify the Pivot Tables.

Wishes

I wish and hope that Microsoft releases a full-featured, user-friendly and performant OLAP browser as part of Visual Studio and .NET, or as part of SharePoint 2010. Such a tool is a must and a serious gap in the Microsoft BI stack. The combination of Excel + PerformancePoint + ProClarity does the trick for now, but as ProClarity is being discontinued, there must be something in the future toolset which takes its place.

A few other bloggers wrote posts closely related to this topic:

Richard Lees – Which cube browser for Microsoft OLAP
Richard Lees – What’s preventing Excel from being the ubiquitous cube browser
Chris Webb – Proclarity Migration Roadmap (or lack thereof)

Inferred Members Implementation Best Practice

Just last week a colleague of mine and I did some very thorough research on the best implementation practices for Inferred Members. We went through a few web sites and we also had a look at previous implementations through SSIS. There are quite a few resources on the subject and the two best methods we found were based on Project REAL and a SQL CAT recommendation. In the end we agreed that we can recommend three approaches, which have their advantages and disadvantages and could all be successfully used when we need Inferred Member (late arriving dimensions = early arriving facts) support in a solution.

Our findings can be summarised in the following table:

Method                     Pros                     Cons
Stored procedures          No need to use SSIS      Slow; complicated development; need to link source and target data server instances
Script Component (SSIS)    Fast; high reusability   .NET skills required
Double Lookup (SSIS)       Fast; high reusability   Performance issues with multiple dimensions

I will go through each of these methods and provide some more details about them.

1. Stored Procedures

If SSIS data flow tasks are not an option for our ETL we can implement inferred member support through SQL code. The basic idea behind this method is:

  1. Find new dimension members in the fact data and add them to the dimension
  2. Move the fact data from source to target (i.e. staging to datamart) with surrogate keys from the dimension table

If we have our source and target data on separate instances of SQL Server it becomes quite inconvenient to utilise this approach. We need to link the two instances in order to use tables from both of them in the same stored procedure. This is a major issue and it is easily avoided by using SSIS.
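
As a rough illustration of the idea (the table and column names below are hypothetical, and a real implementation would typically also flag the rows as inferred and audit them), the two steps could look like this:

-- Step 1: add inferred members for codes which appear in the facts but are missing from the dimension
INSERT INTO dbo.DimState (StateCode, StateName, IsInferred)
SELECT DISTINCT s.StateCode, 'Unknown', 1
FROM staging.FactSales s
LEFT JOIN dbo.DimState d ON d.StateCode = s.StateCode
WHERE d.StateCode IS NULL

-- Step 2: load the facts, resolving surrogate keys against the now complete dimension
INSERT INTO dbo.FactSales (StateSK, TimeKey, Amount)
SELECT d.StateSK, s.TimeKey, s.Amount
FROM staging.FactSales s
INNER JOIN dbo.DimState d ON d.StateCode = s.StateCode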

2. SQL Server Integration Services

A general design of SSIS solutions can be represented with the following diagram:

[Diagram: general SSIS design for inferred member support]

We can implement this in the following ways:

2.1. Script Component (Project REAL)

The Data Flow task for this approach is:

[Diagram: Data Flow task for the Script Component approach]

After looking up missing states, we pass them to a Script Component, which hashes the misses and hits the database only when a genuinely new miss occurs. We have a stored procedure in place which simply adds the new dimension member to the dimension table and returns the new surrogate key, which then gets sent down the data flow to the fact table.
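
A minimal sketch of what such a stored procedure might look like follows; the object and column names are hypothetical, and a real procedure would normally also set an inferred-member flag and some audit columns:

CREATE PROCEDURE usp_AddInferredState
    @StateCode nvarchar(10)
AS
BEGIN
    SET NOCOUNT ON

    -- insert the member only if a previous miss has not already added it
    IF NOT EXISTS (SELECT 1 FROM dbo.DimState WHERE StateCode = @StateCode)
        INSERT INTO dbo.DimState (StateCode, StateName, IsInferred)
        VALUES (@StateCode, N'Unknown', 1)

    -- return the surrogate key so the Script Component can attach it to the fact row
    SELECT StateSK
    FROM dbo.DimState
    WHERE StateCode = @StateCode
END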

Using .NET we can efficiently hash all values in an object collection and we can also handle both character and numerical data.

Because we are using a fully cached lookup, this data flow item is case-sensitive, so we should make sure we equalise the case in both the fact and the reference table data before we compare the two. We should also make sure that the Script Component itself treats values case-insensitively, because if it is case-sensitive we will end up with multiple rows in our dimension table – one for each case variation of our inferred members.
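
One simple way to equalise the case (again with hypothetical table and column names) is to upper-case the business key in both the source query and the lookup reference query:

-- source query feeding the data flow
SELECT UPPER(StateCode) AS StateCode, TimeKey, Amount
FROM staging.FactSales

-- reference query used by the fully cached lookup
SELECT StateSK, UPPER(StateCode) AS StateCode
FROM dbo.DimState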

Additionally, the Script Component task should be built either to accept a parameter for the dimension name, or (as in our implementation) to read its own task name and take the first word of it to determine the dimension it is used for. In the above diagram, the Script Component task handles the State dimension, therefore its name starts with “State – ”. This makes the implementation of multiple Script Components for multiple dimensions very easy – all we need to do is change the task name and it just works. There can also be some .NET code for auditing purposes. This is also fairly easy to implement and entry-level .NET skills should be sufficient for development and maintenance. A sample of this approach can be found in Project REAL and it is thoroughly discussed in the SSIS paper produced with it.

2.2. Double Lookup (SQL CAT)

[Diagram: Data Flow task for the Double Lookup approach]

Here, we are doing the same as before – matching fact rows against a State dimension. If a mismatch is found in the first (fully cached) lookup, we pass the row to the second one. The New StateSK is a partially cached lookup. Its purpose is similar to a hash table – it caches new rows, and when there is a brand new mismatch it adds it to the database by executing a stored procedure. Then we Union All our inferred members with the rest of the fact data.

Because the second lookup task utilises partial caching, it is case-insensitive and case variations of the same character term will not lead to multiple dimension rows for the same member. However, the first fully cached lookup is case-sensitive, so we should make sure that both source and reference data are in the same case, which is also more efficient. Furthermore, in SQL Server 2005 partial caching must have a specified memory limit, and if we have a large number of late arriving dimension members we may run out of memory for them. In that case SQL Server 2005 will start discarding the least used values from its cache, which may have performance implications. The latter problem is overcome in SQL Server 2008.

Another problem with this approach is the use of Union All tasks. These are semi-blocking and may impact performance when used multiple times in our ETL package.

For a much more detailed description, including samples, you can go to:

SQL CAT – Assigning surrogate keys to early arriving facts using Integration Services

From our testing and research we reached the conclusion that using a Script Component is the best approach, closely followed by Double Lookups. The stored procedure approach is slow, hard to maintain and may be impossible to implement in a production environment. Using SSIS with .NET proves to be efficient, convenient and fast. If avoiding .NET is preferable, handling inferred members with the Double Lookup approach in SSIS is a very good alternative.

Custom Rounding and Truncation of Numbers in MDX

Article published in SQL Server Central on 2009/03/26

In some scenarios we need to be able to round or truncate decimals to achieve correct calculation results. In SQL we have ROUND, which can do either of these. It rounds the way we are used to – 0.5 rounds up to 1 – it can round up or down, and we rarely get a project where implementing our own rounding or truncation algorithm is part of the requirements.
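
For comparison, T-SQL rounds halves away from zero, so these calls return what business users generally expect:

SELECT ROUND(2.25, 1)   -- returns 2.30
SELECT ROUND(2.35, 1)   -- returns 2.40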

However, in MDX we have Round(), which performs a “strange” operation – bankers’ rounding, which our business users have usually not been exposed to; and if we decide to truncate to an integer through casting with Int or cInt, we also get some strange results. To illustrate the problem in MDX, please consider the values of these expressions:

Round(2.15, 1) = 2.2
Round(2.25, 1) = 2.2
Round(2.35, 1) = 2.4
Round(2.45, 1) = 2.4

Int(1.9) = 1
Int(-1.9) = -2
cInt(1.9) = 1
cInt(-1.9)= -2

These are usually considered wrong, because they are not the obvious results. Even though they are mathematically well founded, if we round 2.25 to 2.2 our users will come back to us with wrong numbers on their reports. The same goes for “trimming” -1.9 to -2.

To resolve the first problem with rounding, we can use our own math formula:

Fix([Measures].[???] * Factor + 0.5 * Sgn([Measures].[???])) / Factor

Where Factor is the rounding factor – 1 for 0 decimal places, 10 for 1 and so on (the precision is defined by 1/Factor). Of course, a Factor of 0 will give us a division by zero error. (Reference: http://support.microsoft.com/kb/196652)
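
For example, rounding 2.25 to one decimal place (Factor = 10) now gives the result users expect:

Fix(2.25 * 10 + 0.5 * Sgn(2.25)) / 10 = Fix(23) / 10 = 2.3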

If we have the Excel function libraries installed on our server, we can also simply use Excel!Round() as Chris Webb advises.

As for the Int and cInt in MDX, we can use the Fix() VBA function to remove decimal places:

Fix(1.9) = 1
Fix(-1.9) = -1

Also, for truncation of values to a certain decimal point in MDX, we can use the following formula:

Fix(<value>*10^1)/10^1

All we need to adjust in order to change the decimal places is to replace 10^1 with another power of 10 – for example, to truncate 3.156 to 3.15 we can use: Fix(3.156*10^2)/10^2. To make things simpler, in all our formulas the power of 10 is what determines how many decimal places we round or truncate to; negative powers will give us rounding to tens, thousands and so on, and ^0 will give us whole numbers.

Using these we can avoid bankers’ rounding and some strange results with converting to integers in MDX.


Developing reports for colour-blind people

According to Wikipedia, 8% of all males and 0.4% of all females in Australia are colour-blind to some degree. The percentages are slightly different in the USA – 7% for males and 0.4% for females. It is estimated that the figures are similar in other countries around the world, which means that a very high percentage of people may have difficulties distinguishing colours. Therefore, a significant part of our potential report users may not be able to interpret some of our graphs and charts correctly (there would be around 400 colour-blind males and 20 colour-blind females in an organisation which employs 10 000 people).

For example, the following chart is quite nice and simple but can be useless to colour-blind people:

[Figure: chart rendered in colour]

versus:

[Figure: the same chart rendered in greyscale]

Similarly, this table is somewhat confusing if we remove the colours:

[Figure: table with colour-coded cells]

versus:

[Figure: the same table rendered in greyscale]

We have to be very careful with design of KPIs and dashboards in general:

[Figure: KPI indicators in colour]

versus:

[Figure: the same KPI indicators rendered in greyscale]

As the above examples show, not being able to distinguish colours makes our poorly designed charts and tables confusing.

We should always keep the following considerations in mind when designing and implementing our most common report items, to ensure that they can be used by everyone in our client organisation:

  • KPI indicators must be different in shape rather than just colour
  • Line charts should utilise markers with different shapes
  • Bar graphs should include a marker on the top of each bar
  • Avoid colour-coded pie-charts – they can be extremely difficult to read for a person with even the slightest colour-blindness condition
  • Avoid colour-coding tables – either backgrounds or text colours are usually unacceptable

Other more general suggestions:

  • Shapes are much more important than colours
  • Greyscale and shades of the same colour are acceptable, as a colour-blind person can distinguish between those

Of course, even after all our efforts to create reports readable by everyone, we may miss some detail. The best way to ensure that we have done a good job is to test. There are two good and easy ways to do that:

  1. Printing – print the report in black and white and see if all information is well presented
  2. Changing display to greyscale – Windows lets us choose how many colours we want to display on our screen. Choosing greyscale and then playing with our report is possibly the best way to ensure that colour-blind people can use our reports.

It is fairly simple and easy to always apply these design principles when creating reports. I have found that most organisations are quite happy to include minor tweaks to their dashboards and reports when they understand how important they could be for some of their employees. Furthermore, it helps to promote accessibility to technology regardless of minor disabilities and gender.

Reporting Services Styles in a Database Table

Article published in SQL Server Central on 2009/03/26


As previously discussed by Adam Aspin in his series Add Styles to Your Reporting Services Reports on SQLServerCentral, it is very useful to be able to change various report attributes, such as colour and border properties, in a centralised way, much like using CSS files or skins. Currently, Reporting Services does not offer a similar mechanism, so Adam showed us how to apply a “style” to a report by using custom code in Reporting Services – either by embedding the code for all attributes directly in the report, or by referencing an external source. However, there is another way to handle such attributes, and that is by storing them in a database table. There are certain advantages in doing so, but there could be some disadvantages as well.


I will proceed to show how to build a report utilising this approach and then I will discuss its strengths and weaknesses.


Database Table


I would suggest keeping the database tables which store values such as report attributes, Reporting Services environment settings (for dynamically generated JavaScript) and other dynamic report properties separate from the rest, by naming them in a unified way. I have used the CTL (for control) prefix in the past. For illustrative purposes I will provide a simple definition of a database table called CTLReportColours:


CREATE TABLE CTLReportColours (
    HeaderColour nvarchar(10),
    FooterColour nvarchar(10),
    BodyTextColour nvarchar(10)
)


As actual values we can store either the Reporting Services colour names (WhiteSmoke, SteelBlue, etc.) or the colour codes (#00AA00, #CCBB66, etc.). I am mentioning this because I have had issues in the past with printers which rendered Red as Pink, and changing to hex colour codes fixed the problem. That should be the topic of another article, so I will drop the subject here.
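
For example, a single row of sample values (any mix of colour names and hex codes recognised by Reporting Services will do) could be loaded like this:

INSERT INTO CTLReportColours (HeaderColour, FooterColour, BodyTextColour)
VALUES ('SteelBlue', 'WhiteSmoke', '#000000')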


Stored Procedure


We can either retrieve the values from the database table in a stored procedure we are already using for our report, or we can create a new one built specifically for getting report attributes. We can use the first approach if our report uses only one data set. Then we can assign all report items to that data set and simply use the Fields collection items in our expressions. However, if we have multiple data sets assigned to different report items, such as tables and matrices, we may find that we need to code the retrieval of our report attributes in many stored procedures. To overcome this limitation, we can create a separate stored procedure which returns the set of values stored in the database table. A simple example is code like:


CREATE PROCEDURE usp_CTL_Get_Report_Colours
AS
BEGIN
    SELECT HeaderColour
         , FooterColour
         , BodyTextColour
    FROM CTLReportColours
END


Data Set


In Reporting Services we can use usp_CTL_Get_Report_Colours to construct a data set, which will have three fields named like the column names.


Usage


Using a separate data set means that we need to be able to use its fields in tables which already have a data set assigned to them. Since we can have only one data set per report element, we need to create a separate parameter for each of the fields in the attributes data set we have created. In our case, we need to create three parameters: HeaderColour, FooterColour and BodyTextColour. All of these should be hidden and should be populated by our data set in the outermost report only – if we have subreports, they should get their values from the parent report, so that we avoid executing the usp_CTL_Get_Report_Colours stored procedure multiple times.
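
Once the parameters exist, a report item can reference them directly in its appearance properties; for instance, the Color property of a text box could be set to an expression such as the following (using the BodyTextColour parameter defined above):

=Parameters!BodyTextColour.Value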


Conclusion


The described method for handling report attributes can be preferable because:

  • We do not need to know anything about coding to create dynamic reports
  • Reports are dynamic with no need to re-deploy them
  • Report attributes are in a database table, which makes their management easy
  • Various management screens can be built on top of the database table in .NET with relative ease
  • Retrieving the values through a stored procedure allows us to pass parameters back, which can then drive what colours get displayed – making the retrieval flexible if we need to display different colours for different departments or people
  • Centralised and secure storage for all report attributes – environment settings, styles, etc.


As I see it, there is only one drawback: having to create multiple report parameters in all our reports. In my opinion this is a much smaller issue than having to manually change report properties with every change in the requirements, and overall it is well worth the effort.

Enhanced Security and Integration of Microsoft BI Solutions with Kerberos

The following is an overview of an article prepared by Mark Dasco and myself. The full article is around 4800 words and is attached to the end of this post. Basically, the article is a quite thorough description of how to implement Kerberos and the benefits Kerberos provides when implementing Microsoft BI solutions.

Table of Contents

 1 Overview
2 The Double Hop
3 The NTLM Protocol
4 The Kerberos Protocol
5 Business Intelligence Case
6 Implementation
6.1 Considerations
6.2 Implementation
6.2.1 Getting Started
6.2.2 Configure Clients for Kerberos Authentication
6.2.3 Defining SPNs
6.2.4 Using Negotiation
6.2.5 Enable Impersonation and Delegation
6.3 Checking that it all works
7 Conclusion


Overview

When developing Microsoft Business Intelligence solutions we frequently need to rely on tight security integration between various tools. The NTLM protocol provides enough features for simple implementations, but when we need to provide enterprise-class solutions we invariably feel constrained by it. With Windows 2000 and later versions Microsoft provides an alternative security protocol – Kerberos, which addresses certain limitations of NTLM, provides improved security and better performance. The implementation of Kerberos could be fairly simple or very complex depending on the requirements. Configuring a few default server instances with no constraints on their services secured through Kerberos could be almost trivial, while for some more specific cases it could be a major cause of frustration.

As a base for this study we will examine a specific BI case – a digital dashboard, which involves all layers of the Microsoft BI stack:


  • SQL Server 2005
  • SQL Server Analysis Services
  • SQL Server Reporting Services
  • PerformancePoint 2007 Monitoring and Analytics for building a dashboard
  • SharePoint Server as an organisational portal hosting the dashboard

Furthermore, each of the servers exists in two environments – Development and UAT. We will also show how to implement Kerberos only between the services utilised by these servers, not affecting the rest of the domain and effectively isolating the implementation.

Typically, solutions not configured for Kerberos authentication and delegation fall back to the default NTLM authentication protocol. Whilst NTLM is completely transparent and very easy to use on a Windows domain, it falls short when we need to pass user credentials across a few server layers. This is commonly known as a double hop issue. If we depend solely on NTLM for user authentication, passing user names to servers on lower levels of our server topology involves including them in our connection strings or passing them programmatically, which is hardly the right choice for an enterprise-grade security framework.

On the other hand, solutions which correctly implement Kerberos benefit from cross-server delegation and authentication, thus allowing the use of Integrated Windows Authentication throughout the whole solution. The ability to capture user credentials on any server is essential if we want to be able to secure and control access to each server independently and minimise the damage resulting from a potential security breach.

Download Full Article by Mark Dasco and Boyan Penev

Combining Slowly Changing Dimensions and Current Dimension Versions

When we need to see historical changes of a dimension in our OLAP cube the common practice is to implement it as a SCD – or a Slowly Changing Dimension. There are a few ways to do this and a really good definition of the different types of SCDs can be found in Wikipedia: Slowly Changing Dimension. Also, there are quite a few articles on Implementing SCD ETLs in SSIS, two of which are:
  • SCD Wizard Demo – SSIS Junkie blog example of a package using the Slowly Changing Dimension transformation in SSIS
  • MSDN Article on the Slowly Changing Dimension transformation in SSIS

Since SQL Server Integration Services 2005 and 2008 include a SCD transformation it is not too hard to implement such dimensions.

Here I am discussing a typical requirement – to be able to have a SCD and a Current version of the dimension.

First, it is important to note that a SCD should have two dimension keys: a unique surrogate key identifying every version of the dimension members, and a non-unique code which is common to all versions of a dimension member. This is also very important if we want to be able to determine the current version of a dimension member. An example of a very simple dimension table utilising this design is:

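For illustration, such a dimension table could contain the following rows (the dates and surrogate key values are made up for the example):

SKey   Code   Description    FromDate     ToDate
1      1      Member1Ver1    2008-01-01   2008-06-30
2      1      Member1Ver2    2008-07-01   9999-12-31
3      2      Member2Ver1    2008-01-01   2008-03-31
4      2      Member2Ver2    2008-04-01   2008-09-30
5      2      Member2Ver3    2008-10-01   9999-12-31
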
Here we have two distinct dimension members with Codes of 1 and 2. Member1 has two versions and Member2 has three. The SKeys (surrogate keys) for these versions are unique, but the Code stays the same for each member. Also, notice the From and To dates, which allow us to distinguish the periods for the member versions. We can have an IsActive or IsCurrent bit column which shows us the latest version of a member, but we can also just filter on ToDate values of 9999-12-31, which gives us the same result.

Assuming the described design I will move on to discuss the ways to build a dimension in SSAS.

First, the standard way to link the dimension table to our fact table is through the surrogate key. We can have a regular relationship between the two tables. As the fact data is usually also linked to a Time dimension, fact records falling within the period between the From and To dates of a particular version will be linked to that version’s SKey. An example of a fact table with a few rows, which can be linked to the dimension table above, is:

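Continuing the illustrative example, the fact rows could be:

FactSKey   DimSKey   TimeKey      Amount
1          1         2008-03-15   100
2          2         2008-08-15   150
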
The row with a FactKey of 1 will be linked against Member1Ver1, while FactKey 2 will go against Member1Ver2. Therefore, when we slice our cube by Time and our dimension we will see:

This is the standard way to implement our SCD and these are the results we would expect. Now, we get a new requirement. We want to be able to see both this and an aggregation against the current version of our dimension. We have a few ways to implement it. One obvious way is to create another dimension containing only the current dimension members. This can be easily achieved if we add a Named Query in our DSV, which shows only the current dimension members:

SELECT SKey
, Code
, Description
FROM DimTable
WHERE ToDate = '9999-12-31'

The result will be:
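
With the illustrative values used above:

SKey   Code   Description
2      1      Member1Ver2
5      2      Member2Ver3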

Then we need to replace our fact table with a Named Query, which shows the DimSKeys for current version dimension members:

SELECT ft.FactSkey
, dt_current.SKey AS DimSKey
, ft.TimeKey
, ft.Amount
FROM FactTable ft
INNER JOIN DimTable dt
ON ft.DimSKey = dt.SKey
INNER JOIN DimTable dt_current
ON dt.Code = dt_current.Code
WHERE dt_current.ToDate = '9999-12-31'

This will give us the following result:
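
Continuing the illustrative example, both fact rows now carry the surrogate key of Member1’s current version:

FactSKey   DimSKey   TimeKey      Amount
1          2         2008-03-15   100
2          2         2008-08-15   150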

When we slice our cube, all records for Member1 will be against the latest version:

Implementing this, we can have two dimensions in our cube, so our users can use the one that makes more sense for their needs:

  • Dimension, and
  • Dimension (Historical), where the Historical designation stands for, in technical terms, a SCD

However, we can also implement this in a different way, which allows us to avoid building such logic in a view or in our DSV. The trade-off is some space on our disks and one more column in our fact table. Instead of adding a new column through writing SQL, we can simply add the dimension Code to the fact table. Then we can build our dimension again by getting the latest versions, but instead of having the SKey as the dimension key, we can use the Code. It is of course unique across all dimension members, as long as we filter out the non-current versions. The query for doing this is exactly the same as the one we used before. However, we need to change our fact table design and add a DimCode column:

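Continuing the illustrative example, the fact table now also carries the dimension Code:

FactSKey   DimSKey   DimCode   TimeKey      Amount
1          1         1         2008-03-15   100
2          2         1         2008-08-15   150
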
Then, we create two dimensions again, but we link the Historical dimension with the DimSKey column and the Current one with the DimCode column. The result of slicing the cube by the current version is exactly the same as before. The trade-off is space vs. processing time and CPU usage. It is up to the developer to choose the more appropriate way to build the solution.

So far I have discussed two ways of having our SCD and Current Version dimension as separate dimensions in our cubes. There is, however, a way to combine both in the same dimension. To do this, we need to have two levels in the dimension: a parent level, which contains the current versions of the dimension members, and a child level, which contains the historical versions. For example:

Member1Ver2
    Member1Ver1
    Member1Ver2
Member2Ver3
    Member2Ver1
    Member2Ver2
    Member2Ver3

This way the historical versions aggregate up to the current version and we can use either level, depending on what we want to achieve. To build this, we can use our current dimension table and add a parent level through SQL, so we do not need to update all records when a new version arrives:

SELECT dt.SKey
, dt.Code
, dt.Description
, dt_p.SKey AS ParentSKey
FROM DimTable dt
INNER JOIN DimTable dt_p
ON dt.Code = dt_p.Code
WHERE dt_p.ToDate = '9999-12-31'

The result is:
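
With the illustrative data, each version is returned together with the surrogate key of its current version as its parent:

SKey   Code   Description    ParentSKey
1      1      Member1Ver1    2
2      1      Member1Ver2    2
3      2      Member2Ver1    5
4      2      Member2Ver2    5
5      2      Member2Ver3    5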

Then we can build our Parent-Child dimension, using the Parent level if we want the current versions and the Child level for the historical ones.

This approach allows us to combine the two dimensions into one. It is also possible to implement it in a non-parent child fashion because the hierarchy is not ragged.

It is always advisable to make sure we actually need a SCD and to avoid one whenever possible, because it is not always intuitive for users. Splitting our fact data across multiple rows can be surprising, and understanding how the historical dimension works and the multiple nodes it consists of can be a problem. However, it lets us satisfy a common requirement and therefore it is quite important to know how to build one.