Boyan Penev on Microsoft BI

Source Control for Database Development with SSDT

I had never had the change of using source control for database development before I started working on my current project. I have used TFS and Subversion for SSAS, SSRS and SSIS projects in the past. Since it is very difficult to edit manually the rdl, dtsx and SSAS-related files, TFS/Subversion took the role of a glorified storage location with some version control capabilities.

With SSDT we can now enjoy the full set of capabilities of a source control system. This is by far the most compelling reason for me to use SSDT. Sure, there are other very very nice features which make SSDT way better IDE than the plain old Sql Server Management Studio so many SQL Server developers use: top 3 would be dependency/reference checking, better IntelliSense and schema compare (which I used to do with RedGate’s SQL Schema Compare in the past). But, source control is not just about convenience – it makes you and your team more productive and reduces stress levels (yeah, it does!). T-SQL is much like application code – it’s easy to modify by a developer, it’s readable and merge-able. So, it is a prime candidate for being managed using a source control system.

SSDT is free. There are absolutely no drawbacks to using it in a team db development environment and I would strongly encourage anyone who has not tried it to do so. The initial investment in learning how to use it will pay off handsomely in the longer term. Things like _old suffixes to tables and stored procedures become unnecessary, we don’t have to worry about deleting a piece of code and then losing it forever, daily backups for the purpose of not losing code, and in fact – we stop worrying about losing code in general. It will be there forever and we can track its progress (we can also see who breaks it). I’m sure most db developers would be well aware of the benefits of source control, but looking back, my projects always went ahead without using SSDT because the perceived complexity of using SSDT outweighed the potential benefits.

It is not hard. All you do is create a db project and start adding code. That’s in the form of .sql CREATE scripts. And that’s about it. Your project is configured to deploy to a SQL Server instance. When you build the solution, the output is a dacpac. When you deploy it to an instance the dacpac gets compared to what is already there and an incremental update is applied to the target environment. If you add a non-nullable column to a table which already has some data in it you will get a deployment failure. But try doing it on a live db – you’ll get the same issue, right? And since you are in SSDT if you change the name of a table you’ll get errors on all objects referencing it (like stored procedures) and your solution won’t build. Therefore, using SSDT prevents you from breaking your solution’s consistency. You can also refactor code – a right click on a function/stored proc reference allows you to change its name across the whole solution. All this plus source control. Still not convinced? Well, you may need to go through a few more db development projects (especially in a team environment), get pissed off with your teammates, lose some code, spend a few hours chasing code references to realise how important and empowering SSDT can be.

If your whole team is not on the same page, you can use SSDT solo. Import the db in SSDT and off you go – now you can tell everyone how they break your project all the time with changes which are not properly propagated across the solution. You can also import all new changes applied to the live db through the handy schema compare (SQL toolbar item). Just compare the live db to your project and it will show you all new objects, as well as all changes to existing objects in your solution. Applying these will effectively synchronise your solution with the new stuff other developers apply to the live db.

But don’t just take my word for it – download SSDT for free from: http://msdn.microsoft.com/en-gb/data/hh297027 and give it a good go before you decide that you are too old for it!

Oh, and if you do think it’s too much to handle, or if you would like to get your SSDT techniques honed a little and you live in London I’d recommend either coming to work on my current project, or letting you boss know of how great SSDT is, and how he/she can empower your team and splash a little on:

http://www.technitrain.com/coursedetail.php?c=28&trackingcode=CWB

Running PowerShell Scripts in SQL Agent Using CmdExec

If you have tried to execute PowerShell scripts through SQL Server Agent you may have noticed that the “version” (technically it’s not a version – see the links below for more info) of PowerShell run by the agent is not the same as the command-line PowerShell you may have tested your scripts with. The differences are explained in quite some depth by Chad Miller in his “The Truth about SQLPS and PowerShell V2” post and in his shorter answer to a question on StackOverflow. In brief, in all versions of SQL Server (including 2012), SQL Server Agent runs PowerShell scripts through a sqlps.exe process instead of the powershell.exe we get on the command prompt. Most annoyingly there are core syntax differences which do cause issues when running scripts with SQL Agent over sqpls. For example, if you have a construct like $(), or a Get-Date call, SQL Server Agent will not recognise the syntax and you’ll get a syntax error. It looks like SQL Agent’s Powershell task doesn’t like statements which can be interpreted as sqlcmd ones. Luckily, there is a very simple workaround if we want to circumvent this behavior. All we have to do is save the script somewhere on the filesystem (which SQL Server Agent’s service account can access, of course) and execute it through an Operating System (CmdExec), instead of a PowerShell step:

There we can simply call our script by:

powershell <script>

For example:

powershell “C:test.ps1”

This will execute the test.ps1 script. All we have to do is make sure that the SQL Server Agent account can see the script (permissions) and also that it can read/write to the locations touched by the PowerShell script. This way if you test scripts through the powershell.exe PowerShell environment you will get the same results in SQL Server Agent. And if you want to have the SQL Server cmdlets available, you can load them as a module as described here.

Cleaning up Hanging Traces with PowerShell

This post is a brief follow-up to my previous post Counting Number of Queries Executed in SSAS. There we saw how we can create and save server-side traces in PowerShell. If you have played with those scripts chances are you have a few hanging traces on your servers. Other possible sources of hanging traces could be Profiler crashes (not on close as it usually cleans after itself), or other applications which do not drop traces they create. To check what’s running on your servers have a look at the XMLA scripts in this blog post by Karen Gulati. It could be nice to have a PowerShell script which drops all traces other than the ones you need. The following script does just that. As it loops over a number of servers you don’t need to install it on multiple machines (provided you have a suitable account). Note that if you are an ASTrace user you may not want to use this as the traces ASTrace creates do not have distinctive names which you can add to the $ExcludeTraces array, so the script would just clean them up as well.

[Reflection.Assembly]::LoadWithPartialName(
    "Microsoft.AnalysisServices") > $null
$SSASServers = @("myserver1", "myserver2")
$ExcludeTraces = @("FlightRecorder", "Important Trace")

foreach($srv in $SSASServers) {
    $conn = 
        New-Object Microsoft.AnalysisServices.Server
    $conn.Connect($srv)

    $traces = @($conn.Traces | 
        where {!($ExcludeTraces -contains $_.Name)})
    $cnt = $traces.Count-1

    while($cnt -ge 0) {
        $traces[$cnt].Drop()
        $cnt--
    }

    $conn.Disconnect()
}

Thanks to Darren Gosbell for his assistance with all things PowerShell 🙂

Is Excel 2013 Power View hurting SSAS?

Since the release of Office 2013 Preview, there has been an awful lot of commentary both in the BI blogosphere and in all media mostly praising the new look and features Microsoft are giving us. I agree that the new suite is getting a much needed face-lift (after all 2007 -> 2010 did not deliver much in this regard), and the mandatory for post-2010 releases “touch-enabled” interface. As a BI guy, the most interesting thing for me in the new Office is Excel, and within Excel, Power View (not completely ignoring the new Pivot Table options like adding new calcs natively, without using the OLAP PivotTable Extensions plug-in).

There are deficiencies in Power View as it stands now (maybe that is why it is an add-in, not enabled by default) but it is a nice data exploration tool which even Stephen Few told me was going in the right direction. It still doesn’t allow us to do things like Top 10, % of Total, etc. which I love in Excel Pivot Tables, but I am hoping that we’ll get all these and more at some point. Together with maps, small multiples and highlighting, Power View does seem like a potential killer application, making other tools like Tableau seem within reach.

However (here we go), in my opinion it has one drawback which really spoils the good things: it doesn’t work on SSAS Multidimensional cubes. I am sure that if someone from the Microsoft SSAS or SSRS teams reads this they wouldn’t be surprised. After all they have heard this numerous times since the release of Power View (even when it was Crescent). Chris Webb wrote about it and caused some stir, it was discussed and the issue was parked as “will be fixed at some point”. I don’t want to resurrect it in its previous format since we now know that a fix is coming (as publicly stated by Microsoft at PASS and other forums), but I would like to point out that last week after demoing Excel 2013 in front of some decision-makers in the organisation I currently work at, I had to answer questions like “So, we can’t use Excel with SSAS at all?” and “Do we have to upgrade all our cubes to in-memory models now?”. This made me think: we can’t cover the whole feature set of SSAS MD in SSAS Tabular, yet we can’t use the new end-user tools on SSAS MD as well. Basically, I am left with one option only – to recommend SSAS MD and Excel Pivot Tables as the only strategy for corporate BI competitive with other vendors like IBM and SAP. Furthermore, since we can’t use DirectQuery on Teradata, I can’t also say that SSAS Tabular is better than SSAS MD as an analytical platform on top of Teradata. Yes, one day when we get it working it will trump all other options, and the Teradata guys I work with are genuinely excited about the future, but it is unfortunately the future, while other options are there now. And the future in IT always looks bright, while the present is a different story altogether.

So, is Excel 2013 Power View hurting SSAS MD? Maybe not directly, but the more we promote Power View (hint: as it’s getting deployed on every workstation we hardly need to promote it), the more we also highlight the deficiencies in SSAS Tabular and make SSAS MD look like the neglected sibling. As for my demonstrations of Microsoft BI, I would probably treat the two components of SSAS – MD and Tabular as two separate stacks. MD – the best out there for corporate BI, and Tabular – the best for team/personal BI, not to be mixed together. To those who can’t wait to get their hands on a powerful, modern analytical tool working on the iPad and use it on a powerful, robust, widely used analytical engine which works on less than terabytes of memory (let me spell it out: Power View +mobile on SSAS MD), we have to tell: “at some point in the future, hopefully”.

PS: Yes, it does sound like yet another whiny post, but we have been waiting for a very long time (in IT-terms) since the inception of Crescent to get this bloody issue fixed and it is hard to imagine mighty Microsoft struggling to catch up behind Cognos (yes, they have mobile – terrible implementation, but works), Tableau (last time I checked they were a tiny company with overpriced products) and QlikView (ugly and slow from what I’ve seen). It’s not like Microsoft weren’t aware of this, and it’s not like it’s happening for the first time – remember how well PerformancePoint did being unable to connect to anything but SSAS MD and having a lacking feature set, plus SSRS to SSAS integration has always been a pain point. The new, quicker, release cycle at Microsoft seems to be delivering the same amount of features in less-complete releases, which doesn’t seem to help with inspiring confidence in its vision and ability to execute. However great these new ingredients are, the dish doesn’t taste that well when some are lacking…tends to get cold quickly too.

Counting Number of Queries Executed in SSAS

When monitoring SSAS we can use a few tools to gain insight into what is happening on the instance. Perfmon is especially interesting as it shows us information about various aspects of the server – we can see memory, disk, CPU and MDX counters out of the box. However, perfmon cannot show us how many queries were run on SSAS – neither per period (e.g. second), nor in total since the service had started. We have another counter, similar to a count of queries – Storage Engine Query : Total Queries Answered. This is not the same as the number of MDX queries answered, as if a query does not hit the Storage Engine (SE), but the Formula Engine (FE) cache instead, it won’t increment the counter. And, if a query needs to be answered by more than one Storage Engine requests, we will see that the counter gets incremented with more than 1.

There are alternatives. We can try using the Query Log server settings and save queries to a database table automatically:

Unfortunately (for us in this case), since the Query Log is typically used for Usage Based Optimisations (UBO), it also shows SE requests, not the actual MDX queries issued. Hence, it suffers from the same drawback as the perfmon Total Queries Answered counter.

The other alternative which comes to mind is SQL Server Profiler. After all it allows us to see what queries get sent to the server, along with lots of other useful information. If we are interested in the count of queries only, we can capture a trace which includes only the Query End event. Why not Query Begin? Well, we can’t see some possibly interesting numbers before the query has been executed, for example the Duration of each query. If we capture only Query End we get exactly what we need – one event per query. We can also capture Error events to check if things go wrong.

A typical Query End-only trace looks like this (all columns included):

As we can see above, we get the actual query in the TextData column, the User Name, Application Name, Duration, Database Name – all very useful properties if we want to analyse the load and the performance of our server and various solutions. While the information looks great on screen, having it in a database table would enable us to do what we (BI devs) like doing – play with the data. Only this time it’s our data, about our toys, so it (at least for me) is twice as fun.

A SSAS trace can be run on the server without Profiler – the term for it is a “server-side trace”. Unlike SQL Server RDBMS traces, a SSAS server-side trace cannot directly log its output to a SQL Server table. Luckily, we can write the results in a “.trc” file, which in turn can be imported into a SQL Server table with a few lines of code.

To create a server-side trace we can first export the trace definition to an XMLA file from profiler:

The XMLA file we get looks like this (all scripts are attached to the end of this post):

If we remove the first line (?xml version…), we can run this and it will create a trace on the server. This trace is useless to us because it does not write its output anywhere. To fix this we can add a few settings like this:

Note that I have changed the Name and ID of the trace to be more user-friendly, and I have removed the Filers section, as well as the first line. Now the script can be used to create a trace. To delete the trace we can use:

And to list all running traces on a SSAS instance:

Note that by default SSAS is configured to run Flight Recorder, which uses a trace to capture events happening on the server (thanks to Dan English for suggesting this clarification). The trace may appear with ID and Name of FlightRecorder. You should not tamper with the trace manually. If you want to stop it, which is a performance recommendation from the SSAS Performance Guide, you should disable Flight Recorder from the SSAS server properties (click for a larger version):

Now, armed with all these scripts we can effectively manage traces on our instances. The only problem we have not resolved so far is exporting the trc file created by our create trace XMLA command to a database table. This can be easily done in a .NET app, or in PowerShell. The ps_trace_copy_to_table.ps1 PowerShell script does the following (thanks to Darren Gosbell for creating the initial version).

Copy the trc file with another name for further processing
Delete the trace form the server
Delete the trace file
Start the trace on the server again
Export the copied trc file into a SQL Server table
Run a stored procedure moving the data from the SQL Server table to another SQL Server table which contains all trace data

We delete->create the trace because otherwise we cannot delete the trace file. The last step runs a stored procedure which moves the data from the table we imported into to another table because the first (landing) table gets re-created every time we import into it and the trace data cannot persist in it.

Note that there is some exception handling implemented in this script (try/catch/finally blocks). If you are using an old version of PowerShell, it is possible that you may have to remove the exception handling statements as they were not supported. I have attached a sample script which excludes them (ps_trace_copy_to_table_no_exception_handling.ps1).

Finally, we can run this script through SQL Server Agent on regular intervals. SQL Server Agent allows direct PowerShell script execution, so this is very easy to set up and can be scheduled to run recurrently every X minutes/hours/days.

A small note: you may notice that some rows in the database table contain events with an Event Class <> 10 (10 is Query End) and NULLs in most columns. These are trace-related events and can be ignored (e.g. your stored procedure moving the data form one table to another can skip such rows).

There is also another way to achieve the same outcome. As a part of the Microsoft SQL Server samples you can download a utility called ASTrace. You will have to download and compile it, but after that you can run it as a Windows service and it will capture the trace events as they happen and write them directly to a SQL table. The Server -> File -> Table changes to Server -> Table. Also, you don’t need to move the data from one table to another. The drawback is that you have an additional service on your server. You may or may not be able to run it (depending on your organisation’s security policies and admin practices), and while it is open source, it is not officially supported by Microsoft.

Either way, collecting trace data allows you to create reports like the following, which can be a really nice way to track what’s happening on your servers:

Scripts:

[listyofiles folder=”wp-content/count_queries_scripts”]

What Exists and What is Empty in MDX

After reading my chapter “Managing Context in MDX” in MVP Deep Dives vol2 I noticed that I should have probably discussed one extra topic – the difference between cells which cannot exist, and such which can, but are empty.

The basic idea is that in SSAS cubes we have the notion of empty space. However, there is an important difference between empty intersections which are possible but result in nulls when queried and “impossible” intersections between hierarchies in the same dimension.

If we look at the Date dimension in Adventure Works we can see that we have month and year attributes. Months in 2007 appear only with the year 2007 in the dimension. Therefore, the combination between January 2005 and CY 2007 is not possible and consequentially it does not and cannot exist in our cube. In contrast, if we query for Clothing products in 2007 and we place the Month attribute on rows, we can see that there has been no Clothing items sold in the first few months of 2007:

Here we are dealing with possible, but empty cells – the January 2007 to June 2007 rows show empty intersections in the cube.

Why is this important? Well, it means that if we try to get the number of months with Clothing sales in 2007 with a query like:

we wrongly get 12, not 6. We need a function which can “detect” empty cells – not unnecessarily enforce the current context. An excellent function for this purpose is NonEmpty (or Exists):

Here we get the expected number – 6 (since only 6 months in 2007 have Internet Sales Amount against them).

A similar example can be shown the other way around. The following query returns 37, which is the total number of members of the month attribute with data against them:

This is because the NonEmpty function does not enforce the current context on its first argument and it gives us the members with data only (omitting NonEmpty results in 39, because we have two months with no sales whatsoever). Existing does, so if we add Existing to NonEmpty we get the expected count of 12 (as all months have had a sale in 2007 if we take all categories into account):

Here we eliminated the impossible intersections between months not in 2007 and year 2007.

SSAS Locale Identifier Bug

These days I have two little problems with SSAS. One is really small – I really don’t like how the Process dialog in BIDS (2008 R2) stays blank during the process operation and how I need to click on “Stop” after it finishes. The fact that such a bug had gotten into the product and has survived for so long when it happens 80-90% of the time doesn’t speak very well for the QA process at Microsoft. However, I can understand that the impact of this bug would have been deemed very insignificant as it impacts developers only and does not prevent them from doing their job. By the way, if you are also annoyed by the blank process dialog in the latest version of SQL Server, you will have to wait until the next release (2012) until you get a fix:

http://connect.microsoft.com/SQLServer/feedback/details/536543/ssas-process-progress-window-blank-no-details

The other bug is far more significant. It not only impair developers’ ability to build some features, but is also highly visible to all Excel users. However, it is hard(er) to reproduce:

http://connect.microsoft.com/SQLServer/feedback/details/484146/getting-an-error-when-trying-to-browse-a-cube

Still, it seems like developers from {[World].[Countries].Members}-{[World].[Countries].[USA]} hit it all the time. I am speaking about the Locale Identifier bug. The most common occurrence is when drilling through to detail in Excel. After the drill-through action has been initiated, Excel shows a message box with a message talking about the XML Parser and how the Locale Identifier cannot be overwritten –

“XML for Analysis parser: The LocaleIdentifier property is not overwritable and cannot be assigned a new value.”

The cause is simple (as confirmed by the SSAS team at Microsoft and Akshai Mirchandani in particular): once we open a session we cannot overwrite the locale. The mystery is around the cause for Excel to do something like that. Noting that Excel is not the only offender, as I have also seen the same error message thrown by SQL Server Profiler, and Chris Webb has seen it with the Cube Browser in BIDS:

http://connect.microsoft.com/SQLServer/feedback/details/484146/getting-an-error-when-trying-to-browse-a-cube

Since the bug has been reported to Microsoft, we are now hopeful that a fix will appear at some point. Until then you can try the following workaround, courtesy of my friends and former colleagues Matthew Ward and Paul Hales:

– Switch the Windows Region and Language Format in Control Panel to English (United States):

– Switch it back to whatever it was before

Now the “bug” should be fixed for that machine. For example, I had my new Windows 7 workstation configured to use Australian formats. After I got the Locale Identifier error message in Profiler, I switched the formats to USA. The bug disappeared and I could profile SSAS. After that I switched Windows back to the original English (Australia) format…and nothing broke. I could still use Profiler and drill-through in Excel.

Another notice. Greg Galloway has released a new version (0.7.4) of the OLAP Pivot Extensions add-in for Excel. In case you have been experiencing a Locale Identifier problem in SSAS while using older versions of the add-in, please download the new one and let Greg know (e.g. on the discussions page on CodePlex) if the new release fies your problem.

Thanks to everyone involved in confirming and testing different fixes. Ideally, I would like to see Microsoft fixing this on the server side of things which would allow us to easily patch up all existing systems exhibiting the problem.

Since all Microsoft Connect item related to this bug have been closed, I opened a new one, so you can vote in order to prioritise this issue:

https://connect.microsoft.com/SQLServer/feedback/details/721372/locale-identifier-bug

Note (2012-03-21): The issue seems to be with Windows Vista+ as confirmed by Michael Kaplan – http://blogs.msdn.com/b/michkap/archive/2010/03/19/9980203.aspx

Load Testing BI Solutions – When?

This year I came across two very different BI projects which had the common non-functional requirement to prove that they would handle an expected spike in the report generation load. Funny enough, in both cases the project teams got very concerned and came up with wildly inaccurate predictions of how many concurrent users we should be testing for. In the first case the problem was with the perception of “thousands of users”, while in the second, the team interpreted “monthly users” as “concurrent users”. The annoying part was that in the first case the team planned on building an ultra-massively overcomplicated queuing system to handle those spikes, and in the second case they were thinking of completely scrapping the ad-hoc functionality in the solution and resorting to report extracts distributed by email. The unreasonable expectations of the load lead to bad design choices – this is why it is important to remain calm and first check whether there is a problem at all.

Firstly, let’s agree that we are measuring report requests. To begin, we should know how many requests we get per a period of time (e.g. a month), and then how long it takes to generate a report. A typical scenario would be:

1,000,000 report requests per month
2 seconds to generate a report on average

What we need to do now if apply a bit of math:

1,000,000 / 20 = 50,000 requests per day (on average)

50,000 / 8 = 6,250 requests per hour (8 hours in a working day)

Since a report takes 2 seconds to generate, we can generate 1,800 reports in one hour. Therefore, with 6,250 requests, we would have 3.47 average concurrent users. Of course, this would be the case if we have a very uniformly split load. In reality this would not happen – instead, we will have peaks and dips in usage. A moderate peak is typically around 3x the average, while a heavy one would be at around 6x the average. To ensure that we can handle such peak periods, we should multiply our average concurrent users by 3 or by 6 depending on our load analysis. Let’s assume we have a very high peak load of 3.47 * 6 = 20.82, or approximately 21 concurrent users. This is the number we need to test in our case. Note that we had 1,000,000 report requests per month, but in our highest peak we expect to have only 21 concurrent users. I have not actually had a project where we have expected to have such a load (in both cases which prompted me to write this post we had between 2000-10000 users per month).

The moral of the story – don’t panic. In most reporting projects the user load is not high enough to warrant a full-scale load testing exercise; next time you hear talking about something like that, instead of rushing to cover unreasonable scenarios, try to calculate and confirm the need first.

DataMarket Updates: Speed, Portal and DateStream

It has been an eventful week for the Azure DataMarket. We had three new and exciting (for geeks like me) things happening in that corner of the Microsoft universe:

1. Speed!

There was an update to the Azure DataMarket a few days ago. It was, in my opinion, the best thing Microsoft could have done to their offering – tremendously increase its performance. While the DataMarket was previously plagued by unacceptably slow download speed, now it’s for feed standards blazingly fast. For comparison sake, I used to wait for more than 40 minutes when downloading an approximately 70k rows feed from the DataMarket prior to the update. Now, it is on my machine in around 5 – 8-fold increase in performance! Rumours have it that on faster-than-my-home-ADSL2+-networks we will be experiencing up to 20x better performance. It would be good to hear if this is actually correct for developers on such networks (please comment).

Next, range queries, hopefully…

2. Portal

While before the last couple of days anyone who wanted to publish data on the DataMarket had to contact the Microsoft team via email and ask how to get it done, we have just moved into the self-service space with a new portal allowing publishers to create and manage their feeds. The link to this new portal is:

https://publish.marketplace.windowsazure.com/

And, you can find some very helpful documentation about it here:

http://msdn.microsoft.com/en-us/library/windowsazure/hh563871.aspx

3. DateStream

Finally, I am proud to announce that the great DateStream feed got translated in four more languages:

– Hebrew and Danish – thanks to Rafi Asraf

– German

– Bulgarian

The Italian translation (thanks to Marco Russo) is coming soon too, but missed this release unfortunately.

Feel free to explore them and let me know if anything needs to be changed to make them more correct/useful.

SSAS: Multiple SQL Queries in ROLAP Mode

Just recently I was working on a project where I had to build a SSAS ROLAP cube on top of a badly built data mart. Badly built in this case meant one where we encounter multiple referential integrity (RI) issues. Most importantly, the designers ignored the very basic principle that all dimension keys for each row must be present in the respective dimension tables. When in MOLAP mode, SSAS checks for such mismatches during processing. However, when a partition is in ROLAP storage mode, we don’t get a notification that anything is wrong and the cube processing operation succeeds. This situation has some consequences during execution time and I will try to illustrate those in this post and show a solution. Before I begin, I must say that if it wasn’t for Akshai Mirchandani’s (from the Microsoft SSAS dev team) and Greg Galloway‘s help, I would have probably spent quite some time figuring out what is happening. Thanks to them the problem got solved quickly and I got to understand the reason for what is happening.

In terms of set-up, I created two tables in SQL Server: Dim and Fact. The Dim table contained two members A and B, with keys of 1 and 2. Initially, the Fact table had two rows referencing the Dim table – Dim keys of 1 and 2, and a measure column called Amount with 1.0 and 2.0 as the amounts corresponding to A and B. No issues here. After that I created a SSAS solution, corresponding to this simple dimensional model. I switched the partition storage for the cube to ROLAP and processed the SSAS database. After that I ran the following query, which I used for all subsequent examples:

The result was as expected:

At the same time I had a SQL Server Profiler trace running, which showed:

We can see that SSAS has executed one SQL query retrieving data from the fact table. Nothing unusual thus far.

To spoil the party, I added one more row to the fact table with a dimension key of 3 and Amount of 3. Since I did not add a row in the dimension table with a key of 3, this broke the rules and if I had a foreign key constraint implemented between the fact and the dimension tables I would not have been able to do this. After cleaning the SSAS cache, I ran my query again. The result:

The actual error was, of course, a missing key. I was not surprised when I saw this on my original project. However, looking at Profiler we see a “weird” sequence of events:

SSAS runs multiple queries which result in errors. In this case we can see four of these ExecuteSQL events. All of them are followed by an error in a ReadData event. In this particular case we can see only four ExecuteSQL events. In the real-world, this scenario can get multiple times worse (in my case we saw 4667 queries run against the relational database in a few minutes) leading to a really significant drop in performance.

So, what is happening? According to Akshai, SSAS encounters an error while dealing with the results from the initial SQL query and is trying to recover by sending more queries. In some cases this can result in getting the error in the result set only for some cells.

Luckily, there is an easy way out of this situation (thanks to Greg for providing the tips). SSAS can automatically create an “unknown bucket” for each dimension and can assign to it all measure values which do not correspond to a dimension member. To get this result, we must ensure that each affected partition’s error configuration is set to something similar to:

Note that the KeyErrorAction is ConvertToUnknown, not DiscardRecord (which is the alternative). This must also be coupled with setting up each “incomplete” dimension to include an Unknown member:

It does not matter whether the UnknownMember is Visible or Hidden, as long as it is not None.

Back to our scenario. After setting these properties on the dimension and the partition I processed the SSAS database again and executed the query. The result:

and the profiler trace:

As we can see we eliminated the multiple queries. If we do not want to see the Unknown amount in the cube we can use a scope assignment:

Coupled with making the UnknownMember Hidden, we can completely obliterate traces of our underlying RI issues. Unless our users check the numbers, but then we can blame whoever designed the datamart! 🙂