Merry Christmas and Happy New Year

I now have 40 articles on this blog, around 183 subscribers and approximately 1500-1600 visits a month. It’s been a very pleasant and rewarding casual occupation to write about the world of Microsoft Business Intelligence.

As my professional commitment for the new year I can promise that I will continue writing about new and interesting things I find while playing with SQL Server and the other great products which Microsoft puts on the market.

Also, recently I have started visiting the MSDN SQL Server forums and I will continue doing so in the next year. I would encourage everyone who reads this blog to start sharing their knowledge and helping new developers there – the technologies Microsoft offers are only as good as the people who implement them – us. Working together builds us as professionals and in the end makes our lives easier.

Many thanks to Nick Barclay for encouraging me to start this blog in the first place. Also, a big THANK YOU to Paul Hales who, even though hard to convince to start writing himself, has taught me quite a few things over the last couple of years and has always been the fiercest Microsoft advocate imaginable.

To Paul and Nick, to everyone who I have had the pleasure to work closely with (Tamzyn Bielecka, Mark Dasco, Maxim Yefremov, Sue Hourigan, Chris Mentor, Edmund Harvey and many others) and to all bloggers and readers – have a really MERRY Christmas and a very HAPPY New Year!!!

7 Ways to Process Analysis Services Objects

Since I get asked a bit too often how we can process Analysis Services databases (or cubes and dimensions), here is a list of 7 different methods:

1. Through the GUI

This one is obvious. We can do it through both SSMS and BIDS.

2. XMLA Script

To generate the script, we can use the handy Script button in SSMS. Simply configure the processing settings and then, instead of clicking the all too usual OK, click the little button in the top left corner of the Process window:

[Screenshot: the Script button in the Process window]

Then, we can just execute the generated script.
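For reference, the generated script for a full database process looks roughly like this (a minimal sketch – the DatabaseID below assumes the Adventure Works DW 2008 sample; yours will match whatever the Process window generates):

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>Adventure Works DW 2008</DatabaseID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>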

3. SSIS Analysis Services Processing Task

This Control Flow task allows us to configure any settings and then add it to our ETL process. Quite handy.

[Screenshot: the Analysis Services Processing Task]

4. SQL Server Agent Job

This one is really an automation of Method #2 – XMLA Script. We can encapsulate the script in a job step of the SQL Server Analysis Services Command type:

[Screenshot: the SQL Server Analysis Services Command job step]

5. .NET Code

This allows us to process cubes as part of an application. Nice if we want to let our users process our cubes on demand. Of course, this is better left to application developers, but it is still a good trick to know – especially if we want to seem all-knowing when it comes to databases of any type. To achieve this objective, we use AMO (Analysis Management Objects). The API reference can be found here:

http://technet.microsoft.com/en-us/library/microsoft.analysisservices(SQL.90).aspx
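As a minimal sketch (not production code – it assumes a reference to Microsoft.AnalysisServices.dll and the Adventure Works DW 2008 sample database on localhost), a full database process via AMO could look like this in C#:

using Microsoft.AnalysisServices;

class CubeProcessor
{
    static void Main()
    {
        // Connect to the SSAS instance
        Server server = new Server();
        server.Connect("localhost");

        // Retrieve the database by name and fully process it
        Database db = server.Databases.GetByName("Adventure Works DW 2008");
        db.Process(ProcessType.ProcessFull);

        server.Disconnect();
    }
}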

6. Command Line – ascmd

The ascmd command line utility can do a lot – including processing SSAS objects. For the full readme you can go here:

http://msdn.microsoft.com/en-us/library/ms365187.aspx
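As a sketch, assuming the -S (server), -d (database), -i (input) and -o (output) switches described in the readme above and an XMLA input file like the one from Method #2, a processing run could look like:

ascmd -S localhost -d "Adventure Works DW 2008" -i ProcessDatabase.xmla -o output.xml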

7. Command Line – PowerShell

This PowerShell script will perform a Full Process of Adventure Works DW 2008 on localhost:

# Load the AMO assembly and connect to the local SSAS instance
[Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices")
$server = New-Object Microsoft.AnalysisServices.Server
$server.Connect("localhost")
# Retrieve the database by name and fully process it
$database = $server.Databases.GetByName("Adventure Works DW 2008")
$database.Process("ProcessFull")

Using AMO, we can perform any maintenance task through PowerShell, including processing objects.

This is probably not an exhaustive list, but I hope it gives developers some options when it comes to this routine yet crucial part of the development and deployment process.

SQL Server DBMS Top 1 Wish List

As an addition to Teo Lachev’s Top 10 Wishlists (SSAS and SSRS), I would like to contribute only 1 item to a possible SQL Server DBMS wishlist:

1. Source Control.

Not SourceSafe source control, but rather an automated version out of the box, which does not rely on developers to check code in and out. Instead, it should track changes to the code as they are made, and a full version history should be available directly in the DBMS. It should not be too hard – after all, there is a nice database available, which can store code with its version numbers just like anything else.

This would make a lot of developers’ lives a bit less frustrating.

OK, a SQL code “beautifier” would also be nice, but it is not all that important…

Vote on Connect

Filtering measures by indirectly related dimensions in MDX

I have lately started visiting the SQL Server MSDN Forums and trying to answer some questions about Analysis Services and Reporting Services. One of the questions about MDX queries seems to get repeated quite often and I will try to address it in this post, so hopefully more people will get to read this rather than ask about it on MSDN.

The actual question takes the form of:
“I have Dimension A and Dimension B, related to Measure 1. Dimension B is also related to Measure 2. How can I (is it possible to) get the values for Measure 2 filtered/sliced by Dimension A? I know it is easy to achieve with a join in SQL, but I do not know how to do it with MDX.”

This suggests the following dimensional model:

One solution would be to create a many-to-many relationship between Dimension A and Measure Group 2. However, we may want to avoid that for some reason and solve the problem with a query instead.

We can achieve the desired result in a number of ways, but I will discuss the one using NONEMPTY. Others would use FILTER or EXISTS – a FILTER-based sketch is shown after the AdventureWorks example below.

A sample script is:

SELECT
{
    [Measures].[Measure 2]
} ON 0,
NON EMPTY
{
    NONEMPTY( [Dimension B].[Dimension B Hierarchy].Members,
              ([Measures].[Measure 1], [Dimension A].[Dimension A Hierarchy].&[Member_Key]) )
} ON 1
FROM [Cube]

What this script does:

1. Gets all Dimension B members that have associated cells for Measure 1 and the specific Dimension A member (which we are filtering/slicing by)
2. Gets the Measure 2 cells for the set of members retrieved in Step 1
3. Removes the members from Step 1 for which the cells from Step 2 are empty (this is what the NON EMPTY on the rows axis does)

An AdventureWorks example is:

SELECT
{
    [Measures].[Internet Order Count]
} ON 0,
NON EMPTY
{
    NONEMPTY( [Product].[Product].Members,
              ([Measures].[Reseller Order Count], [Reseller].[Reseller].&[238]) )
} ON 1
FROM [Adventure Works]
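For comparison, here is a sketch of the FILTER-based alternative mentioned above. It expresses the same logic explicitly, though FILTER is generally evaluated cell by cell and tends to be slower than NONEMPTY:

SELECT
{
    [Measures].[Internet Order Count]
} ON 0,
NON EMPTY
{
    FILTER( [Product].[Product].Members,
            NOT ISEMPTY( ([Measures].[Reseller Order Count], [Reseller].[Reseller].&[238]) ) )
} ON 1
FROM [Adventure Works]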

Problems with FORMAT_STRING, VBA functions and NON_EMPTY_BEHAVIOR

Consider the following requirement: “We want our measure to be truncated to 4 decimal places without any trailing 0s after the decimal point and a comma as a thousands separator.”

First, let’s focus on the truncation part. If we want that to happen for a measure, we can do it through the following formula, as I have described previously:

SCOPE([Measures].[Our Measure]);
This = Fix([Measures].[Our Measure]*10^4)/10^4;
END SCOPE;
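To see the truncation at work with 12345.12345: multiplying by 10^4 gives 123451234.5, Fix truncates this to 123451234, and dividing by 10^4 again gives 12345.1234 – the digits beyond the fourth decimal place are simply cut off rather than rounded.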

This takes care of our truncation. So far so good.

Now let’s have a look at the formatting. If we apply a custom FORMAT_STRING of "#,0.####" to a number such as 12345.1234 in order to obtain 12,345.1234, we run into a problem. The same FORMAT_STRING expression applied to 12345 gives us 12,345. – including a trailing decimal separator (in our case a trailing period). There is no way to get rid of it through FORMAT_STRING. Even the specification for FORMAT_STRING points out that:

If the format expression contains only number sign (#) characters to the left of the period (.), numbers smaller than 1 start with a decimal separator.

This holds true for #s to the right of the period, as well.

What we can do in this case is either format the number conditionally, with some sort of rule which checks whether the number is whole (I will avoid that), or use the VBA Format function like this:

SCOPE([Measures].[Our Measure]);
This = Format([Measures].[Our Measure], "#,0.####");
END SCOPE;

This yields the correct result, so in the end, we can do:

SCOPE([Measures].[Our Measure]);
This = Format(Fix([Measures].[Our Measure]*10^4)/10^4, "#,0.####");
END SCOPE;

That achieves the result, but let’s look at the performance of a calculated measure utilising this approach. The first thing I noticed when I implemented it was the huge increase in query processing time and the large number of 0-valued cells. It turns out that the VBA functions in MDX do not skip empty (NULL) cells. If you try Fix(NULL), you’ll get 0. So, after Fix-ing our measure, we get 0s for every empty cell in our cube. The same is valid for Format.

The next step was to find a way to skip these empties. I tried:

SCOPE([Measures].[Our Measure]);
This = Format(Fix([Measures].[Our Measure]*10^4)/10^4, "#,0.####");
NON_EMPTY_BEHAVIOR(This) = [Measures].[Our Measure];
END SCOPE;

but it did not work. I still got the 0s in my result set. I suppose it got ignored by SSAS. Because of this issue, I decided to write an IIF statement like this:

SCOPE([Measures].[Our Measure]);
This = IIF([Measures].[Our Measure] = 0, NULL, Format(Fix([Measures].[Our Measure]*10^4)/10^4, "#,0.####"));
END SCOPE;

This worked. However, now we have an IIF statement which serves no purpose other than filtering out our empty cells, necessitated by the VBA functions’ behavior. It would be interesting to know if there is another way of implementing this which avoids the problem. It would be nice if:

1. VBA functions could skip empty cells
2. FORMAT_STRING behaved just like Format
3. NON_EMPTY_BEHAVIOR actually worked with SCOPE (in queries)

Please comment if you have a solution for the problems above and let me know if you would like to see the above issues fixed (I am considering raising a feedback/recommendation issue on Connect).

Data Mystification Techniques

I am a fan of studies in Data Visualisation. It is a creative and dynamic field with a lot of room for experiment. I consider report and dashboard design – and within this frame, Data Visualisation – a form of practical art. Well designed and built reports are critical for solution adoption and usability. However, in this post I will concentrate on exactly the opposite topic – intentionally mystifying reports, obscuring the data and making it hard for the report consumers to reach informed conclusions. I will also show how we can make data visualisations as misleading as possible. This post is not as abstract as one may think, as it draws its examples from a very real project, for which I had to build a report under heavy pressure.

Initially, I was asked to build a report based on a small data set (~450 rows) stored in an Excel workbook. The data was perfectly suitable for building a Pivot Table on top of it, so I did, and then decided to use the pivot table as a source for my report. The users have one measure – Spending Amount – and a few dimensions – Category and Focus for that spending. The request came in the form: “We want to see the Amount by Category and Focus in both graphical and numeric form”. So, I sat down and produced this prototype (with their colour theme):

As you can see from the screenshot, the report is fairly simple – the bar graphs at the top clearly show how the Amount is distributed per Category and Focus. Also, because of an explicit request, I built the bar graph on the second row to show category and focus amounts combined, and in order to clarify the whole picture, I added the table at the bottom of the report. The report works for colour-blind people too, as the details for the Focus per Category Expenditure bar graph are shown directly below it in the data table.

I sent the report prototype for review, and the response was:

1. Remove the table.
2. Change the bar graphs to make them more “flat”. Meaning: there is too much difference between the bars.

The reasoning: it is “too obvious” that the data is unevenly distributed. As the report is supposed to be presented to higher level management, discrepancies in the amount allocation would “look bad” on the people who requested me to create the report in the first place.

Adding more fake data got rejected, so I was advised to prepare a new prototype report meeting the new requirements. It follows:

Now, Stephen Few would spew if presented with such a report. I did everything I could to obscure it – added 3D effects, presented the amounts in 100% stacked graphs where absolutely unnecessary and, of course, I had to add a Pie Chart. A 3D Pie Chart. The whole report is totally useless. It is impossible to deduce the actual numbers for the various category/focus amounts. Furthermore, the Pie Chart combines focus and category and uses virtually indistinguishable colours for the different slices. The 3D effect distorts the proportions of these slices, and if printed in black and white, or if viewed by a colour-blind person, the report looks like this:

Since my goal of total mystification was achieved, I sent the second prototype back.

The response was: “It looks much better, and we really like the top right bar graph and the Pie Chart. Would it be possible to leave just those two and change the Pie Chart to show data broken down just by Category?”

So, the users did not like my combined series. A point for them. I then decided to remove the 3D cone graph and all other 3D effects to make the report more readable, and created the following 3rd prototype:

Here, we can see that the pie has the actual percentages displayed, and each category amount can be calculated from there. The stacked bar graph is again almost useless.

The users liked it, but still thought that it was too revealing, and they were particularly concerned with the fact that there were a couple of “invisible” categories (the ones with small percentages) on the Pie Chart. They had a new suggestion – change the pie chart back to a bar graph and play with the format so that even the smallest amount is visible. I offered an option, made another prototype, and it finally got approved. The exact words were: “The graphs are exactly what we want”. There it is:

The Y axis in the first graph is interesting. Not only is there a “scale break”, but the scale is in fact totally useless, as it had to be adjusted manually because of the big gaps between the amounts. I needed two scale breaks, which for a series with 6 values is a bit too much. You can compare the normal linear scale of the first prototype I created with this one. However, it hit the goal – my users were satisfied.

I consider this effort to be an exercise in “Data Mystification”. It is very easy to draw the wrong conclusions from the report, and it achieves the opposite of the goals of Business Intelligence: instead of empowering users, it could actually confuse them and lead them to wrong decisions.

A new BI company

Just a quick one – I recently learned that three former colleagues of mine have formed a new BI company in Australia – Naked Data. What makes it slightly different is that it is based on an open-source platform; and what makes the venture interesting for me is the people who are involved. All of them are exceptional professionals and I believe that if anyone can make it, it is them.

I am curious!