June 2011 – Boyan Penev on Microsoft BI

Colour Codes in SSAS

When we apply colour formatting to measures in SSAS we get a “strange” colour code for our expressions from the BIDS colour-picker. For example, if we want to colour a certain measure in Red, we can write:

CREATE MEMBER CURRENTCUBE.[Measures].[Red Measure] AS
This = 1,
FORE_COLOR=255;

If we use the colour picker we can get more complex numbers for more complex colours. If we want to apply the colour with RGB of (128,128,192), the colour picker generates the code 12615808 for use in the FORE_COLOR function.

Constructing our own color code is not all that hard. In fact, all we have to do is the following:

1. Multiply the Blue decimal code by 65536 (which is 2^16)
2. Multiply the Green decimal code by 256 (yes, 2^8)
3. Add the numbers from step 1 and 2 and the decimal code for Red

In other words, we can use the following formula:

c = r + 256*g + 65536*b

Where c is the colour code we need and r,g,b are the decimal codes (between 0 and 255) for each colour channel.

If we apply this to our (128,128,192) colour, we get:

c = 128 + 256*128 + 65536*192 = 128 + 32768 + 12582912 = 12615808

And this is exactly what we get from the colour picker control in BIDS.

As Chris Webb points out in the comments section below, we can also write FORE_COLOR=RGB(128,128,192) instead of calculating the colour code manually, which is way more practical. 🙂

I wish such little bits and pieces are better documented on MSDN as the current page in regards to the BACK_COLOR and FORE_COLOR properties is fairly basic: http://msdn.microsoft.com/en-us/library/ms145991.aspx.

UPDATE (01/07/2011):

Considering that RGB() is a VBA function, I tested the performance we get from using it. I have a simple measure, which I am formatting with an IIF expression. I tested three scenarios – with no FORE_COLOR formatting, with FORE_COLOR and numeric codes, and FORE_COLOR with RGB function. Each query was run 3 times on warm cache and returned a result set of 500,000+ cells. The results were as follows:

No FORE_COLOR: 2.3 seconds

FORE_COLOR=12615808: 5.3 seconds

FORE_COLOR=RGB(128,128,192): 17.7 seconds

In other words, applying a VBA function for colour formatting severely impacts the performance of our queries. Thus, if you really need to format the colour of your measures, you may be better off doing it the hard way by converting the RGB values to the colour codes the same way the colour-picker control does in BIDS.

UPDATE2 (04/07/2011):

As Valentino Vranken commented below, you can also use hex codes for the colours. However, there is a small twist. The codes need to be in BGR, not in RGB format. So, Red becomes 0000FF. The actual format of the hex expression is FORE_COLOR=”&Hxxxxxx“. For Red we can use FORE_COLOR=”&H0000FF”. There is no performance difference between hex and decimal codes, so it is up to your preference whether you go with one or the other. Converting 0000FF to decimal also shows the same decimal number (255) as performing the math above and if you already know the BGR hex you can simply convert it to a decimal.

Database Compression Effects on SSAS Processing

Just recently I was optimising the processing time for a rather large cube on my laptop. The hardware I used was not tremendously powerful (a mobile i5 CPU, 8Gb RAM and one 7200 RPM hard disk) and having a 500M rows fact table proved to be somewhat of a challenge for it.

The set up was fairly typical – a large fact table, with a few dimensions in a typical star schema. The goal was to get it to process as fast as possible.

After ensuring all best-practice advice from Microsoft (and in particular SQL CAT) was followed, I managed to get the processing down from 7381.075 seconds to 5801.536 s (approximately 21% improvement). However, one thing still did not look quite right – the CPU utilisation during the partition processing phase.

Considering my hardware configuration it was easy to deduce that the HDD would be more of a bottleneck than the CPU. There was only one HDD and everything, including the source database and the SSAS files, was on it. During the partition processing reading and writing is happening simultaneously, which is not good for performance. At the same time in my case the CPU utilisation was on around 75% with 25% to spare, which was not great. Ideally, I would like to see my CPU on 100%, or very close to that to ensure that I am utilising all of its power. However, I had no way to add more disks.

Luckily, in SQL Server 2008 (Enterprise Edition) we have the option to compress database tables, or partitions. We do get a performance hit on the CPU when we (de)compress the data, but it takes less space on the disk and thus requires less IOPS to read it. It sounded like a perfect fit for my problem. After estimating the benefits in terms of size row and page compression gave me, I decided to go with the heavier option since even its marginally better (5%) reduction of size over row-level compression was desirable. After waiting patiently for the compression operation to finish, the overall database size did drop to around 30% as predicted.

I run my SSAS processing script again and it completed in 4805.408 seconds – approximately 17% better than the previous run, which is a very significant improvement. The CPU utilisation went up to around 97%, which also meant that my blunder with the page vs. row compression produced, generally, good results.

One thing we must always consider is whether the bottleneck is in the IO. A few months later I replaced my hard disk with a new SSD (OSZ Vertex 2), which increased the IO performance of my laptop dramatically. When I tested SSAS processing tasks I could almost always see my CPU steady at 100% without any compression. If I were to compress my fact tables I would expect to get worse performance because my CPU would struggle to both decompress and process the data at the same time.

In conclusion, you must know what your bottleneck is before you opt for a solution like compression. It comes at a price, but the price may well be worth paying if your system is not well balanced.

Debugging SSAS Cubes – a Few Tips

There are a few techniques which can contribute to this topic and I will list some tips which seem to elude many people experiencing problems with their implementations.

1. Do not use the Cube Browser and Excel for troubleshooting

Yes, it is convenient to just drag-drop some hierarchies and measures in the Cube Browser, or in Excel, but when you have troubles with your results these tools will obscure the results. This is happening because every client tool has to generate MDX and by doing so they often generalise the approach and add “nice” language constructs like VisualTotals, or always use a subselect clause. There is nothing wrong with checking the end-result of your work with tools showing what your users see, but when your calcs don’t behave the way you expect (especially when filtering things), I would definitely stay away from GUI clients. Instead you can write some MDX in SSMS and pull exactly what you need in a clean and straight-forward way. We can play with swapping WHERE clauses for subselects, add calculations directly in the queries, etc – in brief, we get the full calculation power of MDX in our hands for troubleshooting. Additionally, next time you ask a question someone else you can start with “I have this MDX query…” instead of: “I have this pivot grid in Product X…” – it is much more likely that you will get a meaningful response.

2. Use SQL Server Profiler

If your MDX works perfectly fine against your cube, but when you use Excel 20xx you get funny results, there is no better way to find what the problem is than to run a trace with SQL Server Profiler, capture the MDX the tool has generated and then work with that to debug – again in SSMS. Different versions of Excel generate different MDX and the difference can be very important. When using other tools the differences become even larger. The only way to see what your tools does (other than talk to the people who have built it) is to capture its SSAS queries.

3. Ask for help

Once you have arrived at the conclusion that you need help (e.g. after exhaustively searching for the problem online), you can quickly get an answer at a few forums. My preferred one is MDSN, which has a dedicated SSAS section. Also, you can ask at the www.ssas-info.com forum section, or even at StackOverflow. Either way, you will need to know what the versions of each product you are using are (especially SQL Server version, service packs and updates applied) and ideally – post your MDX (if you have 10 pages of queries it would help to somehow abbreviate the repeating sections instead of copy-pasting the lot). Do not also forget that the people who are reading your questions do not know your dimensional model and what features of SSAS you are using (e.g. LastNonEmpty, unary operators, etc.) – make sure that you describe any specifics – otherwise everyone would assume a typical star with a straight-forward implementation in SSAS. These approaches, together with being courteous and good with spelling and grammar, will help attract fellow developers, who may know a little SSAS quirk, or MDX feature which resolves your issue.

4. Use Adventure Works

Replicating you problems in Adventure Works is another great way to ensure you get an accurate answer. Because typically everyone on the forums has access to Adventure Works, showing some MDX which results in an unexpected for you outcomes is the best way to pinpoint the problem eliminating all question marks which your specific model and data could be introducing. Additionally, you will probably get an answer which includes some sample code and you will be able to run it immediately and see the results.

The Case for an Azure DataMarket Date Table

Since the release of PowerPivot Excel pros and power users have been encouraged to learn and “play” with the add-in. There is one little thing from the world of BI which we (as long-standing BI professionals) are used to but apparently troubles our new friends from the Excel world – the Date table. In the SSAS Multidimensional world we have BIDS which can generate a date table in a variety of formats. Still, much more common is the custom Date table, which we build through a SQL script as it enables us to dynamically generate it for a range of dates. I have also used Excel in the past for quick and dirty solutions. Although all these scenarios are very “workable” for database professionals, when it comes to Excel power-users fiddling with databases is far from ideal. Luckily, there is a better way.

Azure DataMarket

The answer I am proposing is the new DataMarket, which as an added bonus (in cases when we cannot use third-party products) is Microsoft-owned. It is integrated very well within PowerPivot since the last update and allows selecting subsets of the data (e.g. we do not need to import everything available in the data set). Because the data is in a feed format we can connect and pull data we want anytime.

The DataMarket is the vehicle but it needs a good data set to transport. With the Microsoft SSAS team being busy with new releases and unable to chase this up with the Azure DataMarket team, I tried contacting the latter directly to no avail. It would be very simple to create a sample, test and if all goes well – we could easily expand the feed to include lots of necessary columns which could simplify any PowerPivot implementation. As a start, a minimum of a calendar hierarchy with 4-5 levels should suffice, but the possibilities are very exciting. We could have multiple financial/fiscal calendars, public holidays, weekends, leap years taken care of and many other Date properties built right into the feed. If customisation is required (as it probably will be in many cases), PowerPivot developers have the Excel and DAX to play with the data through formulas and change various properties like member names and formats.

In my opinion with a miniscule development effort Microsoft can win on both new fronts – PowerPivot and the Azure DataMarket. After all, both products need more exposure and a popular Date feed will definitely help in this direction (not to mention how much easier it would be for developers to “get into” PowerPivot-based BI implementations).

PS: I have emailed the DataMarket team a few months ago and I received no reply. This remains so even after Julie Strauss from the SSAS team followed up on this – it seems like someone is sleeping on the job…

PPS: I just created a Microsoft Connect suggestion – please vote if you feel like this is a good idea. Also, if you feel even more inclined to act you can email the DataMarket guys directly asking them to pay attention through the links provided on their Contact Us page.