Load Testing BI Solutions – When?

This year I came across two very different BI projects which had the common non-functional requirement to prove that they would handle an expected spike in the report generation load. Funny enough, in both cases the project teams got very concerned and came up with wildly inaccurate predictions of how many concurrent users we should be testing for. In the first case the problem was with the perception of “thousands of users”, while in the second, the team interpreted “monthly users” as “concurrent users”. The annoying part was that in the first case the team planned on building an ultra-massively overcomplicated queuing system to handle those spikes, and in the second case they were thinking of completely scrapping the ad-hoc functionality in the solution and resorting to report extracts distributed by email. The unreasonable expectations of the load lead to bad design choices – this is why it is important to remain calm and first check whether there is a problem at all.

Firstly, let’s agree that we are measuring report requests. To begin, we should know how many requests we get per a period of time (e.g. a month), and then how long it takes to generate a report. A typical scenario would be:

  • 1,000,000 report requests per month
  • 2 seconds to generate a report on average

What we need to do now if apply a bit of math:

1,000,000 / 20 = 50,000 requests per day (on average)

50,000 / 8 = 6,250 requests per hour (8 hours in a working day)

Since a report takes 2 seconds to generate, we can generate 1,800 reports in one hour. Therefore, with 6,250 requests, we would have 3.47 average concurrent users. Of course, this would be the case if we have a very uniformly split load. In reality this would not happen – instead, we will have peaks and dips in usage. A moderate peak is typically around 3x the average, while a heavy one would be at around 6x the average. To ensure that we can handle such peak periods, we should multiply our average concurrent users by 3 or by 6 depending on our load analysis. Let’s assume we have a very high peak load of 3.47 * 6 = 20.82, or approximately 21 concurrent users. This is the number we need to test in our case. Note that we had 1,000,000 report requests per month, but in our highest peak we expect to have only 21 concurrent users. I have not actually had a project where we have expected to have such a load (in both cases which prompted me to write this post we had between 2000-10000 users per month).

The moral of the story – don’t panic. In most reporting projects the user load is not high enough to warrant a full-scale load testing exercise; next time you hear talking about something like that, instead of rushing to cover unreasonable scenarios, try to calculate and confirm the need first.

DataMarket Updates: Speed, Portal and DateStream

It has been an eventful week for the Azure DataMarket. We had three new and exciting (for geeks like me) things happening in that corner of the Microsoft universe:

1. Speed!

There was an update to the Azure DataMarket a few days ago. It was, in my opinion, the best thing Microsoft could have done to their offering – tremendously increase its performance. While the DataMarket was previously plagued by unacceptably slow download speed, now it’s for feed standards blazingly fast. For comparison sake, I used to wait for more than 40 minutes when downloading an approximately 70k rows feed from the DataMarket prior to the update. Now, it is on my machine in around 5 – 8-fold increase in performance! Rumours have it that on faster-than-my-home-ADSL2+-networks we will be experiencing up to 20x better performance. It would be good to hear if this is actually correct for developers on such networks (please comment).

Next, range queries, hopefully…

2. Portal

While before the last couple of days anyone who wanted to publish data on the DataMarket had to contact the Microsoft team via email and ask how to get it done, we have just moved into the self-service space with a new portal allowing publishers to create and manage their feeds. The link to this new portal is:

https://publish.marketplace.windowsazure.com/

And, you can find some very helpful documentation about it here:

http://msdn.microsoft.com/en-us/library/windowsazure/hh563871.aspx

3. DateStream

Finally, I am proud to announce that the great DateStream feed got translated in four more languages:

Hebrew and Danish – thanks to Rafi Asraf

German

Bulgarian

The Italian translation (thanks to Marco Russo) is coming soon too, but missed this release unfortunately.

Feel free to explore them and let me know if anything needs to be changed to make them more correct/useful.