DataMarket Updates: Speed, Portal and DateStream

It has been an eventful week for the Azure DataMarket. We had three new and exciting (for geeks like me) things happening in that corner of the Microsoft universe:

1. Speed!

There was an update to the Azure DataMarket a few days ago. It was, in my opinion, the best thing Microsoft could have done to their offering – tremendously increase its performance. While the DataMarket was previously plagued by unacceptably slow download speed, now it’s for feed standards blazingly fast. For comparison sake, I used to wait for more than 40 minutes when downloading an approximately 70k rows feed from the DataMarket prior to the update. Now, it is on my machine in around 5 – 8-fold increase in performance! Rumours have it that on faster-than-my-home-ADSL2+-networks we will be experiencing up to 20x better performance. It would be good to hear if this is actually correct for developers on such networks (please comment).

Next, range queries, hopefully…

2. Portal

While before the last couple of days anyone who wanted to publish data on the DataMarket had to contact the Microsoft team via email and ask how to get it done, we have just moved into the self-service space with a new portal allowing publishers to create and manage their feeds. The link to this new portal is:

https://publish.marketplace.windowsazure.com/

And, you can find some very helpful documentation about it here:

http://msdn.microsoft.com/en-us/library/windowsazure/hh563871.aspx

3. DateStream

Finally, I am proud to announce that the great DateStream feed got translated in four more languages:

Hebrew and Danish – thanks to Rafi Asraf

German

Bulgarian

The Italian translation (thanks to Marco Russo) is coming soon too, but missed this release unfortunately.

Feel free to explore them and let me know if anything needs to be changed to make them more correct/useful.

Range Queries with Azure DataMarket Feeds

By default the Azure DataMarket does not allow range queries. In fact, the only way we can filter a data feed is through specifying one or more values for the “queryable” fields specified for it. There is not technical reason behind not allowing range queries as both the back-end (presumably SQL Azure, or SQL Server) and the OData protocol support them. Fortunately, there is a way to consume a range of the values in a column of a data feed in PowerPivot. It is not straight-forward and I do not think that the target audience of both self-service BI/PowerPivot and the DataMarket itself would appreciate the complexity, but it could be useful anyway.

If we want to pull all three tables from the DataMarket we can simply use https://api.datamarket.azure.com/BoyanPenev/DateStream/ as the URL in PowerPivot:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Otherwise, we can pick each one with a URL like (for the BasicCalendarEngish table):

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish

If we filter the data set on the DataMarket website to request only the data for 2010 we get the following URL:

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish?$filter=YearKey%20eq%202010

Note the last bit:

?$filter=YearKey%20eq%202010

This is simply the URL/OData encoded ?$filter=YearKey = 2010

In OData we can also use other operators, not just = (or eq). For ranges these are gt (greater than), ge (greater than or equal to), lt (less than) and le (less than or equal to). We can also use and and or operators to combine different predicates. For a more thorough list, please refer to http://www.odata.org/developers/protocols/uri-conventions. If we replace the ” = 2010″ with ” < 2010″ and then encode the URL, we do indeed get all years prior to 2010. Things get slightly more complicated when we have a more complex scenario. In example, when building a date table we may want to include all years between 2000 and 2030. To do that, we would have to write something like:

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish?$filter=YearKey >= 2000 and YearKey <= 2030

encoded, the same looks like this:

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish?$filter=YearKey%20ge%202000%20and%20YearKey%20le%202030

Here space is %20 and the math comparison operators have been replaced with the OData operators (in red).

If we paste this in PowerPivot and hit “Next”:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

…we get exactly what we expect – a table with 30 years.

Things get more complicated if we include the datetime DateKey in the URL. For a single date (e.g. 1900-01-01), we have to use:

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish?$filter=DateKey = datetime’1900-01-01T00:00:00′

After Applying URL encoding we get:

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish?$filter=DateKey%20eq%20datetime%271900-01-01T00%3a00%3a00%27

Where %27 is apostrophe and %3a is a colon (for a list of ASCII characters and their URL encoded form we can refer to http://www.w3schools.com/tags/ref_urlencode.asp).

Now, to combine the two we would need to write:

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish?$filter=DateKey = datetime’1900-01-01T00:00:00′ or (YearKey >= 2000 and YearKey <= 2030)

Encoded this becomes:

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish?$filter=DateKey%20eq%20datetime%271900-01-01T00%3a00%3a00%27%20or%20%28YearKey%20ge%202000%20and%20YearKey%20le%202030%29

This monstrous-to-write URL string returns 30 years of data + 1 day.

I suppose this approach can be classified as a workaround, as I have not seen any documentation on PowerPivot referring to any options for filtering data from the Azure DataMarket. However, in my opinion, this should be a feature of the DataMarket itself as it would make it easier/possible for users with any tool to get just the data they need and even possibly reduce the load on the site service since it will no longer be necessary to export everything and then attempt to apply a filter.

The Case for an Azure DataMarket Date Table

Since the release of PowerPivot Excel pros and power users have been encouraged to learn and “play” with the add-in. There is one little thing from the world of BI which we (as long-standing BI professionals) are used to but apparently troubles our new friends from the Excel world – the Date table. In the SSAS Multidimensional world we have BIDS which can generate a date table in a variety of formats. Still, much more common is the custom Date table, which we build through a SQL script as it enables us to dynamically generate it for a range of dates. I have also used Excel in the past for quick and dirty solutions. Although all these scenarios are very “workable” for database professionals, when it comes to Excel power-users fiddling with databases is far from ideal. Luckily, there is a better way.

Azure DataMarket

The answer I am proposing is the new DataMarket, which as an added bonus (in cases when we cannot use third-party products) is Microsoft-owned. It is integrated very well within PowerPivot since the last update and allows selecting subsets of the data (e.g. we do not need to import everything available in the data set). Because the data is in a feed format we can connect and pull data we want anytime.

The DataMarket is the vehicle but it needs a good data set to transport. With the Microsoft SSAS team being busy with new releases and unable to chase this up with the Azure DataMarket team, I tried contacting the latter directly to no avail. It would be very simple to create a sample, test and if all goes well – we could easily expand the feed to include lots of necessary columns which could simplify any PowerPivot implementation. As a start, a minimum of a calendar hierarchy with 4-5 levels should suffice, but the possibilities are very exciting. We could have multiple financial/fiscal calendars, public holidays, weekends, leap years taken care of and many other Date properties built right into the feed. If customisation is required (as it probably will be in many cases), PowerPivot developers have the Excel and DAX to play with the data through formulas and change various properties like member names and formats.

In my opinion with a miniscule development effort Microsoft can win on both new fronts – PowerPivot and the Azure DataMarket. After all, both products need more exposure and a popular Date feed will definitely help in this direction (not to mention how much easier it would be for developers to “get into” PowerPivot-based BI implementations).

PS: I have emailed the DataMarket team a few months ago and I received no reply. This remains so even after Julie Strauss from the SSAS team followed up on this – it seems like someone is sleeping on the job…

PPS: I just created a Microsoft Connect suggestion – please vote if you feel like this is a good idea. Also, if you feel even more inclined to act you can email the DataMarket guys directly asking them to pay attention through the links provided on their Contact Us page.