Range Queries with Azure DataMarket Feeds

By default the Azure DataMarket does not allow range queries. In fact, the only way we can filter a data feed is by specifying one or more values for the “queryable” fields defined for it. There is no technical reason for disallowing range queries, as both the back-end (presumably SQL Azure, or SQL Server) and the OData protocol support them. Fortunately, there is a way to consume a range of the values in a column of a data feed in PowerPivot. It is not straightforward and I do not think that the target audience of both self-service BI/PowerPivot and the DataMarket itself would appreciate the complexity, but it could be useful anyway.

If we want to pull all three tables from the DataMarket we can simply use https://api.datamarket.azure.com/BoyanPenev/DateStream/ as the URL in PowerPivot:

Otherwise, we can pick each table individually with a URL like this one (for the BasicCalendarEnglish table):

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish

If we filter the data set on the DataMarket website to request only the data for 2010 we get the following URL:

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish?$filter=YearKey%20eq%202010

Note the last bit:

?$filter=YearKey%20eq%202010

This is simply the URL/OData-encoded form of ?$filter=YearKey = 2010
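To see what an encoded filter actually says, the query string can be decoded again. The snippet below is a minimal sketch using only Python's standard library; the variable names are illustrative and nothing in it is specific to the DataMarket.

```python
from urllib.parse import unquote, urlsplit

url = (
    "https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/"
    "BasicCalendarEnglish?$filter=YearKey%20eq%202010"
)

# Everything after "?" is the query string: "$filter=YearKey%20eq%202010".
query = urlsplit(url).query

# Split the option name from its value at the first "=".
option_name, _, encoded_value = query.partition("=")

# unquote() turns %20 back into a space.
decoded_filter = unquote(encoded_value)

print(option_name)
print(decoded_filter)
```

This prints the option name ($filter) followed by the readable OData expression, YearKey eq 2010.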

In OData we can also use other operators, not just = (or eq). For ranges these are gt (greater than), ge (greater than or equal to), lt (less than) and le (less than or equal to). We can also combine different predicates with the “and” and “or” operators. For a more thorough list, please refer to http://www.odata.org/developers/protocols/uri-conventions. If we replace the "= 2010" with "< 2010" and then encode the URL, we do indeed get all years prior to 2010. Things get slightly more complicated when we have a more complex scenario. For example, when building a date table we may want to include all years between 2000 and 2030. To do that, we would have to write something like:

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish?$filter=YearKey >= 2000 and YearKey <= 2030

Encoded, the same looks like this:

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish?$filter=YearKey%20ge%202000%20and%20YearKey%20le%202030

Here each space becomes %20 and the math comparison operators have been replaced with their OData equivalents (ge and le).
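The two steps (swapping the math operators for OData operators, then URL-encoding) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the helper names to_odata and build_filter_url are made up for this example, and the operator replacement is a naive text substitution that would also rewrite operators inside quoted string values.

```python
import re
from urllib.parse import quote

BASE_URL = (
    "https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/"
    "BasicCalendarEnglish"
)

# Math-style comparison operators and their OData equivalents.
ODATA_OPERATORS = {">=": "ge", "<=": "le", ">": "gt", "<": "lt", "=": "eq"}

# Two-character operators come first so ">=" is not read as ">" then "=".
OPERATOR_PATTERN = re.compile(r">=|<=|>|<|=")


def to_odata(expression: str) -> str:
    """Replace math comparison operators with OData ones (hypothetical helper)."""
    return OPERATOR_PATTERN.sub(lambda m: ODATA_OPERATORS[m.group(0)], expression)


def build_filter_url(base_url: str, expression: str) -> str:
    """Translate the expression and percent-encode it into a $filter URL."""
    # safe="" makes quote() encode every reserved character, spaces included.
    return f"{base_url}?$filter={quote(to_odata(expression), safe='')}"


range_url = build_filter_url(BASE_URL, "YearKey >= 2000 and YearKey <= 2030")
print(range_url)
```

The printed URL is the same encoded string as the one shown above, ready to paste into PowerPivot.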

If we paste this in PowerPivot and hit “Next”:

…we get exactly what we expect – a table covering the years 2000 through 2030 (31 calendar years, since both ends of the range are inclusive).

Things get more complicated if we include the datetime DateKey in the URL. For a single date (e.g. 1900-01-01), we have to use:

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish?$filter=DateKey = datetime'1900-01-01T00:00:00'

After applying URL encoding we get:

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish?$filter=DateKey%20eq%20datetime%271900-01-01T00%3a00%3a00%27

Here %27 is an apostrophe and %3a is a colon (for a list of ASCII characters and their URL-encoded form we can refer to http://www.w3schools.com/tags/ref_urlencode.asp).
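As a rough sketch, the datetime literal and its encoded form can be produced with Python's standard library. The helper name odata_datetime is hypothetical. Note that Python writes the hex digits in upper case (%3A rather than %3a); percent-encoding is case-insensitive, so the two forms are equivalent.

```python
from datetime import datetime
from urllib.parse import quote


def odata_datetime(value: datetime) -> str:
    """Format a datetime as an OData literal, e.g. datetime'1900-01-01T00:00:00'."""
    return f"datetime'{value.isoformat()}'"


date_filter = f"DateKey eq {odata_datetime(datetime(1900, 1, 1))}"

# safe="" encodes the apostrophes (%27) and colons (%3A) as well as the spaces.
encoded_date_filter = quote(date_filter, safe="")

print(date_filter)
print(encoded_date_filter)
```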

Now, to combine the two we would need to write:

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish?$filter=DateKey = datetime'1900-01-01T00:00:00' or (YearKey >= 2000 and YearKey <= 2030)

Encoded this becomes:

https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/BasicCalendarEnglish?$filter=DateKey%20eq%20datetime%271900-01-01T00%3a00%3a00%27%20or%20%28YearKey%20ge%202000%20and%20YearKey%20le%202030%29

This monstrous-to-write URL string returns the years 2000 through 2030 (31 calendar years, as the range is inclusive) plus 1 day.
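Writing such a string by hand is error-prone, so one option is to compose it from small pieces and let a library do the encoding. The following is a sketch only: year_between and date_equals are hypothetical helpers invented for this example, and the column names YearKey and DateKey are the ones used in the URLs above.

```python
from datetime import datetime
from urllib.parse import quote, unquote

BASE_URL = (
    "https://api.datamarket.azure.com/Data.ashx/BoyanPenev/DateStream/"
    "BasicCalendarEnglish"
)


def year_between(first: int, last: int) -> str:
    """Inclusive range predicate on YearKey (hypothetical helper)."""
    return f"YearKey ge {first} and YearKey le {last}"


def date_equals(value: datetime) -> str:
    """Equality predicate on DateKey using an OData datetime literal."""
    return f"DateKey eq datetime'{value.isoformat()}'"


# One extra day OR the whole year range; the parentheses keep the range together.
combined_filter = f"{date_equals(datetime(1900, 1, 1))} or ({year_between(2000, 2030)})"
combined_url = f"{BASE_URL}?$filter={quote(combined_filter, safe='')}"

print(combined_filter)
print(combined_url)

# Decoding the URL again is a quick sanity check on the encoding.
assert unquote(combined_url) == f"{BASE_URL}?$filter={combined_filter}"
```

The resulting URL differs from the hand-encoded one above only in the case of the hex digits (%3A instead of %3a), which does not change its meaning.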

I suppose this approach can be classified as a workaround, as I have not seen any PowerPivot documentation referring to options for filtering data from the Azure DataMarket. However, in my opinion, this should be a feature of the DataMarket itself. It would make it easier (or possible at all) for users of any tool to get just the data they need, and it could even reduce the load on the service, since it would no longer be necessary to export everything and then attempt to apply a filter.

Introducing Project DateStream (CodePlex)

I recently blogged about The Case for an Azure DataMarket Date Table. I finished the blog post with a bit of a critique of the DataMarket team at Microsoft, for which I can now wholeheartedly apologise. This is because since my last post I was contacted by Max Uritsky, who is a Group Program Manager on the Windows Azure Marketplace DataMarket team (a long name for a team, yes). He and Belinda Tiberio helped me with creating and hosting a new Date feed. Not only did they help with making it available for free on the DataMarket website, but they also gave me a free 1 GB SQL Azure database for the project. A big “thank you” goes to Julie Strauss from the SSAS team for making the contact, as well.

To summarise, the DateStream project is a free date table available as a feed and intended to be used by PowerPivot BI users. As most, if not all, BI projects include a date table, the goal is to provide an easy-to-use, correct and simple mechanism for creating such tables.

After some deliberation on the format of the feed we decided that it would be best to split the feed into a number of localised Basic tables and one Extended table. Currently we have only two Basic versions (US and EN) and a fairly straightforward Extended one. However, the plan is to include more and more columns in the Extended one and to provide extra localised (Italian, French, Spanish, German, Russian, etc.) tables. When I say “we” I mean the fellow SSAS professionals with whom I had the pleasure of discussing this idea (among them Marco Russo, Thomas Kejser, Willfried Faerber and Grant Paisley).

The CodePlex page of the project contains more details about the feed and the tables and also allows for commenting on existing features, as well as requesting new ones. It can be found at http://datestream.codeplex.com/.

The actual feed can be used directly from https://datamarket.azure.com/dataset/1542c52d-d466-4094-a801-9ef8fd7c6485.

Note the logo – it was created by Daniele Perilli from SQL BI with the assistance provided by Marco Russo. Thanks to them we did not have to resort to my graphic design skills, which definitely is a win for the DataMarket website.

One note – please let me know if the performance you get from the feed is not satisfactory (please remember that once pulled, the feed does not need to be refreshed as the data will remain valid forever). If many people agree that it is too slow I could potentially host it on an alternative location as well. It is possible to download a CSV version from the DataMarket as a workaround, which also allows removing unnecessary date ranges.