Last week I had the pleasure of meeting a friend of mine who founded a company I wrote about a year or two ago. His business has grown nicely since then, and they have become the number one PALO partner in Australia. For those who are not aware of Jedox and PALO, I would recommend visiting their website at www.jedox.com. PALO is an open-source BI suite very similar to SQL Server, minus the relational part. Since I was given a private show (no, nothing immoral here) of their corporate setup, I thought it might be interesting to discuss what I saw in this post.
There are a few interesting and vastly different aspects of PALO when compared to the SQL Server BI stack:
For me, the best feature they have is General-Purpose GPU (GPGPU) support in the OLAP server. While the OLAP components can be queried through MDX, much like SSAS, they solve query bottlenecks with raw power. As far as I am aware, PALO supports CUDA, NVIDIA's implementation of the GPGPU vision (ATI has its own). If this all sounds a bit foreign, have a look at the Tom’s Hardware article “The Advent of GPGPU”, which explains the concept of using the GPU for computation in a fair bit of detail. In short, by harnessing the power of NVIDIA GPUs, the processing power of a PC jumps from a few dozen GFLOPS (50-60 GFLOPS on my i7 2600K overclocked to 4.5 GHz) to 1500-1800 GFLOPS on my NVIDIA GTX 570. This means that for GPU-optimised calculations, a PC gets a boost of roughly a factor of 30. Both NVIDIA and ATI can see the potential and have worked hard over the last few years on better drivers and better support for such applications. PALO in particular prefers the NVIDIA Tesla GPU. Note that a Tesla card does not even have a video output: it is used only for calculations, supports ECC memory (which makes it suitable for enterprise environments), and has been designed from the ground up for CUDA.
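The “factor of 30” is simply the ratio of the two throughput figures quoted above. A minimal sketch of that arithmetic, using only those figures (peak numbers, not measured query speed-ups), shows the range the two ranges imply:

```python
# GPU vs CPU throughput ratio, using only the GFLOPS figures quoted in the post.
CPU_GFLOPS = (50.0, 60.0)      # i7 2600K overclocked to 4.5 GHz
GPU_GFLOPS = (1500.0, 1800.0)  # NVIDIA GTX 570


def speedup_bounds(cpu, gpu):
    """Return (worst, midpoint, best) GPU/CPU throughput ratios."""
    worst = min(gpu) / max(cpu)            # slowest GPU figure vs fastest CPU figure
    best = max(gpu) / min(cpu)             # fastest GPU figure vs slowest CPU figure
    mid = (sum(gpu) / 2) / (sum(cpu) / 2)  # ratio of the two midpoints
    return worst, mid, best


if __name__ == "__main__":
    worst, mid, best = speedup_bounds(CPU_GFLOPS, GPU_GFLOPS)
    print(f"speedup: {worst:.0f}x to {best:.0f}x, about {mid:.0f}x at the midpoints")
```

With these inputs the bounds come out at 25x and 36x, with 30x at the midpoints. That only holds for work that is genuinely GPU-optimised and compute-bound.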
As for PALO, I was told that when an already optimised query still performs badly, adding another Tesla unit to the server solves the problem. Their experience shows that the servers scale linearly with every new GPU, and since CUDA can drive several GPUs in the same server in parallel, adding 2-4 such units is all it takes to create a very, very fast computational workhorse.
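The post does not say how PALO splits work across cards. One common reason aggregations can scale close to linearly is that they partition cleanly. Each device sums its own shard of the cells, and only the small partial results are combined at the end. The sketch below simulates that in plain Python. The round-robin sharding and all function names are hypothetical illustrations, not PALO's implementation.

```python
from collections import defaultdict


def shard(cells, n_devices):
    """Hypothetical layout: split (key, value) cells round-robin, one shard per device."""
    shards = [[] for _ in range(n_devices)]
    for i, cell in enumerate(cells):
        shards[i % n_devices].append(cell)
    return shards


def aggregate_shard(cells):
    """The work one device would do on its own: sum values per key."""
    totals = defaultdict(float)
    for key, value in cells:
        totals[key] += value
    return totals


def merge(partials):
    """Combine per-device partial totals; small compared with scanning the cells."""
    result = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            result[key] += value
    return dict(result)


def multi_device_aggregate(cells, n_devices):
    # Shards are independent, so on real hardware these calls could run
    # concurrently, one per GPU; here they run one after another.
    return merge(aggregate_shard(s) for s in shard(cells, n_devices))


if __name__ == "__main__":
    cells = [("Q1", 10.0), ("Q2", 5.0), ("Q1", 2.5), ("Q3", 7.0), ("Q2", 1.5)]
    print(multi_device_aggregate(cells, 4))
```

Each shard holds roughly 1/n of the cells, so adding a device cuts the per-device scan proportionally. The merge step and any host-to-GPU copying do not shrink this way, which is why the scaling is near-linear rather than guaranteed.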
Another area where I was impressed was the way PALO does mobile. They have free apps for the iPad (which I saw in action), as well as for the iPhone and Android. Their vision is that information dashboards are best viewed, and mostly needed, on the go, when BI users have limited ability to browse around and dig deeper. I tend to agree, to some extent. In my experience, the information dashboard is a slightly overrated concept, but having it on your phone or tablet, where you can easily connect to your corporate environment and quickly check some numbers, is a nice idea, and I hope we see it become part of the Microsoft stack sooner rather than later. PALO's application is quite nice (apart from the pie charts), supports every form of touch interaction (multi-touch included) and allows easy slicing and dicing of data, just as it should.
Open Source Software Compatibility
The last thing I found impressive, and different from vendors of non-open-source software, is the openness of PALO and its compatibility with other open-source tools. Their stack components are easily replaceable. The ETL component can be swapped for Pentaho’s Kettle, or for JasperSoft’s ETL software, which can load data directly into PALO’s cubes. It is a bit like loading a SQL Server data mart with Informatica, but the integration seems tighter, as the interfaces between the components are, apparently, completely open.
Apart from these areas, I think that the Microsoft stack has a nicer UI, allows easier development, and is richer (with MDS, DQS coming up, Data Mining, etc.). PALO has its own ETL tool, which has no graphical design surface and relies on drop-downs and various windows to get the work done. The OLAP server seems to support many features out of the box, allows querying through MDX and supports write-back, but in general seems quite barren from an SSAS point of view. The front-end is either Excel (through a plug-in which lets you build reports with formulas), OpenOffice, LibreOffice, or PALO’s own web-based spreadsheet environment. Once a report is created in any of these, it can be published to a web portal and shared with other users.
All in all, PALO is a neat, free BI suite, which is very cheap to get started with. There is an enterprise version, which is not free, and, of course, any new customer will have to pay for someone to install it, configure it and implement their requirements. This adds to the total cost, but these expenses exist for any other set of tools (although a decent argument can be made about which suite allows faster and cheaper development). The features listed in this article will definitely appeal to some, and I am very impressed by the innovative GPGPU capability, which has a lot of potential. I can easily think of a few areas where a 30-fold improvement in computational power would benefit SQL Server BI.
3 thoughts on “A Closer Look at PALO and GPGPU”
Nice post, Boyan. Good to see you’re still on top of the latest and greatest in the world of BI. Just thought I’d add a couple of notes regarding Palo.
1. Palo is a “neat free BI suite” in exactly the same way that MS SQL Server is a neat, free RDBMS. That is, you can download and freely use the “Express Edition” (or the Community Edition in the case of Palo), but many of the enterprise-class features (like the GPU acceleration, for instance) are only found in the commercial editions.
2. The use of GPU acceleration and the NVIDIA CUDA framework is certainly unique to Palo (in the BI world, at least). Further to the advantages you’ve outlined is the fact that the current generation of NVIDIA Tesla cards is equipped with GDDR5 RAM, which is like adding some extra Tabasco sauce to your in-memory processing. Add to that the ability of a single Palo cube to address memory spread across different physical GPUs, and you have a platform for some serious number crunching. Just look at the applications of CUDA technology in the academic world: oceanography, meteorology, fluid dynamics, protein folding, neural network mapping and the like.
3. You’re spot on with regard to the open standards of the Palo suite. This is one big difference Palo has over certain other … let’s call them “legacy” BI platforms (not talking about MS here). Both the ETL and Web components of Palo can be mashed together with a multitude of other technologies and components, or even hot-swapped for third-party components.
Thanks for your comments. In Linux speak, “the latest and greatest” would be “bleeding edge”, which I think is the case with PALO.
On your first point: yes, I agree, the community version is free, while the enterprise version is not. Point taken and post adjusted.
On the second one: if your cubes fit in the 4 × 6 GB = 24 GB of, say, four Tesla cards, and you can use every last bit of it for them, then you can take full advantage of the GDDR5, the wide bus and the bandwidth. Of course, the much cheaper DDR3 will have to be used for everything else and, correct me if I am wrong, moving data between main memory and GPU memory is not all that fast. Therefore, GPGPU is great for specific optimisations and for small data sets (such as those found in the areas you have listed), but much less so for large data sets. Also, on the not-so-great side, number crunching in BI is a relatively rare need, limited to certain tasks. In the SSAS (and SQL Server relational) world of relatively large data sets, IO is the bottleneck, not the CPU. So for oceanography and BOINC-type computations the future may look very bright with GPGPU, but in BI I doubt it will improve all operations, just a subset. Still, for the parts which are CPU-intensive today, I believe GPGPU can offer a viable alternative.
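The trade-off in this reply can be put into a rough model. Offloading to the GPU pays off only when the computation time saved exceeds the time spent copying data over the bus. The sketch below uses the midpoints of the GFLOPS figures from the post. The two bandwidths (about 20 GB/s for dual-channel DDR3, 8 GB/s for a PCIe 2.0 x16 link) and the workload sizes are assumptions for illustration. Real systems can overlap transfers with computation, so this model is pessimistic about transfer cost.

```python
# Rough CPU-vs-GPU offload model. GFLOPS are midpoints of the figures quoted
# in the post; both bandwidths are illustrative assumptions.
CPU_GFLOPS = 55.0      # i7 2600K @ 4.5 GHz, midpoint of 50-60
GPU_GFLOPS = 1650.0    # GTX 570, midpoint of 1500-1800
RAM_GB_PER_S = 20.0    # assumption: dual-channel DDR3 streaming rate
BUS_GB_PER_S = 8.0     # assumption: theoretical PCIe 2.0 x16 rate


def cpu_seconds(gflop, gb):
    """CPU streams the data from main memory, then computes."""
    return gb / RAM_GB_PER_S + gflop / CPU_GFLOPS


def gpu_seconds(gflop, gb_to_copy):
    """GPU first receives the data over the bus, then computes.

    Reads from the card's own GDDR5 are ignored as comparatively fast.
    """
    return gb_to_copy / BUS_GB_PER_S + gflop / GPU_GFLOPS


def offload_pays_off(gflop, gb, resident_on_gpu=False):
    """True if the GPU path is faster; no copy is needed if data is resident."""
    copy = 0.0 if resident_on_gpu else gb
    return gpu_seconds(gflop, copy) < cpu_seconds(gflop, gb)


if __name__ == "__main__":
    # IO-bound: summing 20 GB of 4-byte cells is roughly 5 GFLOP of work.
    print("sum over 20 GB:", offload_pays_off(gflop=5, gb=20))
    # CPU-bound: 1000 GFLOP of calculations over a 1 GB working set.
    print("heavy calc on 1 GB:", offload_pays_off(gflop=1000, gb=1))
    # Same sum, but the cube already lives in GPU memory.
    print("sum, cube resident:", offload_pays_off(gflop=5, gb=20, resident_on_gpu=True))
```

With these assumed numbers, the IO-bound sum stays on the CPU: copying 20 GB over the bus takes longer than the CPU needs for the whole job. The calculation-heavy case gains roughly 25x. Keeping the cube resident in GPU memory removes the copy term. That is the case where everything fits in the cards' 24 GB.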
Lastly, yes, open source has the edge when it comes to compatibility, but don’t forget that in the world of, let’s call them, “leading” BI platforms you can easily combine tools and have DataStage + Oracle + SSAS + QlikView if you want. Nothing prevents you from doing it if you really want to. Of course, just as with open-source tools, sticking to the same vendor has its merits.
Nice article. The concept of GPGPU was completely new to me; it is obviously a very good idea and seems to be a very good fit for BI indeed. How much value you derive from it, I guess, depends on how CPU-intensive, as opposed to I/O-intensive, your request load is.