LexisNexis has open-sourced its alternative to Hadoop, called High Performance Computing Cluster (HPCC). The code is available on GitHub. For years the code was restricted to LexisNexis Risk Solutions. The system contains two major components:
- Thor (Thor Data Refinery Cluster) is the data processing framework. It “crunches, analyzes and indexes huge amounts of data a la Hadoop.”
- Roxie (Roxie Rapid Data Delivery Cluster) is more like a data warehouse and is designed with quick querying in mind for frontends.
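As a rough way to picture the split between the two components, here is a toy Python sketch. It is not HPCC code and does not use any HPCC API; the function names `refine` and `query` and the sample records are hypothetical, made up purely for illustration. The idea it shows is only the division of labor: a batch step scans everything once and builds an index, and a separate delivery step answers lookups from that prebuilt index without rescanning the data.

```python
from collections import defaultdict


def refine(records):
    """Batch step (the Thor-like role): scan every record once and build an inverted index."""
    index = defaultdict(list)
    for record_id, text in records:
        # Use a set so a word repeated within one record is indexed only once.
        for word in set(text.lower().split()):
            index[word].append(record_id)
    # Sort each posting list so query results are deterministic.
    return {word: sorted(ids) for word, ids in index.items()}


def query(index, word):
    """Delivery step (the Roxie-like role): answer a lookup from the prebuilt index."""
    return index.get(word.lower(), [])


# Hypothetical sample data, not taken from any real system.
records = [
    (1, "Hadoop batch processing"),
    (2, "Fast query delivery"),
    (3, "Batch index build"),
]

index = refine(records)
print(query(index, "batch"))  # [1, 3]
```

The expensive work happens once in `refine`, while `query` is a single dictionary lookup, which is the kind of behavior a frontend wants.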
The language that drives the whole process is the Enterprise Control Language (ECL), which is said to be faster and more efficient than Hadoop’s version of MapReduce. A picture is a much better way to show how the system works. Below is a diagram from the Gigaom article from which most of this information originates.
To me, Roxie is the more exciting component because it seems to complement (or replace) several technologies currently in the space. I do not know all the details, but it could potentially cover the roles of technologies such as HBase, Hive, RabbitMQ and MemcacheDB, which are commonly used to query data and speed it to a web frontend.
My opinion on HPCC is mixed. Although Hadoop has already taken off in usage, LexisNexis is a very strong institution and could potentially convince some corporate users to use its system instead (those that do not want to use Microsoft’s Dryad project). I do not see HPCC being a Hadoop killer, just as I do not see Spark or any other alternative as a Hadoop killer. However, if HPCC does become a strong alternative, I sense this could be trouble for some of the newer players in the Hadoop field such as Hortonworks and MapR. I do not have much of an interest in studying business and competition, but Hadoop Summit 2011 showed that the Hadoop space has become crowded, and even small breakthroughs, such as another company developing a similar project, are enough to add volatility and uncertainty for all involved.