Created 11-17-2015 04:08 PM
Is there a recommended rules engine for Hadoop? Has anyone tested Drools with Hive?
The idea is to have a repository of rules, and the engine should read the rules and apply them to the data. These rules could be simple (e.g. value > 10) or complex (e.g. average age is 40% greater than the standard deviation of the population). I know that I can write these rules manually in Hive, but we would like to let the end users change the rules without requiring us to rewrite them.
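To make the idea concrete, here is a minimal sketch (not any particular product's API) of what "rules as data" could look like: simple rules become predicates built from stored parameters, and the complex example from the question becomes a small statistical check. The field name, threshold, and method names are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Sketch: rules live as data (field + threshold loaded from a repository
// that end users can edit), not as hand-written HiveQL.
public class RuleRepo {

    // Simple rule of the "value > 10" kind; field and threshold would be
    // read from the rules database in a real system.
    public static Predicate<Map<String, Double>> simpleRule(String field, double threshold) {
        return row -> row.getOrDefault(field, 0.0) > threshold;
    }

    // Complex rule from the question: average age is more than 40%
    // greater than the standard deviation of the population.
    public static boolean avgVsStddevRule(List<Double> ages) {
        double avg = ages.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        double var = ages.stream().mapToDouble(a -> (a - avg) * (a - avg)).sum() / ages.size();
        double stddev = Math.sqrt(var);
        return avg > 1.4 * stddev;
    }
}
```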
Created 11-18-2015 12:26 AM
In my previous company we developed a rules engine/CEP based on Hadoop.
I don't remember the exact reasons, but we discarded Drools (the other rules engines on the market didn't match our needs either). Hive was definitely not an option because its latency was too high. (A caveat about those last two sentences: those design decisions were made three years ago, a lot has changed since, and you might reconsider those choices.)
The first implementation of the CEP was done using MapReduce and HBase (to maintain state). The rules were loaded from a MySQL database and applied by the MapReduce job.
Since we still had some latency (due to MR), we started moving the code to Spark Streaming, still keeping HBase as a backend. Using HBase coprocessors was also an idea. I can't say much more because I left the company before seeing that change in production.
The front end was a graphical web drag-and-drop interface, so it allowed users to implement the business logic quickly without our help.
I'm not sure my answer is exactly what you were looking for. If you find some good open-source CEP projects that suit you, please let me know; I'm still curious about it.
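The stateful pattern described above (per-key state in HBase, rule parameters in MySQL) can be sketched without either system. This is a toy illustration only: a plain in-memory Map stands in for HBase, and the hard-coded threshold stands in for a rule loaded from the rules database.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a stateful CEP rule: each event updates per-key state (the
// Map below stands in for HBase), and the rule fires once the accumulated
// state crosses a threshold (loaded from a rules DB in the real design).
public class StatefulRule {
    private final Map<String, Long> counts = new HashMap<>(); // stand-in for HBase
    private final long threshold; // stand-in for a rule loaded from MySQL

    public StatefulRule(long threshold) {
        this.threshold = threshold;
    }

    // Record one event for this key; returns true when the rule fires.
    public boolean onEvent(String key) {
        long c = counts.merge(key, 1L, Long::sum);
        return c >= threshold;
    }
}
```

In the Spark Streaming rewrite, this per-key update would live in a stateful transformation instead of a local Map, but the rule logic stays the same shape.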
Created 11-18-2015 04:23 AM
Thanks @Sourygna Luangsay. Will let you know if I find something.
I am thinking of starting a simple project for this (after failing to find a good solution) with the following components:
- On the analytical side, implement rules using a generic rules UDF that reads the rule conditions and outcome operations from a database, using the UDF distributed cache to store them.
- Possibly use an Ambari View to set the rules.
- The goal is to make the design extensible so that more rule operators can be added easily.
This is very high level but expect to start something soon.
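The "extensible rule operators" idea above can be sketched like this, minus the Hive GenericUDF boilerplate: operators live in a registry, so adding a new one is a single entry rather than a code change in the evaluation path. The class and method names are hypothetical; in the actual UDF the (operator, threshold) pairs would come from the rules database via the distributed cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Sketch of an extensible rule-operator registry for a generic rules UDF.
// The evaluation code never changes; new operators are registered as data.
public class RuleEval {
    private static final Map<String, BiPredicate<Double, Double>> OPS = new HashMap<>();
    static {
        OPS.put(">",  (a, b) -> a > b);
        OPS.put("<",  (a, b) -> a < b);
        OPS.put("==", (a, b) -> a.doubleValue() == b.doubleValue());
        // Adding an operator is one line, e.g. OPS.put(">=", (a, b) -> a >= b);
    }

    // Evaluate one rule condition read from the rules database.
    public static boolean eval(String op, double value, double threshold) {
        BiPredicate<Double, Double> p = OPS.get(op);
        if (p == null) throw new IllegalArgumentException("unknown operator: " + op);
        return p.test(value, threshold);
    }
}
```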
Created 01-30-2017 05:40 PM
Did this project get a start? I'm interested in following it and contributing if I can.