Rules Engine in Hadoop


Is there a recommended rules engine for Hadoop? Has anyone tested Drools with Hive?

The idea is to have a repository of rules, and the engine should read the rules and apply them to the data. These rules could be simple (e.g. value > 10) or complex (e.g. the average age is 40% greater than the standard deviation of the population). I know I can write these rules manually in Hive, but we would like end users to be able to change the rules without us having to rewrite them.
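To make the idea concrete, here is a rough, untested sketch of how a simple "value > 10" rule could be fired with Drools (assuming the Drools 6+ KIE API with the rules packaged on the classpath; the Record class and the DRL shown in the comment are only illustrative):

```java
// Illustrative only: assumes a DRL file packaged on the classpath, e.g.
//   rule "value-above-threshold"
//   when  $r : Record( value > 10 )
//   then  $r.setFlagged(true);
//   end
import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

public class RuleRunner {
    public static void main(String[] args) {
        KieServices ks = KieServices.Factory.get();
        KieContainer container = ks.getKieClasspathContainer(); // rules shipped with the app
        KieSession session = container.newKieSession();         // default session from kmodule.xml

        Record record = new Record(42.0);
        session.insert(record);     // the data the rules are applied to
        session.fireAllRules();     // evaluate every matching rule
        System.out.println("flagged = " + record.isFlagged());
        session.dispose();
    }
}

class Record {
    private final double value;
    private boolean flagged;
    Record(double value) { this.value = value; }
    public double getValue() { return value; }
    public boolean isFlagged() { return flagged; }
    public void setFlagged(boolean flagged) { this.flagged = flagged; }
}
```

The open question is how to run something like this at scale over data that lives in Hive/HDFS, which is why I'm asking about experiences combining the two.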

1 ACCEPTED SOLUTION

Super Collaborator

At my previous company we developed a rules engine/CEP system based on Hadoop.

I don't remember the exact reasons, but we discarded Drools (the other products on the market did not match our needs either). Hive was definitely not an option because its latency was too high (keep in mind that those design decisions were made 3 years ago; a lot has changed since then and you might want to reconsider them).

The first implementation of the CEP was done with MapReduce and HBase (to maintain the state). The rules were loaded from a MySQL database and applied by the MapReduce job.
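Not the actual code, just a sketch of the shape it had (assuming one numeric field per input line and a single threshold rule; in reality the rules came from MySQL in setup() and the matches/state went to HBase):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RuleMapper extends Mapper<LongWritable, Text, Text, Text> {
    private double threshold;

    @Override
    protected void setup(Context context) {
        // stand-in for loading the rule definitions from the rule repository (MySQL)
        threshold = context.getConfiguration().getDouble("rule.threshold", 10.0);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        double v = Double.parseDouble(value.toString().trim());
        if (v > threshold) {
            // in the real job, matches and state were written to HBase instead
            context.write(new Text("matched"), value);
        }
    }
}
```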

Since we still had latency issues (due to MapReduce), we started moving the code to Spark Streaming, still keeping HBase as the backend. Using HBase coprocessors was also an idea. I can't say much more because I left the company before seeing that change in production.
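The Spark Streaming version went in this direction (again only a simplified sketch: a socket source and a hard-coded predicate stand in for the rule repository and the HBase-backed state):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingRuleJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("rule-engine").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        // events arrive as "id,value" lines; the source is illustrative
        JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);

        // a simple "value > 10" rule; in practice the predicate is built
        // from the rule definitions and state is kept in HBase
        JavaDStream<String> matches =
            lines.filter(line -> Double.parseDouble(line.split(",")[1]) > 10);

        matches.print();   // stand-in for writing matches back to HBase

        jssc.start();
        jssc.awaitTermination();
    }
}
```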

The front end was a graphical drag-and-drop web UI, so users could implement the business logic quickly without our help.

I'm not sure my answer is exactly what you were looking for. If you find a good open-source CEP project that suits you, please let me know; I'm still curious about it.


3 REPLIES


Thanks @Sourygna Luangsay. Will let you know if I find something.

I am thinking of starting a simple project for this (after failing to find a good existing solution) with the following components:

- For the analytical side, implement rules with a generic Hive UDF that reads the rule conditions and outcome operations from a database and distributes them via the distributed cache (a rough sketch is below).
- Possibly use an Ambari View to let users define the rules.
- The goal is to make the design extensible so that more rule operators can be added easily.

This is very high level, but I expect to start on something soon.
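Here is a first, hypothetical sketch of the generic rules UDF idea, using the classic org.apache.hadoop.hive.ql.exec.UDF API (the rule lookup is hard-coded to keep the example self-contained; in the actual design it would be read from the rules database and shipped with the job):

```java
import org.apache.hadoop.hive.ql.exec.UDF;

public class ApplyRule extends UDF {
    // e.g. SELECT apply_rule('value_gt_10', amount) FROM transactions;
    public Boolean evaluate(String ruleName, Double value) {
        if (ruleName == null || value == null) {
            return null;
        }
        // stand-in for a lookup against the rule repository
        if ("value_gt_10".equals(ruleName)) {
            return value > 10;
        }
        return false; // unknown rule
    }
}
```

The UDF would be registered with CREATE FUNCTION and called from generated Hive queries, so a rule change only touches the repository, not the queries.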

Rising Star
@bsaini

Did this project get a start? I'm interested in following it and contributing if I can.