One of the main components of the Cloudera Open Network Insight (ONI) project is the "Parallel Ingest Framework", which consists of several sub-components. Can this framework be applied to other use cases? Is there any reference material about the "Parallel Ingest Framework"? Thanks.
Coming a bit late to this, but I've spent the day getting ONI up and running.
I think it's fair to say the project is a bit early-life at the moment, to the extent that I'm actually surprised Cloudera are backing it at this stage (beyond a Cloudera Labs project).
Essentially, the 'parallel ingest framework' as it currently exists is a framework to launch multiple RabbitMQ consumer processes (Python), each of which writes files to a staging area in HDFS. The staged files then appear to get loaded into a temporary Hive table and inserted into the main table.
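To make that flow concrete, here's a rough sketch of the pattern as I understand it. This is not the ONI code. To keep it runnable without a broker or a cluster I've made some loud substitutions: an in-memory `queue.Queue` stands in for RabbitMQ, threads stand in for the separate Python consumer processes, and a local directory stands in for the HDFS staging area. The function names, the file naming scheme and the table names are all made up for illustration.

```python
import os
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker, one per worker


def worker(worker_id, messages, staging_dir):
    """Consume messages and write each one to its own file in the staging area."""
    count = 0
    while True:
        body = messages.get()
        if body is SENTINEL:
            break
        # File name is unique per worker, so workers never overwrite each other.
        path = os.path.join(staging_dir, f"worker{worker_id}-{count}.csv")
        with open(path, "w") as handle:
            handle.write(body)
        count += 1


def run_ingest(bodies, num_workers, staging_dir):
    """Launch several consumers in parallel and wait for them to drain the queue."""
    messages = queue.Queue()  # stand-in for a RabbitMQ queue (assumption)
    for body in bodies:
        messages.put(body)
    for _ in range(num_workers):
        messages.put(SENTINEL)

    threads = [
        threading.Thread(target=worker, args=(i, messages, staging_dir))
        for i in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(os.listdir(staging_dir))


def hive_statements(staging_dir, tmp_table, main_table):
    """Build the HiveQL for the 'temporary table, then main table' step.

    Table names are hypothetical; this only builds strings and runs nothing.
    """
    return [
        f"LOAD DATA INPATH '{staging_dir}' INTO TABLE {tmp_table}",
        f"INSERT INTO TABLE {main_table} SELECT * FROM {tmp_table}",
    ]
```

Calling `run_ingest(["a,1\n", "b,2\n"], 2, some_dir)` leaves one file per message in `some_dir`, and `hive_statements(some_dir, "tmp_flows", "flows")` gives the two statements you would hand to Hive afterwards. In a real deployment each worker would be its own process consuming from the broker, which is the part the framework actually manages.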
Code for the Python worker (which does the bulk of the ingest work) is here:
So I guess my answer is probably 'no': if Flume or Kafka don't suit, then you're probably doing something so specific that you need to write your own ingest framework.