Parallel Ingest Framework Use Cases

One of the main components of the Cloudera Open Network Insight (ONI) project is the "Parallel Ingest Framework", which itself consists of several sub-components. Can this framework be applied to other use cases? Is there any reference material about the "Parallel Ingest Framework"? Thanks.

Re: Parallel Ingest Framework Use Cases

Coming a bit late to this, but I've spent the day getting ONI up and running.

I think it's fair to say the project is still at an early stage, to the extent that I'm actually surprised Cloudera are backing it at this point (beyond as a Cloudera Labs project).

Essentially, the 'parallel ingest framework' as it currently exists is a way to launch multiple Python RabbitMQ consumer processes, each of which writes files to a staging area in HDFS. These files then appear to be loaded into a temporary Hive table and inserted into the main table.
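A minimal sketch of that fan-out / stage / load pattern, to make it concrete. This is not ONI's code: threads stand in for the separate consumer processes, a `queue.Queue` stands in for RabbitMQ, a local temporary directory stands in for the HDFS staging area, and the Hive step is only generated as HiveQL text rather than executed. All function, table and path names here are hypothetical.

```python
import os
import queue
import tempfile
import threading

STOP = None  # sentinel telling a consumer to exit


def consumer(worker_id, messages, staging_dir):
    """Drain the queue, writing each batch of records to its own staging file.

    Hypothetical stand-in for one worker: the real ONI workers are separate
    Python processes reading from RabbitMQ and writing to HDFS.
    """
    while True:
        msg = messages.get()
        if msg is STOP:
            break
        batch_id, records = msg
        name = f"batch_{batch_id:04d}_w{worker_id}.csv"
        with open(os.path.join(staging_dir, name), "w") as fh:
            fh.write("\n".join(records) + "\n")


def run_parallel_ingest(batches, num_workers=3):
    """Fan batches out to num_workers consumers; return (staging_dir, files)."""
    messages = queue.Queue()  # hypothetical stand-in for a RabbitMQ queue
    for batch_id, records in enumerate(batches):
        messages.put((batch_id, records))
    for _ in range(num_workers):
        messages.put(STOP)  # one stop signal per consumer, queued after all data

    staging_dir = tempfile.mkdtemp(prefix="staging_")  # stand-in for HDFS staging
    workers = [
        threading.Thread(target=consumer, args=(w, messages, staging_dir))
        for w in range(num_workers)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return staging_dir, sorted(os.listdir(staging_dir))


def hive_load_statements(hdfs_staging_dir, tmp_table, main_table):
    """HiveQL for the final step: load staged files into a temporary table,
    then copy them into the main table. Paths and table names are hypothetical."""
    return [
        f"LOAD DATA INPATH '{hdfs_staging_dir}' INTO TABLE {tmp_table}",
        f"INSERT INTO TABLE {main_table} SELECT * FROM {tmp_table}",
    ]


if __name__ == "__main__":
    batches = [["10.0.0.1,80"], ["10.0.0.2,443", "10.0.0.3,22"], ["10.0.0.4,53"]]
    staging, files = run_parallel_ingest(batches, num_workers=2)
    print(files)
    for stmt in hive_load_statements("/user/example/staging", "flow_tmp", "flow"):
        print(stmt)
```

Because every stop signal is queued after all the data, each consumer only exits once the real messages have been drained, so no batch is lost. In an actual deployment the staging writes would go to HDFS and the two statements would run against Hive once the workers had finished.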

Code for the Python worker (which does the bulk of the ingest work) is here:

https://github.com/Open-Network-Insight/oni-ingest/blob/1.0.1/ingest/worker.py

So I guess my answer is probably 'no': if Flume or Kafka don't suit your needs, you're probably doing something so specific that you'll need to write your own ingest framework.