Organizations today are interested in ingesting data into Hadoop (HDFS); after that, analytics come into the picture according to their business needs. Developers use different approaches, tools, and frameworks to do this, and there is always a question of performance bottlenecks when launching a new source into production.
I would expect a solid data ingestion framework to be part of the Hadoop ecosystem, one that gives clients enough confidence.
It should accept any kind of data from any source. Simple.
Just a thought!
To enable data ingestion into Hadoop, the community has developed a couple of different tools depending on the source of the data. If you are moving large batches from other databases, then Sqoop is the tool to use. For real-time ingestion from virtually any source you can use Apache NiFi. Both are provided by Hortonworks.
If this is not what you are thinking of, then you should look at Gobblin. It was created by LinkedIn and licensed under the Apache 2.0 license, and it is not provided by Hortonworks or Cloudera.
Another possibility is the Hive Streaming Ingest API (https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest), which can be used as a low-level API or through Storm/Flume, both of which provide higher-level abstractions built on this API.
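To give a feel for the low-level API, here is a minimal sketch using the `hive-hcatalog-streaming` classes. The metastore URI, database, table, and the sample record are placeholder assumptions, not from the thread; the target table must be a transactional (ACID, bucketed ORC) table for streaming ingest to work, and the code needs the `hive-hcatalog-streaming` dependency on the classpath and a running metastore.

```java
// Sketch only: assumes a transactional Hive table "default.events" with
// columns (id, msg), and a metastore at thrift://metastore-host:9083.
import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class HiveStreamingSketch {
    public static void main(String[] args) throws Exception {
        // End point for the target table; partition list is null for an unpartitioned table.
        HiveEndPoint endPoint = new HiveEndPoint(
                "thrift://metastore-host:9083", "default", "events", null);
        StreamingConnection conn = endPoint.newConnection(true);

        // Writer that maps comma-delimited text onto the table's columns.
        String[] fieldNames = {"id", "msg"};
        DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames, ",", endPoint);

        // Fetch a batch of transactions, write a record, commit, and clean up.
        TransactionBatch batch = conn.fetchTransactionBatch(10, writer);
        batch.beginNextTransaction();
        batch.write("1,hello".getBytes());
        batch.commit();
        batch.close();
        conn.close();
    }
}
```

Storm's Hive bolt and Flume's Hive sink wrap roughly this sequence for you, which is why they are the easier route unless you need fine-grained control over transaction batching.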
@mqureshi - I am looking for an ingestion framework as a service where I can plug and play irrespective of domain. It could combine multiple Hadoop ecosystem components.
Apache NiFi, Spark, Flink, Sqoop, Flume: there are lots of tools, and it really depends on what you are doing.
NiFi is plug and play and supports lots of sources. It is my choice for most use cases.