
Do we have a proven "Data Ingestion Frame work" from Hadoop distributors(Hortonworks/Cloudera) ?

Contributor

Hi All,

Organizations currently want to ingest data into Hadoop (HDFS) and then run analytics on it according to their business needs. Developers use a variety of tools and frameworks to do this, and performance bottlenecks often come up when a new source is launched into production.

I would expect a solid data ingestion framework to be part of the Hadoop ecosystem, one that gives clients enough confidence.

It should simply accept any kind of data from any source.

Just a thought!

Thanks

Sankar


Re: Do we have a proven "Data Ingestion Frame work" from Hadoop distributors(Hortonworks/Cloudera) ?

Super Guru
@Sankar T

To enable data ingestion into Hadoop, the community has developed a couple of different tools, depending on the source of the data. If you are moving large batches from other databases, then Sqoop is the tool to use. For real-time ingestion from virtually any source, you can use Apache NiFi. Both are provided by Hortonworks.
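To make the Sqoop batch case concrete, here is a minimal sketch that assembles a `sqoop import` command line in Python. The JDBC URL, table name, HDFS target directory, user, and password file path are all made-up placeholders, and `build_sqoop_import` is a hypothetical helper, not part of Sqoop itself; only the command-line flags are standard Sqoop 1 options.

```python
import shlex


def build_sqoop_import(jdbc_url, table, target_dir, username,
                       password_file, num_mappers=4):
    """Return the argument list for a `sqoop import` batch job.

    Hypothetical helper: it only assembles the command line and does
    not contact the database or the cluster.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # source table to copy
        "--target-dir", target_dir,        # destination directory in HDFS
        "--num-mappers", str(num_mappers), # number of parallel map tasks
    ]


# Placeholder values, assumed for illustration only.
cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://db.example.com/sales",
    table="orders",
    target_dir="/data/raw/orders",
    username="etl_user",
    password_file="/user/etl/.sqoop_password",
)
print(shlex.join(cmd))
```

On a host where the Sqoop client is installed, the same list could be passed to `subprocess.run(cmd, check=True)` to launch the import.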

If this is not what you have in mind, then take a look at Gobblin. It was created by LinkedIn and is licensed under the Apache 2.0 license; it is not provided by Hortonworks or Cloudera.

https://github.com/linkedin/gobblin

Re: Do we have a proven "Data Ingestion Frame work" from Hadoop distributors(Hortonworks/Cloudera) ?

Expert Contributor

Another possibility is the Hive Streaming Ingest API (https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest), which can be used directly as a low-level API or through Storm or Flume, both of which provide higher-level abstractions built on this API.

Re: Do we have a proven "Data Ingestion Frame work" from Hadoop distributors(Hortonworks/Cloudera) ?

Contributor

@mqureshi - I am looking for an ingestion framework as a service that I can plug and play irrespective of domain. It can consist of multiple Hadoop ecosystem components.

Re: Do we have a proven "Data Ingestion Frame work" from Hadoop distributors(Hortonworks/Cloudera) ?

Super Guru

Apache NiFi, Spark, Flink, Sqoop, Flume. There are lots of tools; it really depends on what you are doing.

NiFi is plug and play and supports lots of sources. It is my choice for most use cases.