Created 09-15-2016 12:29 PM
I am aware of Flume and Kafka, but these are event-driven tools. I don't need event-driven or real-time ingestion; I may just schedule the import once a day.
What data ingestion tools are available for importing data from APIs into HDFS?
I am not using HBase, but Hive. I have used the `R` language for this for quite a while, but I am looking for a more robust, maybe native, solution for the Hadoop environment.
Created 09-15-2016 03:54 PM
From a relational database to HDFS or Hive, Sqoop is your best tool: http://hortonworks.com/apache/sqoop/ You can schedule it through Oozie: http://hortonworks.com/apache/oozie/
For diverse sources like logs, emails, RSS feeds, etc., NiFi is your best bet: http://hortonworks.com/apache/nifi/ It includes RESTful API capabilities via easy-to-configure HTTP processors, and it has its own scheduler. HCC has many articles on NiFi.
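As a sketch of the Sqoop approach: a single `sqoop import` can pull a relational table straight into Hive. The connection string, credentials, and table names below are placeholders, not from your environment; an Oozie coordinator can then run this command on a daily schedule.

```shell
# Hypothetical daily import of a MySQL table into a Hive table.
# Host, database, user, and table names are placeholders -- adjust to your setup.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --hive-import \
  --hive-table default.orders \
  --num-mappers 4
```

Wrap this in an Oozie Sqoop action inside a coordinator with `frequency="${coord:days(1)}"` to get the once-a-day schedule you described.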
You could also do a RESTful pull with wget from a Linux server and push the result to HDFS.
You could also use Zeppelin to pull via wget as above, or to pull streaming data via Spark. Zeppelin lets you visualize the data as well, and it has its own scheduler.
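The wget-and-push approach above can be sketched in a few lines. The API URL and HDFS paths are placeholders; the script pulls a JSON payload to a local temp file, then lands it in a date-partitioned HDFS directory.

```shell
#!/bin/sh
# Hypothetical example: pull JSON from a REST API and land it in HDFS.
# The API URL and HDFS target path are placeholders.
DATE=$(date +%Y-%m-%d)

# Fetch the API response to a local staging file
wget -q -O /tmp/api_dump_${DATE}.json "https://api.example.com/v1/records"

# Create a date-partitioned directory and push the file into HDFS
hdfs dfs -mkdir -p /data/raw/api/${DATE}
hdfs dfs -put -f /tmp/api_dump_${DATE}.json /data/raw/api/${DATE}/

# Clean up the local staging copy
rm /tmp/api_dump_${DATE}.json
```

A cron entry such as `0 2 * * * /path/to/ingest.sh` would run it once a day, which matches the scheduling requirement without any event-driven machinery.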
Sqoop, Oozie, and Zeppelin come out of the box with the HDP platform.
NiFi is part of the HDF platform and integrates easily with HDFS.
It is not difficult to set up a Linux box to communicate with HDFS.
Created 09-15-2016 06:00 PM
NiFi/HDF is the way to go: very easy, with a huge number of supported sources.
https://community.hortonworks.com/content/kbentry/47854/accessing-facebook-page-data-from-apache-nifi.html
https://community.hortonworks.com/articles/46258/iot-example-in-apache-nifi-consuming-and-producing.html
https://community.hortonworks.com/articles/45531/using-apache-nifi-070s-new-putslack-processor.html
https://community.hortonworks.com/articles/45706/using-the-new-hiveql-processors-in-apache-nifi-070.html