<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to pull data from API and store it in HDFS in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-pull-data-from-API-and-store-it-in-HDFS/m-p/153939#M40745</link>
    <description>&lt;P&gt;To move data from a relational database into HDFS or Hive, Sqoop is the best tool: &lt;A href="http://hortonworks.com/apache/sqoop/" target="_blank"&gt;http://hortonworks.com/apache/sqoop/&lt;/A&gt;. You can schedule it through Oozie: &lt;A href="http://hortonworks.com/apache/oozie/" target="_blank"&gt;http://hortonworks.com/apache/oozie/&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;For diverse sources such as logs, emails, and RSS, NiFi is the best bet: &lt;A href="http://hortonworks.com/apache/nifi/" target="_blank"&gt;http://hortonworks.com/apache/nifi/&lt;/A&gt;. It can call REST APIs through easy-to-configure HTTP processors, and it has its own scheduler. HCC has many articles on NiFi.&lt;/P&gt;&lt;P&gt;You could also fetch from the REST API with wget on a Linux server and push the result to HDFS.&lt;/P&gt;&lt;P&gt;You could also use Zeppelin to pull via wget as above, or to pull streaming data via Spark. Zeppelin also lets you visualize the data, and it has its own scheduler.&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://zeppelin.apache.org/docs/0.5.5-incubating/tutorial/tutorial.html" target="_blank"&gt;https://zeppelin.apache.org/docs/0.5.5-incubating/tutorial/tutorial.html&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A href="http://hortonworks.com/apache/zeppelin/" target="_blank"&gt;http://hortonworks.com/apache/zeppelin/&lt;/A&gt;
&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Sqoop, Oozie, and Zeppelin come out of the box with the HDP platform.&lt;/P&gt;&lt;P&gt;NiFi is part of the HDF platform and integrates easily with HDFS.&lt;/P&gt;&lt;P&gt;It is not difficult to set up a Linux box to communicate with HDFS.&lt;/P&gt;</description>
    <pubDate>Thu, 15 Sep 2016 22:54:48 GMT</pubDate>
    <dc:creator>gkeys</dc:creator>
    <dc:date>2016-09-15T22:54:48Z</dc:date>
    <item>
      <title>How to pull data from API and store it in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-pull-data-from-API-and-store-it-in-HDFS/m-p/153938#M40744</link>
      <description>&lt;P&gt;I am aware of Flume and Kafka, but these are event-driven tools. I don't need the import to be event-driven or real-time; I just want to schedule it once a day.&lt;/P&gt;&lt;P&gt;What data ingestion tools are available for importing data from APIs into HDFS?&lt;/P&gt;&lt;P&gt;I am using Hive, not HBase.
I have used `R` for this for quite some time, but I am looking for a more robust solution, perhaps one native to the Hadoop environment.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Sep 2016 19:29:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-pull-data-from-API-and-store-it-in-HDFS/m-p/153938#M40744</guid>
      <dc:creator>simran_k</dc:creator>
      <dc:date>2016-09-15T19:29:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to pull data from API and store it in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-pull-data-from-API-and-store-it-in-HDFS/m-p/153939#M40745</link>
      <description>&lt;P&gt;To move data from a relational database into HDFS or Hive, Sqoop is the best tool: &lt;A href="http://hortonworks.com/apache/sqoop/" target="_blank"&gt;http://hortonworks.com/apache/sqoop/&lt;/A&gt;. You can schedule it through Oozie: &lt;A href="http://hortonworks.com/apache/oozie/" target="_blank"&gt;http://hortonworks.com/apache/oozie/&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;For diverse sources such as logs, emails, and RSS, NiFi is the best bet: &lt;A href="http://hortonworks.com/apache/nifi/" target="_blank"&gt;http://hortonworks.com/apache/nifi/&lt;/A&gt;. It can call REST APIs through easy-to-configure HTTP processors, and it has its own scheduler. HCC has many articles on NiFi.&lt;/P&gt;&lt;P&gt;You could also fetch from the REST API with wget on a Linux server and push the result to HDFS.&lt;/P&gt;&lt;P&gt;You could also use Zeppelin to pull via wget as above, or to pull streaming data via Spark. Zeppelin also lets you visualize the data, and it has its own scheduler.&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://zeppelin.apache.org/docs/0.5.5-incubating/tutorial/tutorial.html" target="_blank"&gt;https://zeppelin.apache.org/docs/0.5.5-incubating/tutorial/tutorial.html&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A href="http://hortonworks.com/apache/zeppelin/" target="_blank"&gt;http://hortonworks.com/apache/zeppelin/&lt;/A&gt;
&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Sqoop, Oozie, and Zeppelin come out of the box with the HDP platform.&lt;/P&gt;&lt;P&gt;NiFi is part of the HDF platform and integrates easily with HDFS.&lt;/P&gt;&lt;P&gt;It is not difficult to set up a Linux box to communicate with HDFS.&lt;/P&gt;</description>
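      <content:encoded><![CDATA[<P>As a rough illustration of the "fetch from a REST API on a Linux server and push to HDFS" route mentioned in this reply, here is a minimal Python sketch that uses only the standard library and the WebHDFS REST API in place of wget. It is a sketch under stated assumptions, not the poster's method: the function names, the dated file layout, the API URL, the host name, and port 50070 (the usual NameNode HTTP port on Hadoop 2.x) are all hypothetical, and it assumes WebHDFS is enabled on the cluster without Kerberos.</P>

```python
import datetime
import http.client
import urllib.parse
import urllib.request


def daily_hdfs_path(base_dir, day):
    """Build a dated HDFS file path, e.g. /data/api/2016-09-15.json (hypothetical layout)."""
    return "{}/{}.json".format(base_dir.rstrip("/"), day.isoformat())


def webhdfs_create_url(namenode, hdfs_path, user):
    """Build the WebHDFS CREATE URL for an absolute HDFS path."""
    query = urllib.parse.urlencode(
        {"op": "CREATE", "overwrite": "true", "user.name": user}
    )
    return "http://{}/webhdfs/v1{}?{}".format(
        namenode, urllib.parse.quote(hdfs_path), query
    )


def put_to_hdfs(create_url, payload):
    """Two-step WebHDFS upload: ask the NameNode, then send the bytes to a DataNode."""
    parts = urllib.parse.urlsplit(create_url)
    conn = http.client.HTTPConnection(parts.netloc)
    # Step 1: PUT with no body; the NameNode replies 307 with a DataNode Location.
    conn.request("PUT", parts.path + "?" + parts.query)
    response = conn.getresponse()
    response.read()
    conn.close()
    if response.status != 307:
        raise RuntimeError("expected 307 redirect, got {}".format(response.status))
    location = response.getheader("Location")
    # Step 2: PUT the data to the DataNode URL; 201 Created signals success.
    request = urllib.request.Request(location, data=payload, method="PUT")
    with urllib.request.urlopen(request) as result:
        return result.status


def pull_and_store(api_url, namenode, base_dir, user):
    """Fetch one API response and store it under today's dated HDFS path."""
    with urllib.request.urlopen(api_url) as response:
        payload = response.read()
    target = daily_hdfs_path(base_dir, datetime.date.today())
    return put_to_hdfs(webhdfs_create_url(namenode, target, user), payload)
```

<P>Running a script that calls <CODE>pull_and_store</CODE> from a daily cron entry would give the once-a-day import asked about in the question. A NiFi flow or an Oozie-scheduled job remains the more managed option.</P>]]></content:encoded>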
      <pubDate>Thu, 15 Sep 2016 22:54:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-pull-data-from-API-and-store-it-in-HDFS/m-p/153939#M40745</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2016-09-15T22:54:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to pull data from API and store it in HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-pull-data-from-API-and-store-it-in-HDFS/m-p/153940#M40746</link>
      <description>&lt;P&gt;NiFi/HDF is the way to go: it is very easy to use and supports a huge number of sources.&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/52415/processing-social-media-feeds-in-stream-with-apach.html" target="_blank"&gt;https://community.hortonworks.com/articles/52415/processing-social-media-feeds-in-stream-with-apach.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;https://community.hortonworks.com/content/kbentry/47854/accessing-facebook-page-data-from-apache-nifi.html&lt;/P&gt;&lt;P&gt;https://community.hortonworks.com/articles/46258/iot-example-in-apache-nifi-consuming-and-producing.html&lt;/P&gt;&lt;P&gt;https://community.hortonworks.com/articles/45531/using-apache-nifi-070s-new-putslack-processor.html&lt;/P&gt;&lt;P&gt;https://community.hortonworks.com/articles/45706/using-the-new-hiveql-processors-in-apache-nifi-070.html&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/content/kbentry/44018/create-kafka-topic-and-use-from-apache-nifi-for-hd.html" target="_blank"&gt;https://community.hortonworks.com/content/kbentry/44018/create-kafka-topic-and-use-from-apache-nifi-for-hd.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/content/kbentry/55839/reading-sensor-data-from-remote-sensors-on-raspber.html" target="_blank"&gt;https://community.hortonworks.com/content/kbentry/55839/reading-sensor-data-from-remote-sensors-on-raspber.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2016 01:00:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-pull-data-from-API-and-store-it-in-HDFS/m-p/153940#M40746</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2016-09-16T01:00:02Z</dc:date>
    </item>
  </channel>
</rss>

