<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: best tools to import data from a myriad of sources in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/best-tools-to-import-data-from-a-myriad-of-sources/m-p/108908#M42497</link>
    <description>&lt;P&gt;This looks really promising Greg - thank you - I will check this out.&lt;/P&gt;</description>
    <pubDate>Wed, 05 Oct 2016 04:01:21 GMT</pubDate>
    <dc:creator>cloppg</dc:creator>
    <dc:date>2016-10-05T04:01:21Z</dc:date>
    <item>
      <title>best tools to import data from a myriad of sources</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/best-tools-to-import-data-from-a-myriad-of-sources/m-p/108906#M42495</link>
      <description>&lt;P&gt;We currently hand-code imports into our relational database from each source system, and it is very cumbersome. Examples of source systems include Salesforce, Twitter, another database, files, SharePoint, etc.&lt;/P&gt;&lt;P&gt;In our next line of software we would like to use a technology stack that already has many connectors built to move data from a source system into our target system, either Hadoop or MySQL. Ideally, new connectors can be built easily and are even scriptable. We do not want to reinvent the wheel and are looking for good open source tools to quickly import data into our target system from a large variety of sources.&lt;/P&gt;&lt;P&gt;If this were your requirement, which technology stack would you use, and why? Building a generic way to consume data into a system, with a lot of community support, seems to be a common theme across many products. Why reinvent the wheel over and over again?&lt;/P&gt;</description>
      <pubDate>Sun, 02 Oct 2016 23:51:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/best-tools-to-import-data-from-a-myriad-of-sources/m-p/108906#M42495</guid>
      <dc:creator>cloppg</dc:creator>
      <dc:date>2016-10-02T23:51:18Z</dc:date>
    </item>
    <item>
      <title>Re: best tools to import data from a myriad of sources</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/best-tools-to-import-data-from-a-myriad-of-sources/m-p/108907#M42496</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/59438/best-tools-to-import-data-from-a-myriad-of-sources.html#" rel="nofollow noopener noreferrer" target="_blank"&gt;@Fred Schwartz&lt;/A&gt;&lt;/P&gt;&lt;P&gt;NiFi is ideal for exactly your needs. NiFi is a 100% open source Apache project. It is also packaged in the Hortonworks DataFlow (HDF) platform, where it is bundled with Kafka, Storm, Ambari and Ranger. HDF is enterprise-grade, multitenant and secure.&lt;/P&gt;&lt;P&gt;NiFi is built to pull data from dozens of data sources, ranging from relational databases to email, Twitter, local files, S3, HTTP and so on. It has prebuilt connectors to these sources, and flows are developed in an easy-to-configure drag-and-drop way. You can easily build your own connectors, and since this is open source, new ones are added continuously.&lt;/P&gt;&lt;P&gt;In addition to pulling from a number of sources, you can push to diverse targets as well. HDFS, Hive and Kafka are possibilities, as are email, Amazon S3 and many more. Note that HDF works as a great complement to HDP (Hadoop) but does not require it.&lt;/P&gt;&lt;P&gt;In between pulling from sources and pushing to targets, NiFi lets you transform data, route it based on content, merge it, and apply other mediations.&lt;/P&gt;&lt;P&gt;You can get an idea of the data sources you can pull from, the mediations you can make on that data, and the targets you can push to by looking at this list of processors (processors are the basic units you connect into a data flow): &lt;A href="https://nifi.apache.org/docs.html" target="_blank"&gt;https://nifi.apache.org/docs.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Again, one of the great things about NiFi is its easy-to-use UI/configuration approach (screenshot at the end of this answer).&lt;/P&gt;&lt;P&gt;HCC has numerous articles on NiFi.
Just do a search.&lt;/P&gt;&lt;P&gt;Check out:&lt;/P&gt;&lt;P&gt;&lt;A href="http://hortonworks.com/apache/nifi/" target="_blank" rel="nofollow noopener noreferrer"&gt;http://hortonworks.com/apache/nifi/&lt;/A&gt;
&lt;A href="http://hortonworks.com/blog/hortonworks-dataflow-2-0-ga/" target="_blank" rel="nofollow noopener noreferrer"&gt;http://hortonworks.com/blog/hortonworks-dataflow-2-0-ga/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs.html" target="_blank" rel="nofollow noopener noreferrer"&gt;https://nifi.apache.org/docs.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.youtube.com/watch?v=jctMMHTdTQI" target="_blank" rel="nofollow noopener noreferrer"&gt;https://www.youtube.com/watch?v=jctMMHTdTQI&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You can download and start using it here: &lt;A href="http://hortonworks.com/downloads/#dataflow" target="_blank" rel="nofollow noopener noreferrer"&gt;http://hortonworks.com/downloads/#dataflow&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="8233-screen-shot-2016-10-03-at-72633-pm.png" style="width: 944px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/23396i17D1DA2FEE873A2C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="8233-screen-shot-2016-10-03-at-72633-pm.png" alt="8233-screen-shot-2016-10-03-at-72633-pm.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Aug 2019 11:37:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/best-tools-to-import-data-from-a-myriad-of-sources/m-p/108907#M42496</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2019-08-19T11:37:20Z</dc:date>
    </item>
    <item>
      <title>Re: best tools to import data from a myriad of sources</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/best-tools-to-import-data-from-a-myriad-of-sources/m-p/108908#M42497</link>
      <description>&lt;P&gt;This looks really promising Greg - thank you - I will check this out.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Oct 2016 04:01:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/best-tools-to-import-data-from-a-myriad-of-sources/m-p/108908#M42497</guid>
      <dc:creator>cloppg</dc:creator>
      <dc:date>2016-10-05T04:01:21Z</dc:date>
    </item>
  </channel>
</rss>

