Created 10-02-2016 04:51 PM
We currently hand-code imports into our relational database from each source system, and it is very cumbersome. Examples of source systems include Salesforce, Twitter, another database, files, SharePoint, etc.
For our next line of software, we would like to use a technology stack that already has a lot of connectors built to move data from a source system into our target system, which is either Hadoop or MySQL. Ideally, new connectors can be built easily and are even scriptable. We do not want to reinvent the wheel and are looking for good open source tools to quickly import data into our target system from a large variety of sources.
If this were your requirement, which technology stack would you use and why? Building a generic way to consume data into a system, backed by a lot of community support, seems to be a common theme in many products. Why reinvent the wheel over and over again?
Created on 10-03-2016 11:29 PM - edited 08-19-2019 04:37 AM
NiFi is ideal for exactly your needs. NiFi is a 100% open source Apache project. It is also packaged in the Hortonworks DataFlow (HDF) platform, where it is bundled with Kafka, Storm, Ambari and Ranger. HDF is fully enterprise-grade, multitenant and secure.
NiFi is built to pull data from dozens of data sources, ranging from relational databases to email, Twitter, local files, S3, HTTP and so on. It has prebuilt connectors to these sources, and flows are developed in an easy-to-configure, drag-and-drop way. You can easily build your own connectors, and since this is open source, new ones are added continuously.
In addition to pulling from a number of sources, you can push to diverse targets as well. HDFS, Hive and Kafka are possibilities, as well as email, Amazon S3 and many more. Note that HDF works as a great complement to HDP (Hadoop) but does not require it.
In between pulling from sources and pushing to targets, NiFi allows you to transform data, route it based on its content, merge it, and apply other mediations.
You can get an idea of the data sources you can pull from, the mediations you can make on that data, and the targets you can push to by looking at this list of processors (processors are the basic units you connect into a data flow): https://nifi.apache.org/docs.html
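To make the pull, mediate, push idea concrete, here is a minimal conceptual sketch in plain Python. It is not NiFi's API (NiFi flows are normally assembled in the UI by connecting processors), and every function name, record field, and target name below is a hypothetical stand-in chosen only for illustration.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def pull_from_source() -> Iterable[Record]:
    # Hypothetical stand-in for a source connector (e.g. Twitter or Salesforce).
    yield {"origin": "twitter", "text": "hello nifi"}
    yield {"origin": "salesforce", "text": "new lead"}
    yield {"origin": "twitter", "text": "data flows"}


def transform(record: Record) -> Record:
    # Mediation step: return a copy with the text normalized to upper case.
    out = dict(record)
    out["text"] = record["text"].upper()
    return out


def route(record: Record) -> str:
    # Content-based routing: choose a target name from a field in the record.
    return "hadoop" if record["origin"] == "twitter" else "mysql"


def run_flow() -> Dict[str, List[Record]]:
    # In-memory lists stand in for the real targets (Hadoop, MySQL).
    targets: Dict[str, List[Record]] = {"hadoop": [], "mysql": []}
    for record in pull_from_source():
        transformed = transform(record)
        targets[route(transformed)].append(transformed)
    return targets


if __name__ == "__main__":
    for target, records in run_flow().items():
        print(target, records)
```

In NiFi, each of these functions would roughly correspond to one processor, and the calls between them to the connections you draw between processors.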
Again, one of the great things about NiFi is its easy-to-use UI/configuration approach (screenshot below answer).
HCC has numerous articles on NiFi. Just do a search.
Check out:
http://hortonworks.com/apache/nifi/
http://hortonworks.com/blog/hortonworks-dataflow-2-0-ga/
https://nifi.apache.org/docs.html
https://www.youtube.com/watch?v=jctMMHTdTQI
You can download and start using it here: http://hortonworks.com/downloads/#dataflow
Created 10-04-2016 09:01 PM
This looks really promising Greg - thank you - I will check this out.