Hortonworks DataFlow (Apache NiFi)
Created ‎02-12-2016 08:28 AM
What is the best mechanism to ingest data from relational sources into HDP: a combination of the ExecuteSQL and PutHDFS processors, or Sqoop to deliver the data to HDP?
Thanks
Created ‎02-12-2016 08:47 AM
You can definitely use Sqoop, and it's part of the HDP stack.
You can also leverage HDF (NiFi) to ingest data into HDP. In this case you have to get support for both HDF and HDP.
Created ‎02-12-2016 10:49 AM
@Neeraj Sabharwal Thanks. My question is which tool is best placed to handle data loading from an RDBMS. I understand both of them support it, but I would like to understand which one is more capable and what advantages it has over the other.
Thanks
Vijay
Created ‎02-12-2016 10:50 AM
@Greenhorn Techie Sqoop is the most useable tool in the industry as of today for this use case.
Created ‎02-12-2016 12:10 PM
Thanks @Neeraj Sabharwal for validating my understanding 🙂
Created ‎03-27-2016 11:45 PM
I would use Sqoop for ingesting RDBMS data, as Sqoop will parallelize the ETL job while NiFi will simply run it on the thread that the processor is running on. To do the same thing with NiFi, you would have to create multiple instances of the ExecuteSQL processor, each going after one partition of the data you want.
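To make the partitioning idea concrete, here is a rough sketch of how a key range could be split into one query per partition, so that each ExecuteSQL instance gets its own slice. This is only an illustration, not how Sqoop or NiFi implement it internally; the table name `orders`, the key column `id`, and the helper names are all made up for the example, and it assumes an integer key whose minimum and maximum you already know.

```python
def split_ranges(lo, hi, parts):
    """Split the inclusive integer range [lo, hi] into contiguous sub-ranges.

    Returns at most `parts` (start, end) tuples that cover the range
    without gaps or overlap.
    """
    total = hi - lo + 1
    parts = min(parts, total)          # never create empty partitions
    base, extra = divmod(total, parts)
    ranges = []
    start = lo
    for i in range(parts):
        # The first `extra` partitions take one additional key each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def partition_queries(table, key, lo, hi, parts):
    """Build one SELECT per partition (hypothetical table and column names)."""
    return [
        f"SELECT * FROM {table} WHERE {key} >= {a} AND {key} <= {b}"
        for a, b in split_ranges(lo, hi, parts)
    ]


if __name__ == "__main__":
    # Each printed query could be pasted into a separate ExecuteSQL processor.
    for query in partition_queries("orders", "id", 1, 10, 4):
        print(query)
```

Sqoop does this kind of splitting for you when you give it a split column and a number of mappers, which is the main convenience being described above. With NiFi you would be maintaining the per-partition queries yourself.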
Created ‎03-28-2016 01:22 AM
The NiFi 0.6 release adds the ability to run simple change capture cases with QueryDatabaseTable by maintaining timestamps. This might start moving the needle towards NiFi and away from Sqoop: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes
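For anyone unfamiliar with the pattern, timestamp-based change capture roughly means remembering the largest timestamp seen so far and only asking for rows newer than that on the next run. Below is a minimal sketch of the idea using an in-memory SQLite database. It is not NiFi's code; the `orders` table, the `updated_at` column, and the function names are assumptions made up for the example.

```python
import sqlite3


def fetch_new_rows(conn, table, ts_column, last_seen):
    """Return rows newer than `last_seen` plus the updated high-water mark.

    `last_seen` is None on the first run, which fetches everything.
    Table and column names are trusted here; this is only a sketch.
    """
    if last_seen is None:
        cur = conn.execute(f"SELECT * FROM {table} ORDER BY {ts_column}")
    else:
        cur = conn.execute(
            f"SELECT * FROM {table} WHERE {ts_column} > ? ORDER BY {ts_column}",
            (last_seen,),
        )
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    if rows:
        # Rows are sorted by timestamp, so the last one holds the maximum.
        last_seen = rows[-1][columns.index(ts_column)]
    return rows, last_seen


def demo():
    """Run two polling cycles against a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "2016-03-01 10:00:00"), (2, "2016-03-02 09:30:00")],
    )
    first, mark = fetch_new_rows(conn, "orders", "updated_at", None)
    # A new row arrives between polls.
    conn.execute("INSERT INTO orders VALUES (?, ?)", (3, "2016-03-03 08:00:00"))
    second, mark = fetch_new_rows(conn, "orders", "updated_at", mark)
    conn.close()
    return first, second, mark


if __name__ == "__main__":
    print(demo())
```

Note that this style of capture only picks up inserts and updates that bump the timestamp column. Deleted rows are not detected, which is why it suits only the simple cases.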
