What is the recommended way (best practice) to load large amounts of data from an RDBMS into a Hive table using NiFi?
- Labels: Apache Hive, Apache NiFi
Created ‎12-20-2015 01:30 PM
I'm new to Hadoop and NiFi, but I have looked through the available processors for ones suited to this case. I've used the ExecuteSQL processor to get data out of the RDBMS, but I'm not sure which processor(s) should be used to load the data into a Hive table. Can you recommend a best practice for this scenario?
Artur
Created ‎12-20-2015 01:34 PM
You can land the data in HDFS using the PutHDFS processor and build Hive tables on top of that location.
Created ‎12-20-2015 01:42 PM
Hi Neeraj, thanks a lot for the fast answer!
I have an additional question: is there a NiFi processor I can use to "build Hive tables on top of that location"? Or can this step only be done outside NiFi?
Created ‎12-20-2015 01:44 PM
Outside, AFAIK. You land the data in HDFS and then run CREATE EXTERNAL TABLE ... LOCATION 'hdfslocation';
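To make the external-table step a bit more concrete, here is a minimal sketch that only assembles the HiveQL statement as a string; it does not connect to Hive. The table name, column list, and HDFS path are hypothetical placeholders, not values from this thread. It assumes the files landed by PutHDFS are Avro (ExecuteSQL emits Avro) and that your Hive version accepts STORED AS AVRO; adjust the format if you convert the data before landing it.

```python
def build_external_table_ddl(table, columns, hdfs_location, file_format="AVRO"):
    """Return a HiveQL CREATE EXTERNAL TABLE statement as a string.

    `columns` is a list of (name, hive_type) pairs. All values passed in
    are illustrative assumptions, not taken from a real cluster.
    """
    column_list = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_list}\n"
        f")\n"
        f"STORED AS {file_format}\n"
        f"LOCATION '{hdfs_location}'"
    )


# Hypothetical example: a two-column table over the PutHDFS target directory.
ddl = build_external_table_ddl(
    table="customers",
    columns=[("id", "INT"), ("name", "STRING")],
    hdfs_location="/data/landing/customers",
)
print(ddl)
```

The resulting statement can then be run from beeline or the Hive CLI once the first files have arrived in that directory.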
Created ‎12-20-2015 01:47 PM
Thanks a lot for your answer!
Created ‎12-21-2015 02:02 PM
@Artur Bukowski - with large volumes of data in this classic RDBMS -> HDFS setup, you might be better off with Sqoop, that is, if your goal is to move those large datasets in parallel.
If you are after data provenance for those datasets, then NiFi will be a better fit.
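As a rough illustration of the Sqoop route, the sketch below only builds the argument list for a parallel `sqoop import` into Hive; it does not run anything. The JDBC URL, user, password file, table names, and split column are all hypothetical, and the mapper count of 4 is an arbitrary assumption you would tune for your source database.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table,
                       split_by, mappers, hive_table):
    """Return the argument list for a parallel Sqoop import into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,   # file holding the DB password
        "--table", table,
        "--split-by", split_by,             # column used to partition the work
        "--num-mappers", str(mappers),      # number of parallel map tasks
        "--hive-import",                    # load the result into Hive
        "--hive-table", hive_table,
    ]


# Hypothetical values for illustration only.
cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://dbhost:3306/sales",
    username="etl_user",
    password_file="/user/etl_user/.db_password",
    table="customers",
    split_by="id",
    mappers=4,
    hive_table="default.customers",
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` on a node with the Sqoop client installed would start the import; the split column should be reasonably evenly distributed so the mappers get similar amounts of work.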
Created ‎12-21-2015 07:03 PM
