Best practice for exporting Oracle RDBMS to Hive
Labels: Apache NiFi
Created 08-25-2016 05:54 PM
Hi all,
I'm using NiFi 1.0 and need to export data from an Oracle RDBMS and insert it into an ORC table in Hive.
What is the best way to do that (and with which processors), using nothing but NiFi?
To be more precise: if the source data were already CSV it wouldn't be a problem, but it seems I would have to use a custom script to export the data from the RDBMS to CSV. I'd rather use NiFi processors, but I can't find suitable ones for this.
Any suggestions?
Adi
Created 08-25-2016 06:04 PM
Hi @Adi Jabkowsky. Take a look at the QueryDatabaseTable processor. https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.QueryDatabaseT.... You can also use the ExecuteSQL processor https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExecuteSQL/ind...
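For what it's worth, ExecuteSQL runs whatever SELECT statement you configure against the chosen DBCPConnectionPool service and emits the result set as Avro. A minimal sketch of the kind of query you might give it (the table and column names below are made up for illustration, not from this thread):

```sql
-- Hypothetical query for the ExecuteSQL processor; table and column names
-- are placeholders -- substitute your own Oracle schema.
SELECT emp_id,
       first_name,
       last_name,
       hire_date
FROM   employees
WHERE  last_updated > TO_DATE('2016-08-01', 'YYYY-MM-DD')
```

QueryDatabaseTable, by contrast, builds the SELECT for you from its Table Name and (optional) Maximum-value Columns properties, which makes it the easier choice for incremental pulls; ExecuteSQL is handy when you need a custom join or filter.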
Created 08-25-2016 06:44 PM
To add to Scott's answer, you can send QueryDatabaseTable (for a one-time export, or for incremental fetches if you choose a "Maximum-value Column" such as the primary key) into a ConvertAvroToORC processor (available in the 1.0 GA release), then into a PutHDFS processor to land the data in Hadoop. If the table has not been created yet, ConvertAvroToORC generates partial Hive DDL in an attribute (CREATE TABLE IF NOT EXISTS ...), so after PutHDFS you could use a ReplaceText processor to put that DDL statement (along with the file's HDFS location) into the flow file content, then send it to a PutHiveQL processor, which executes the DDL and creates the table atop the HDFS directory containing your file(s).
That might sound a bit complicated, but it is flexible and powerful. I will post a template to the NiFi wiki after 1.0 is released, showing how such a flow would work.
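To make that last step concrete, here is a rough sketch of the kind of statement PutHiveQL would end up executing once ReplaceText has combined the generated DDL with the directory PutHDFS wrote to. The table, columns, and path are illustrative only, not taken from ConvertAvroToORC's actual output:

```sql
-- Hypothetical DDL: the "CREATE TABLE IF NOT EXISTS ... STORED AS ORC" fragment
-- from ConvertAvroToORC plus a LOCATION clause appended via ReplaceText.
-- Table name, columns, and path are examples only.
CREATE TABLE IF NOT EXISTS employees (
  emp_id     INT,
  first_name STRING,
  last_name  STRING,
  hire_date  TIMESTAMP
)
STORED AS ORC
LOCATION '/user/nifi/employees'
```

Because the LOCATION points at the directory PutHDFS already populated, Hive can query the ORC files as soon as the table exists, with no separate LOAD DATA step.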
Created 08-28-2016 06:23 AM
Thank you both so much for the responses! The querying part using the processors you recommended is pretty straightforward; the problem is what to use after the querying. How do I transform the data into CSV or another format that I can insert into Hadoop? There is no ConvertAvroToCSV processor, and ConvertAvroToORC is not available in the 1.0 Beta release. Where can I get the 1.0 GA release?
