
Best practice for exporting oracle rdbms to hive

Super Collaborator

Hi all

I'm using NiFi 1.0 and need to export data from an Oracle RDBMS and insert it into an ORC table in Hive.

What is the best way to do that (using what processors) using nothing but NiFi?

To be more exact, if I had the source data as CSV it wouldn't be a problem; however, it seems I have to use a custom script to export the data from the RDBMS to CSV. I'd rather use NiFi processors, but I can't find suitable ones to do so.

Any suggestions?

Adi


Master Guru

To add to Scott's answer, you can use QueryDatabaseTable (which exports each row only once if you set its "Maximum-value Columns" property to something like the primary key column) feeding into a ConvertAvroToORC processor (available in the 1.0 GA release), then a PutHDFS processor to land the data in Hadoop. If the table has not been created yet, ConvertAvroToORC generates partial Hive DDL in a flow file attribute (CREATE TABLE IF NOT EXISTS...), so after PutHDFS you could use a ReplaceText processor to put that DDL statement (along with the file's HDFS location) into the flow file content, then send it to a PutHiveQL processor, which executes the DDL and creates the table atop the HDFS directory containing your file(s).
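
For example, the statement that ends up in the flow file and gets executed by PutHiveQL could look roughly like this (just a sketch: the table name, columns, and HDFS path below are placeholders; the real column list comes from the DDL attribute that ConvertAvroToORC generates, and the LOCATION is whatever directory PutHDFS wrote to):

-- hypothetical example; table, columns, and path are placeholders
CREATE EXTERNAL TABLE IF NOT EXISTS oracle_orders (order_id INT, customer STRING, order_ts TIMESTAMP)
STORED AS ORC
LOCATION '/user/nifi/oracle_orders'

ReplaceText just appends the LOCATION clause (using the HDFS path attribute written by PutHDFS) to the partial DDL, so Hive lays the table definition over the ORC files already sitting in that directory.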

That might sound a bit complicated, but it is flexible and powerful. I will post a template to the NiFi wiki after 1.0 is released, showing how such a flow would work.

Super Collaborator

Thank you both so much for the response! The querying part using the processors you recommended is pretty straightforward. The problem is what to use after the querying: how do I transform the data into CSV or another format that I can insert into Hadoop? There is no ConvertAvroToCSV processor, and ConvertAvroToORC is not available in the 1.0 Beta release. Where can I get the 1.0 GA release?