Created 08-25-2016 05:54 PM
Hi all
I'm using NiFi 1.0 and need to export data from an Oracle RDBMS and insert it into an ORC table in Hive.
What is the best way to do that (using which processors) with nothing but NiFi?
To be more precise: if the source data were already in CSV it would not be a problem, but it seems I would have to use a custom script to export the data from the RDBMS to CSV. I'd rather use NiFi processors, but I can't find suitable ones for this.
Any suggestions?
Adi
Created 08-25-2016 06:04 PM
Hi @Adi Jabkowsky. Take a look at the QueryDatabaseTable processor. https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.QueryDatabaseT.... You can also use the ExecuteSQL processor https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExecuteSQL/ind...
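For a concrete (hypothetical) example of the second option, ExecuteSQL takes a fixed query in its "SQL select query" property and writes the result set to the outgoing flow file as Avro; the schema, table, and column names below are made up purely for illustration:

```sql
-- Illustrative query for ExecuteSQL's "SQL select query" property
-- (schema, table, and column names are hypothetical).
-- The result set is written to the outgoing flow file in Avro format.
SELECT employee_id,
       first_name,
       last_name,
       hire_date
FROM   hr.employees
WHERE  hire_date >= TO_DATE('2016-01-01', 'YYYY-MM-DD')
```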
Created 08-25-2016 06:44 PM
To add to Scott's answer, you can use QueryDatabaseTable (for a one-time export, or for incremental fetches if you set a "Maximum Value Column" such as the primary key) into a ConvertAvroToORC processor (available in the 1.0 GA release), then a PutHDFS processor to get the data into Hadoop. If the table has not been created, ConvertAvroToORC generates partial Hive DDL in an attribute (CREATE TABLE IF NOT EXISTS...), so after PutHDFS you could have a ReplaceText processor put the DDL statement (along with the file's HDFS location) into the flow file, then send that to a PutHiveQL processor, which would execute the DDL statement, creating the table atop the directory containing your file(s) in HDFS.
That might sound a bit complicated, but it is flexible and powerful. I will post a template to the NiFi wiki after 1.0 is released, showing how such a flow would work.
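To make the hand-off to Hive a bit more concrete, here is a rough sketch of the statement PutHiveQL might end up executing, assuming ReplaceText appends the HDFS directory that PutHDFS wrote to onto the generated DDL (the table, columns, and path below are hypothetical):

```sql
-- Partial DDL generated by ConvertAvroToORC (no LOCATION clause yet),
-- with an illustrative LOCATION appended by ReplaceText before PutHiveQL runs it.
-- Table, column, and path names are hypothetical.
CREATE TABLE IF NOT EXISTS employees (
  employee_id INT,
  first_name  STRING,
  last_name   STRING,
  hire_date   STRING
)
STORED AS ORC
LOCATION '/data/orc/employees'
```

The key point is that the table is created on top of the directory holding the ORC file(s), so the data PutHDFS has already landed becomes queryable right away.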
Created 08-28-2016 06:23 AM
Thank you both so much for the responses! The querying part with the processors you recommended is pretty straightforward; the problem is what to use after the query. How do I transform the data into CSV or another format that I can load into Hadoop? There is no ConvertAvroToCSV processor, and ConvertAvroToORC is not available in the 1.0 Beta release. Where can I get the 1.0 GA release?