Support Questions
Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Innovation Accelerator group hub.

NiFi SelectHiveQL

New Contributor

Hello,

 

I am working on a project where I need to run multiple HiveQL statements in a sequence, then store the output for each of those HiveQL statement in a variable and then inserting those output variables in a SQL Database - using NiFi processors.

Also Can I use SelectHiveQL processor for running multiple Hive Queries in a sequence?

Ideally what should be the NiFi Flow for the above work?

Please let me know as early as possible.

 

Thanks!

4 ACCEPTED SOLUTIONS

Contributor

Hi....yes you can use selecthiveql to run multiple hiveql statements. I would suggest putting the sql statements in a file / files and use list file and putfile processors before selecthiveql....Data is stored in flowfile content so you do not need to have a variable or anything. Finally you can use PutDatabaseRecord to write it to sql db. 

 

Ideal flow with no transformations : 
ListFile->FetchFIle->SelectHiveql->putdatabaserecord

 

Hope this helps.

View solution in original post

New Contributor

Thank You Sagar for your inputs!
I am thinking on the following lines regarding the NiFi Flow:

GetFile (To fetch input file from a specific directory) >> PutHDFS (Write Input File data to HDFS) >> PutHiveQL (Through "LOAD PATH INFILE..." load the data into Hive) >> SelectHiveQL (Execute the Semicolon delimited Hive QL's as Input & output in Avro format) >>  ConvertAvroToJson (Direct Mapping from Avro field to JSON) >> SplitJson (Splits a JSON File into multiple separate FlowFiles) >> EvaluateJSONPath (Evaluate JSON Path expressions against content of a FlowFile & to assign the results to FlowFile attributes) >> ConvertJSONToSQL (Converts a JSON Formatted FlowFile into INSERT SQL Statement) >> PutSQL (Execute the SQL INSERT statement to actually insert the data into the Destination SQL Database)

Please can you validate if the above flow will work ?

Also I have a another question as follows:

If I select multiple column values from a Hive Table in a HiveQL (e.g. "Select name, age, grade from student table" ) and retriving say 10 records for this query and further I want to insert these records / values into destination SQL Database table then what should the flow look like?

Can I use the flow I described above to solve this requirement.

Kindly share your thoughts / inputs.

Thanks & Regards in advance !

 

View solution in original post

Contributor

You can use the flow you have in mind. But, i would suggest some optimization for your flow here: 
PutHDFS & PutHiveQL -> I suggest you create an external table on HDFS so you can skip the puthiveql

Rest of your flow is to make SQL statements -> As i suggested before, you can look at PutDatabaseRecord since it has a recordreader(in your case AvroReader) and you can mention what type of sql query needs to be generated. This will help you with all the conversions and make the flow much much faster.

The process to retrieve selection columns you can still use the same flow, just change the query you retrieve from getfile.

 

Hope this helps.

View solution in original post

New Contributor

Hi Sagar,

 

Thanks a lot again for your inputs. From performance optimization perspective your ideas worked perfectly. PutDatabaseRecord worked like a charm ! The solution is working now.

 

Thank You.

View solution in original post

4 REPLIES 4

Contributor

Hi....yes you can use selecthiveql to run multiple hiveql statements. I would suggest putting the sql statements in a file / files and use list file and putfile processors before selecthiveql....Data is stored in flowfile content so you do not need to have a variable or anything. Finally you can use PutDatabaseRecord to write it to sql db. 

 

Ideal flow with no transformations : 
ListFile->FetchFIle->SelectHiveql->putdatabaserecord

 

Hope this helps.

New Contributor

Thank You Sagar for your inputs!
I am thinking on the following lines regarding the NiFi Flow:

GetFile (To fetch input file from a specific directory) >> PutHDFS (Write Input File data to HDFS) >> PutHiveQL (Through "LOAD PATH INFILE..." load the data into Hive) >> SelectHiveQL (Execute the Semicolon delimited Hive QL's as Input & output in Avro format) >>  ConvertAvroToJson (Direct Mapping from Avro field to JSON) >> SplitJson (Splits a JSON File into multiple separate FlowFiles) >> EvaluateJSONPath (Evaluate JSON Path expressions against content of a FlowFile & to assign the results to FlowFile attributes) >> ConvertJSONToSQL (Converts a JSON Formatted FlowFile into INSERT SQL Statement) >> PutSQL (Execute the SQL INSERT statement to actually insert the data into the Destination SQL Database)

Please can you validate if the above flow will work ?

Also I have a another question as follows:

If I select multiple column values from a Hive Table in a HiveQL (e.g. "Select name, age, grade from student table" ) and retriving say 10 records for this query and further I want to insert these records / values into destination SQL Database table then what should the flow look like?

Can I use the flow I described above to solve this requirement.

Kindly share your thoughts / inputs.

Thanks & Regards in advance !

 

Contributor

You can use the flow you have in mind. But, i would suggest some optimization for your flow here: 
PutHDFS & PutHiveQL -> I suggest you create an external table on HDFS so you can skip the puthiveql

Rest of your flow is to make SQL statements -> As i suggested before, you can look at PutDatabaseRecord since it has a recordreader(in your case AvroReader) and you can mention what type of sql query needs to be generated. This will help you with all the conversions and make the flow much much faster.

The process to retrieve selection columns you can still use the same flow, just change the query you retrieve from getfile.

 

Hope this helps.

New Contributor

Hi Sagar,

 

Thanks a lot again for your inputs. From performance optimization perspective your ideas worked perfectly. PutDatabaseRecord worked like a charm ! The solution is working now.

 

Thank You.