
Nifi processor to run Beeline Hive Commands

New Contributor

I have a Hive table with a custom SerDe defined for its rows. The jars are available in HDFS, and from beeline I can add them with the "add jar <file>" command before running any Hive commands. Is there a similar option in NiFi? I want to run the "add jar" command before inserting data into the Hive table. We have PutHiveQL, but it only runs Hive queries and fails on Beeline commands ("add jar" fails because the processor can't parse the query).

One option is to run the Hive commands via ExecuteStreamCommand, but I'd prefer not to execute scripts. Failing that, the remaining option is probably to implement a custom processor, but I'd like to avoid that if something already exists.

4 Replies

@Manoj

NiFi can load additional or custom jars from extra library directories.

Additional library directories can be specified by using the nifi.nar.library.directory. prefix with unique suffixes and separate paths as values.

For example, to provide two additional library locations, a user could specify additional properties with keys of:

nifi.nar.library.directory.lib1=/nars/lib1
nifi.nar.library.directory.lib2=/nars/lib2

Then you could try running the commands via PutHiveQL.

Expert Contributor

@Manoj

Can you share the content of the flowfile when you add "ADD JAR ...." into the statement? And also the error you are getting. I'm running various "set conf=value" statements without any issue, so it could be related to how you're using it.

New Contributor

@Ed Berezitsky

The content of the flowfile is the Hive commands I want to run. I don't normally see the parse error in the logs, but when I turn on DEBUG-level logging I see it. I'm not sure whether PutHiveQL is failing because of this, but I don't see any other errors.

2018-10-02 16:09:25,486 DEBUG [Timer-Driven Process Thread-7] o.apache.nifi.processors.hive.PutHiveQL PutHiveQL[id=01661000-3ada-1555-8ba5-ceb3e221c88c] Failed to parse query: add jar hdfs://<path to jar>
 due to org.apache.hadoop.hive.ql.parse.ParseException: line 1:0 cannot recognize input near 'add' 'jar' 'hdfs': org.apache.hadoop.hive.ql.parse.ParseException: line 1:0 cannot recognize input near 'add' 'jar' 'hdfs'
org.apache.hadoop.hive.ql.parse.ParseException: line 1:0 cannot recognize input near 'add' 'jar' 'hdfs'
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
        at org.apache.nifi.processors.hive.AbstractHiveQLProcessor.findTableNames(AbstractHiveQLProcessor.java:281)
        at org.apache.nifi.processors.hive.PutHiveQL.lambda$null$3(PutHiveQL.java:244)
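Judging from the trace, the exception comes from AbstractHiveQLProcessor.findTableNames, which runs the statement through Hive's HiveQL parser; "add jar" is a Beeline/session command rather than a HiveQL query, so the parser rejects it. One possible workaround (a sketch, not PutHiveQL's actual behavior) is to route session commands away from the regular queries before they reach PutHiveQL, e.g. with RouteText or a scripted processor. A minimal, hypothetical version of that routing logic:

```python
import re

# Statements that are Beeline/session commands rather than HiveQL queries
# (illustrative list; extend as needed for your scripts).
SESSION_COMMAND = re.compile(r"^\s*(add\s+jar|set)\b", re.IGNORECASE)

def route_statements(script):
    """Split a semicolon-separated script into (session_commands, queries)."""
    session, queries = [], []
    for stmt in (s.strip() for s in script.split(";")):
        if not stmt:
            continue
        # Session commands go one way, parseable HiveQL the other.
        (session if SESSION_COMMAND.match(stmt) else queries).append(stmt)
    return session, queries
```

With the statements separated, the INSERT can be sent to PutHiveQL while the "add jar" line is handled by whatever mechanism you choose for session commands.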

Contributor

You can generate a flowfile that contains your SQL commands and then execute it with PutHiveQL. I've done this by replacing the flowfile content in an existing flow with the SQL commands (a ReplaceText processor; just type the SQL in the processor config).

If you start your flow with a GenerateFlowFile containing the SQL commands, it will just re-execute continuously. You can also read an external file containing your SQL using GetFile, and then execute that with PutHiveQL.
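In plain Python terms, the GetFile approach amounts to reading the script from disk and passing its statements along (file name and contents below are hypothetical):

```python
from pathlib import Path

def load_sql(path):
    """Read a .sql file and return its non-empty statements, roughly what
    a GetFile -> PutHiveQL flow would hand to the processor."""
    # Drop full-line "--" comments before splitting on semicolons.
    lines = [ln for ln in Path(path).read_text().splitlines()
             if not ln.strip().startswith("--")]
    text = "\n".join(lines)
    return [s.strip() for s in text.split(";") if s.strip()]
```

Each returned statement would become (or be part of) the flowfile content that PutHiveQL executes.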