Created on 04-12-2017 01:24 AM
I recently did a PoC with a customer to integrate NiFi with CDH, part of which involved creating external tables in Hive on the newly loaded data. In this article I will share the approaches, useful workarounds, how to customise your own NiFi build for backwards compatibility, and provide a pre-built CDH-compatible Hive Bundle for you to download and try.
Well, the short answer is that NiFi 1.x's minimum supported version of Hive is 1.2.x, whereas CDH uses a fork of Hive 1.1.x, which introduces two common backwards compatibility challenges:
The obvious other option here is to work with CDH Hive indirectly, and thus we come to the workarounds.
It is very common in PoCs to not have all the software and configuration parameters exactly as you would like them to be, and to have no time to wait for change control to allow installs and firewall modifications. One of the great things about NiFi is the flexibility to quickly work around roadblocks, so here's the list of workarounds investigated:
Note that I have only tested the Hive bundle functionality against CDH5.10.0, not any of the other processors (such as HDFS or Kafka) nor other CDH versions. Neither I nor Hortonworks offer guarantees that this or other services will work against CDH, and you should thoroughly test things before trusting them with important data.
Here is a Hive Bundle I've built for CDH5.10.0. Just copy it into your nifi/lib directory and restart the service; you should then be able to connect the PutHiveQL and SelectHiveQL processors to your Hive2 service. (dropbox link to file)
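If you go the pre-built route, the install step can be sketched as follows. Note that the NIFI_HOME path and the NAR filename below are assumptions; adjust them for your own layout.

```shell
# Sketch: drop the pre-built Hive bundle NAR into an existing NiFi install.
# NIFI_HOME and the NAR filename are assumed values -- adjust for your layout.
NIFI_HOME="${NIFI_HOME:-/opt/nifi}"
if [ -d "$NIFI_HOME/lib" ]; then
  # Copy the bundle in and restart so NiFi unpacks and loads the new NAR
  cp nifi-hive-nar-1.1.1-SNAPSHOT.nar "$NIFI_HOME/lib/"
  "$NIFI_HOME/bin/nifi.sh" restart
else
  echo "NIFI_HOME not found at $NIFI_HOME -- set NIFI_HOME to your NiFi install first"
fi
```

NiFi only scans its lib directory at startup, which is why the restart is required before the new processors appear in the UI.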
The following instructions were tested on a CentOS 7 VM.
ssh <build server FQDN>
sudo su -
yum update -y
yum install -y wget
# Install Maven, Java 1.8, and Git to meet the minimum NiFi build requirements
wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
yum install -y git java-1.8.0-openjdk apache-maven
logout
git clone https://github.com/Chaffelson/nifi.git
cd nifi
git checkout nifi-1.1.x-cdhHiveBundle
mvn -T 2.0C clean install -Pcloudera -Dhive.version=1.1.0-cdh5.10.0 -Dhive.hadoop.version=2.6.0-cdh5.10.0 -Dhadoop.version=2.6.0-cdh5.10.0 -DskipTests
nifi-assembly/target/nifi-1.1.1-SNAPSHOT-bin/nifi-1.1.1-SNAPSHOT/bin/nifi.sh start
# browse to http://<build server FQDN>:8080/nifi to test your new hive bundle
Created on 05-01-2017 11:58 PM
Hi @Dan Chaffelson,
I had the backward compatibility issue, so I followed your steps and pasted the nifi-hive-nar into my NiFi 1.1.2 instance. Now SelectHiveQL is able to connect and query the table, but it only gives me the headers (column names) and doesn't retrieve the data. My query was select * from table limit 100. Any idea why? The nifi-app.log wasn't updated either.
Created on 05-15-2017 09:39 AM
Hi @Raghav Ramakrishann, sorry I only just saw this comment as I've been away on paternity leave. Can you share the version of CDH you're connecting to, and your service parameters? I might be able to troubleshoot a bit.
Created on 05-16-2017 02:44 AM
Hi @Dan Chaffelson, sorry to not update my comment. I was able to troubleshoot it. It was an issue from the CDH side and not with the NAR file. It's working for me now. Thanks for sharing this article. Really helped me out! 🙂
Created on 05-16-2017 09:04 AM
Glad to hear it!
Created on 05-22-2017 05:32 PM
For connecting NiFi to Hive on Cloudera with Kerberos, you can use JDBC. Configure a DBCPConnectionPool as follows:
Database Connection URL: jdbc:hive2://<host>:10000;AuthMech=1;KrbRealm=<kerberos realm>;KrbHostFQDN=_HOST;KrbServiceName=hive
Database Driver Class Name: com.cloudera.hive.jdbc41.HS2Driver
Database Driver Location: location of the Cloudera JDBC jar files
After that you can use PutSQL, ExecuteSQL and ConvertJSONToSQL.
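As a sanity check, the connection URL above can be assembled from its parts. The host and realm below are placeholder values; substitute your own HiveServer2 FQDN and Kerberos realm.

```shell
# Assemble the Kerberos-enabled Hive JDBC URL for the DBCPConnectionPool.
# HIVE_HOST and KRB_REALM are placeholder values -- substitute your own.
HIVE_HOST="hiveserver2.example.com"
KRB_REALM="EXAMPLE.COM"
# AuthMech=1 selects Kerberos auth; _HOST is expanded by the driver
HIVE_URL="jdbc:hive2://${HIVE_HOST}:10000;AuthMech=1;KrbRealm=${KRB_REALM};KrbHostFQDN=_HOST;KrbServiceName=hive"
echo "$HIVE_URL"
```

Paste the resulting string into the Database Connection URL property of the DBCPConnectionPool controller service.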
Created on 08-21-2018 01:20 PM
Can confirm the DBCPConnectionPool approach suggested here by @Rudolf Schimmel works. We did run into issues when using Java 10 (uncaught Exception: java.lang.NoClassDefFoundError: org/apache/thrift/TException, even though libthrift was specified). Using Java 8 worked.
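Given the Java 10 problem above, a quick pre-flight check of the local JVM can save some head-scratching. This is a best-effort sketch: it only inspects the version string reported by `java -version` and assumes the `1.8.x` format used by Java 8.

```shell
# The Hive 1.1.x client libraries in this bundle were only verified on Java 8.
# Best-effort check: parse the quoted version string from `java -version`.
ver=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}')
case "$ver" in
  1.8.*) msg="Java 8 detected ($ver): OK for this Hive bundle" ;;
  *)     msg="Java version '$ver' detected: this bundle was only verified on Java 8" ;;
esac
echo "$msg"
```

Run this on the host where NiFi will start; if you have multiple JVMs installed, check the one referenced by NiFi's bootstrap configuration rather than whichever happens to be first on your PATH.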