Introduction
Before the official release of Cloudera Data Flow (formerly Hortonworks Data Flow), you may want to experiment with NiFi and Hive on CDH.
However, because CDH 5 uses a fork of Hive 1.1, the HiveQL processors and controller services included in the official Apache release will not work, so you need to build your own, as explained in this article: Connecting NiFi to CDH Hive.
That article is great, but it does not cover Kerberos/SSL; since I had to work out that part of the configuration myself, I thought I would share it here.
Note: You could use a plain DBCP connection to connect to Cloudera Hive, but it will not allow you to use the proper authentication.
Pre-Requisites
To connect to Hive with SSL and Kerberos, you will need the following:
A running NiFi instance (I used Apache NiFi 1.9 in this example)
A Kerberized CDH cluster with Hive on SSL (I used CDH 5.15 in this example)
The certificate to add to your truststore for the SSL connection (see the sketch after this list if you still need to retrieve it)
A keytab for a specific user authorized in the cluster
The krb5 configuration file from the cluster
hive-site.xml, core-site.xml and hdfs-site.xml from your cluster
NiFi processors and controller services compiled for Hive 1.1 on CDH (these can be built as described in the article linked above)
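If you do not already have the HiveServer2 certificate mentioned above, one way to retrieve it is to pull it directly from the TLS endpoint. This is only a sketch: the hostname and port (10000 is the usual HiveServer2 port) are assumptions to adapt to your cluster.
echo | openssl s_client -connect hiveserver2.example.com:10000 -showcerts 2>/dev/null | openssl x509 -outform PEM > hive_server.pem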
Step 1: Add certificate to Java truststore
The goal of this step is to add your certificate to the Java cacerts truststore used by the JVM that runs NiFi.
In order to import your certificate, run the following command:
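For example, a minimal sketch assuming the certificate was saved as hive_server.pem, NiFi runs on the default Java 8 installation, and the cacerts password is still the default changeit (the alias, file name, and paths are assumptions to adjust to your environment):
keytool -import -trustcacerts -alias hive_server -file hive_server.pem -keystore $JAVA_HOME/jre/lib/security/cacerts -storepass changeit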