Created on 03-06-2019 12:30 PM - edited 08-17-2019 04:42 AM
Before the official release of Cloudera Data Flow (formerly Hortonworks Data Flow), you may want to play with NiFi and Hive on CDH.
However, because CDH 5 uses a fork of Hive 1.1, the HiveQL processors and controller services included in the official Apache release will not work, so you need your own, as explained in this article: Connecting NiFi to CDH Hive.
This article is awesome, but does not focus on Kerberos/SSL; since I had to do the configuration myself, I thought I would share the knowledge.
Note: You could use a DBCP connection to connect to Cloudera Hive, but it will not let you use the proper authentication.
To connect to Hive with SSL and Kerberos, you will need the following:
The goal of this step is to add your certificate to the cacerts truststore of the Java installation that runs NiFi.
In order to import your certificate, run the following command:
keytool -importcert -alias HS2server -keystore [LOCATION_OF_CACERTS] -file [LOCATION_OF_YOUR_CERTIFICATE]
I'm running on macOS, where my cacerts is at /Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/security/cacerts, so I ran:
keytool -importcert -alias HS2server -keystore /Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/security/cacerts -file /Users/pvidal/Documents/customers/quest/config/tls/rootCA.pem
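To confirm the import worked, you can ask keytool to list the alias. The helper below is only a sketch: the function name has_alias is hypothetical, and it assumes the truststore still uses the default JDK password "changeit" unless you pass another one.

```shell
# has_alias KEYSTORE ALIAS [STOREPASS]
# Hypothetical helper: returns 0 if ALIAS is present in KEYSTORE, non-zero
# otherwise (including when the keystore cannot be opened or keytool is
# not on the PATH).
has_alias() {
  keystore="$1"
  alias_name="$2"
  storepass="${3:-changeit}"  # assumption: default JDK cacerts password
  keytool -list -keystore "$keystore" -storepass "$storepass" \
    -alias "$alias_name" >/dev/null 2>&1
}
```

For example, has_alias [LOCATION_OF_CACERTS] HS2server && echo "HS2server is trusted" should print the message once the certificate is in place.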
Note: This step requires a NiFi restart, so I suggest stopping NiFi before following these instructions and starting it again afterwards.
Go to your NiFi conf folder and modify the nifi.properties file to add the following:
nifi.kerberos.krb5.file=[LOCATION_OF_YOUR_KRB5.CONF]
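If you prefer to script this change, here is a minimal sketch. The function name set_prop is hypothetical. It assumes the key may already be present in nifi.properties (a stock file usually ships it with an empty value), in which case it is replaced rather than duplicated, and it assumes the value contains no | or & characters, since those are special to sed.

```shell
# set_prop FILE KEY VALUE
# Hypothetical helper: sets KEY=VALUE in a properties file, replacing an
# existing KEY= line if there is one and appending otherwise.
set_prop() {
  file="$1"
  key="$2"
  value="$3"
  if grep -q "^${key}=" "$file"; then
    # "|" is the sed delimiter because paths contain "/".
    # -i.bak edits in place on both GNU and BSD sed, keeping a backup.
    sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Usage would look like set_prop conf/nifi.properties nifi.kerberos.krb5.file [LOCATION_OF_YOUR_KRB5.CONF], run from the NiFi home folder while NiFi is stopped.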
Go to your NiFi lib folder and add the necessary NARs; I added the following:
-rwxr-xr-x@ 1 pvidal admin 14800 Mar 5 16:39 nifi-hive-services-api-nar-1.9.0.1.0.0.0-49.nar
-rwxr-xr-x@ 1 pvidal admin 164674666 Mar 5 16:39 nifi-hive_1_1-nar-1.9.0.1.0.0.0-49.nar
Note: Remember to restart NiFi before this step.
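Before restarting, it can be worth checking that both NARs really are in the lib folder. The following is a sketch only: check_nars is a hypothetical helper, and the file name patterns are taken from the listing above with the version part replaced by a wildcard.

```shell
# check_nars LIB_DIR
# Hypothetical helper: returns 0 if both Hive NARs are present in LIB_DIR,
# 1 otherwise, printing each missing pattern to stderr.
check_nars() {
  lib_dir="$1"
  missing=0
  for pattern in 'nifi-hive-services-api-nar-*.nar' 'nifi-hive_1_1-nar-*.nar'; do
    # $pattern is left unquoted on purpose so the shell expands the wildcard.
    if ! ls "$lib_dir"/$pattern >/dev/null 2>&1; then
      echo "missing: $pattern" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_nars on your NiFi lib folder should return silently when both files are there.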
Go to your controller services, and add a new KeytabCredentialsService.
Configure the service as such:
Kerberos Keytab: [LOCATION_OF_YOUR_KEYTAB]
Kerberos Principal: [NAME_OF_YOUR_PRINCIPAL]
Enable the service.
Go to your controller services, and add a new Hive_1_1ConnectionPool (from the NAR you imported).
Configure the service as such:
Database Connection URL: jdbc:hive2://[YOUR_HIVE_HOST]:10000/default;principal=hive/_HOST@[YOUR_DOMAIN, SAME AS PRINCIPAL];ssl=true
Hive Configuration Resources: [LOCATION_OF_HIVE_SITE.XML],[LOCATION_OF_CORE_SITE.XML],[LOCATION_OF_HDFS_SITE.XML]
Kerberos Credentials Service: [YOUR_KEYTABCREDENTIALSSERVICE]
Enable the service.
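The connection URL is easy to get wrong by hand, so here is a small sketch that assembles it from its parts. The function name hive_jdbc_url is hypothetical, and the host and realm in the usage example are placeholders. The port and database default to 10000 and default, as in the URL above.

```shell
# hive_jdbc_url HOST REALM [PORT] [DATABASE]
# Hypothetical helper: prints a HiveServer2 JDBC URL with Kerberos and SSL
# enabled, in the same shape as the one used in the connection pool.
hive_jdbc_url() {
  host="$1"
  realm="$2"
  port="${3:-10000}"
  db="${4:-default}"
  printf 'jdbc:hive2://%s:%s/%s;principal=hive/_HOST@%s;ssl=true\n' \
    "$host" "$port" "$db" "$realm"
}
```

For instance, hive_jdbc_url hs2.example.com EXAMPLE.COM prints jdbc:hive2://hs2.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM;ssl=true, which you can paste into the connection pool configuration.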
I configured a simple flow that only contains:
The only bit of configuration I had to do was referencing the Hive_1_1ConnectionPool I created earlier, as depicted below:
Note: With the official release of CDF, all of this will be MUCH simpler, with no need for NAR import. If you're not excited about it, I am!
Created on 06-13-2019 09:41 AM
Where can I get both of the imported files?
