There are already several articles describing how to set up NiFi to connect to Kafka or HDFS. However, following scattered pieces of documentation to Kerberize HDP and to configure the ACLs of ZooKeeper, Kafka, HDFS, and Kerberos was not an easy task for me.
So, the motivation of this article is to share the rough but thorough set of operations needed to set it up right. You can elaborate on each step by digging into the related documentation further. Example configurations and a NiFi template are available, too.
Some of these steps may not be required, and some may have been forgotten, but these were the steps to Kerberize my HDP sandbox VM:
Download and install the latest HDF
Start Kafka and turn off maintenance mode
Restart all affected services
Install a new MIT KDC [1]
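The KDC installation can be sketched roughly as follows (assuming a CentOS/RHEL base like the HDP sandbox; the realm and the admin principal name are placeholders for your environment):

```shell
# Install the MIT Kerberos server and client packages
yum install -y krb5-server krb5-libs krb5-workstation

# Edit /etc/krb5.conf and /var/kerberos/krb5kdc/kdc.conf so the
# default realm (e.g. EXAMPLE.COM, a placeholder) points at this host,
# then create the KDC database with a stash file
kdb5_util create -s

# Create an admin principal for the Ambari Kerberos wizard to use
kadmin.local -q "addprinc admin/admin"

# Start the KDC and the kadmin server
service krb5kdc start
service kadmin start
```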
Enable Kerberos from Ambari
Proceed with Ambari Kerberos wizard
The service check for Pig failed, but I continued with the Complete button anyway
Manually start the services that didn't start automatically
Configure Kafka for Kerberos via Ambari [2]
Modify Kafka listeners from `PLAINTEXT://localhost:6667` to `PLAINTEXTSASL://localhost:6667` in Ambari
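In terms of Kafka's server.properties (which Ambari manages for you), the change above amounts to the following; the host and port are the sandbox defaults:

```properties
# Before: plaintext only
#listeners=PLAINTEXT://localhost:6667

# After: SASL (Kerberos) over a plaintext transport; PLAINTEXTSASL is
# HDP's name for what upstream Kafka calls SASL_PLAINTEXT
listeners=PLAINTEXTSASL://localhost:6667
```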
Enable the `Kafka ranger plugin` in the Ranger config, and check `Enable Ranger for KAFKA` in the Kafka config in Ambari [6]
Set up the Ranger Kafka service [3]. I don't know what the password should be here; kafka/kafka passed the connection test.
If a consumer has already connected to a topic using a certain consumer group id, another consumer using a different SASL user can't connect with the same group id, because the ZNode for that group has already been created with an ACL tied to the first user.
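You can inspect the ACL that was set on the consumer group's ZNode with the ZooKeeper CLI shipped with HDP (the group id `my-group` below is a placeholder):

```shell
# Connect to ZooKeeper with the client HDP provides
zookeeper-client -server localhost:2181

# Inside the shell, check who owns the consumer group's ZNode;
# old-style consumers register themselves under /consumers/<group-id>
getAcl /consumers/my-group
```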
Set up NiFi to access Kerberized Kafka [4]; watch out for the '"' characters when you copy and paste the example
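A minimal JAAS file for NiFi's Kafka processors might look like the following; the keytab path and principal are placeholders for your environment, and the plain double quotes are exactly where copy-pasting from a web page tends to sneak in curly quotes:

```
KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/nifi.keytab"
    principal="nifi@EXAMPLE.COM";
};
```

NiFi is then pointed at this file via a JVM argument in conf/bootstrap.conf, e.g. `java.arg.15=-Djava.security.auth.login.config=/opt/nifi/conf/kafka-jaas.conf` (the argument index is arbitrary; it just has to be unused).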
Set up NiFi to access Kerberized HDFS by setting `/etc/krb5.conf` as `nifi.kerberos.krb5.file` in nifi.properties.
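Concretely, the relevant nifi.properties entry looks like this:

```properties
# Path to the Kerberos client configuration NiFi should use
nifi.kerberos.krb5.file=/etc/krb5.conf
```

The HDFS processors such as PutHDFS additionally take `Kerberos Principal` and `Kerberos Keytab` properties, where you would supply something like a nifi principal and its keytab (both specific to your environment).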
Using only Ranger to manage access control is recommended [5]
Set up a NiFi dataflow using PutKafka, GetKafka, and PutHDFS