
Running NiFi outside a secured Hadoop Cluster

Rising Star

I have a NiFi host doing ETL processing outside the Hadoop cluster. The cluster is secured with Knox/Ranger, and the only open ports are SSH to the Hadoop edge nodes and the Kafka port. My question is: what are the best options for writing data into either HBase or Hive? The ideas I have are:

  • Deploy a NiFi inside the cluster and use site-to-site (requires opening a firewall port)
  • From NiFi, write to the Kafka queue, and from inside the cluster run a Java process that pulls from the queue and writes the data to the target (HBase or Hive); a rough sketch of that process is below this list
  • Any other suggestions?
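For context, the Java process in the second idea would be roughly along these lines. This is only a minimal sketch: the broker host, consumer group, topic name, HBase table name, and column family are placeholders, and the Kerberos login the secured cluster would actually require is omitted.

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaToHBaseBridge {
    public static void main(String[] args) throws Exception {
        // Kafka consumer settings -- broker host and group id are placeholders
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-broker:9092");
        props.put("group.id", "kafka-hbase-bridge");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // HBase connection picks up hbase-site.xml from the classpath.
        // On a Kerberized cluster you would also log in (e.g. via a keytab)
        // before creating the connection; that step is omitted here.
        Configuration hbaseConf = HBaseConfiguration.create();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection hbase = ConnectionFactory.createConnection(hbaseConf);
             Table table = hbase.getTable(TableName.valueOf("events"))) {

            consumer.subscribe(Collections.singletonList("etl-events"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Use the Kafka key as the row key if present, otherwise
                    // fall back to topic-partition-offset so rows stay unique
                    String rowKey = record.key() != null
                            ? record.key()
                            : record.topic() + "-" + record.partition() + "-" + record.offset();

                    // Store the message payload in column family "d", qualifier "value"
                    Put put = new Put(rowKey.getBytes(StandardCharsets.UTF_8));
                    put.addColumn("d".getBytes(StandardCharsets.UTF_8),
                            "value".getBytes(StandardCharsets.UTF_8),
                            record.value().getBytes(StandardCharsets.UTF_8));
                    table.put(put);
                }
            }
        }
    }
}
```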
1 ACCEPTED SOLUTION

Master Guru

Option 1 seems fine if you are able to open the firewall port.

For option 2, rather than writing a Java process, you could run a NiFi inside the secure cluster using ConsumeKafka to consume the messages and then use the appropriate follow-on processors (PutHDFS, PutHiveQL, PutHBaseJson, etc.). You still use Kafka as the gateway into the cluster, but don't have to write any custom code.


2 REPLIES


Master Guru

I like Bryan's suggestion. That's a good model for IoT as well, with remote nodes messaging in. You could send messages between the outside cluster and the secure cluster inside via MQTT, JMS, Kafka, or Site-to-Site. Then there's just one port and one controlled set of IPs communicating with each other, acting as an IoT or security gateway.