Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Running NiFi outside a secured Hadoop Cluster

Rising Star

I have a NiFi host doing ETL processing outside the Hadoop cluster. The cluster is secured with Knox/Ranger, and the only open ports are SSH to the Hadoop edge nodes and the Kafka queue. My question: what are the best options for writing data into either HBase or Hive? The ideas I have are:

  • Deploy a NiFi instance inside the cluster and use Site-to-Site (requires opening a firewall port)
  • From NiFi, write to the Kafka queue, and inside the cluster run a Java process that pulls from the queue and writes the data to the target (HBase or Hive)
  • Any other suggestions?
1 ACCEPTED SOLUTION

Master Guru

Option 1 seems fine if you are able to open the firewall port.

In option 2, rather than writing a Java process, you could run a NiFi instance inside the secure cluster, use ConsumeKafka to consume the messages, and then use the appropriate follow-on processors (PutHDFS, PutHiveQL, PutHBaseJSON, etc.). That way you still use Kafka as the gateway into the cluster, but you don't have to write any custom code.
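
To make the Kafka-to-HBase leg a bit more concrete, here is a rough sketch of the kind of mapping a flat JSON message goes through on its way into an HBase row: one field supplies the row key, and every other top-level field becomes a column qualifier under a single column family. This is only an illustration of the idea, not NiFi's own code. The function name `json_to_hbase_row`, the field `sensor_id`, the family `d`, and the sample record are all made up, and turning nested values into JSON text is just one possible strategy.

```python
import json


def json_to_hbase_row(message: bytes, row_id_field: str, column_family: str):
    """Map one flat JSON message to (row_key, {"family:qualifier": value}).

    Hypothetical helper for illustration; not part of NiFi or HBase.
    """
    doc = json.loads(message.decode("utf-8"))
    if not isinstance(doc, dict):
        raise ValueError("expected a single JSON object per message")
    if row_id_field not in doc:
        raise ValueError(f"missing row id field: {row_id_field}")

    def as_text(value):
        # Strings are kept as-is; numbers, booleans and nested values
        # are rendered as JSON text (an assumption of this sketch).
        return value if isinstance(value, str) else json.dumps(value)

    row_key = as_text(doc[row_id_field])
    cells = {}
    for field, value in doc.items():
        if field == row_id_field or value is None:
            continue  # the row key is not stored as a column; nulls are skipped
        cells[f"{column_family}:{field}"] = as_text(value)
    return row_key, cells


if __name__ == "__main__":
    sample = b'{"sensor_id": "s-17", "temp_c": 21.5, "site": null}'
    print(json_to_hbase_row(sample, "sensor_id", "d"))
```

In the NiFi-only version of option 2 you would not write this yourself; you would set the row identifier field and column family on the HBase processor and let it do the mapping.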


2 REPLIES

Master Guru

I like Bryan's suggestion. That's a good model for IoT as well, with remote nodes messaging in. You could send messages between an outside cluster and an inside secure cluster via MQTT, JMS, Kafka, or Site-to-Site. Then only one port and one controlled set of IPs communicate with each other. The inside NiFi effectively acts as an IoT or security gateway.