Created 10-19-2016 05:06 PM
I have a NiFi host doing ETL processing outside the Hadoop cluster. The cluster is secured with Knox/Ranger, and the only ports open are SSH to the Hadoop edge nodes and the Kafka port. My question is: what are the best options for writing data into either HBase or Hive? Ideas I have are:
Created 10-19-2016 05:53 PM
Option 1 seems fine if you are able to open the firewall port.
In option 2, rather than write a Java process, you could run a NiFi instance inside the secure cluster, using ConsumeKafka to consume the messages and then the appropriate follow-on processors (PutHDFS, PutHiveQL, PutHBaseJson, etc.). So you still use Kafka as the gateway into the cluster, but don't have to write any custom code.
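For comparison, here is a rough sketch of what that custom Java process in option 2 might look like: a plain Kafka consumer writing each message into an HBase table. The broker address, topic, table, and column family names are made up for illustration; the in-cluster NiFi approach gives you the same pipeline with ConsumeKafka and PutHBaseJson through configuration alone.

```java
// Hypothetical sketch of the "custom Java process" that an in-cluster NiFi flow
// (ConsumeKafka -> PutHBaseJson) would replace. Broker, topic, table and column
// names below are assumptions for illustration only.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Collections;
import java.util.Properties;

public class KafkaToHBase {
    public static void main(String[] args) throws Exception {
        // Kafka consumer pointing at the broker reachable through the firewall
        Properties props = new Properties();
        props.put("bootstrap.servers", "edge-node:6667");   // assumed broker address
        props.put("group.id", "etl-ingest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection hbase = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = hbase.getTable(TableName.valueOf("etl_events"))) {   // assumed table

            consumer.subscribe(Collections.singletonList("etl-topic"));          // assumed topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    // Write each Kafka message as one HBase row keyed by the message key
                    Put put = new Put(Bytes.toBytes(record.key()));
                    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"),
                                  Bytes.toBytes(record.value()));
                    table.put(put);
                }
            }
        }
    }
}
```

Everything this sketch does by hand (connection handling, deserialization, batching, error handling you would still have to add) is what the ConsumeKafka and PutHBaseJson processors handle for you.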
Created 10-19-2016 06:03 PM
I like Bryan's suggestion. That's a good model for IoT as well, with remote nodes messaging in. You could send messages between the outside cluster and the inside secure cluster via MQTT, JMS, Kafka, or NiFi Site-to-Site. Then there is just one port and one controlled set of IPs communicating with each other: an IoT or security gateway.
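To make the gateway idea concrete, a remote node only needs a few lines of producer code to push its readings through the single exposed Kafka endpoint, with the in-cluster NiFi flow picking them up from there. This is a hedged sketch; the broker address, topic, and payload are made up, and an MQTT or JMS client (or NiFi Site-to-Site) could fill the same role.

```java
// Hypothetical sketch of a remote node publishing into the Kafka "gateway" topic.
// Broker address, topic name and payload are assumptions for illustration.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class RemoteNodePublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "gateway-host:6667");   // the single open port
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One JSON reading from a remote device, keyed by device id
            String payload = "{\"deviceId\":\"sensor-42\",\"temp\":21.5}";
            producer.send(new ProducerRecord<>("iot-ingest", "sensor-42", payload));
            producer.flush();
        }
    }
}
```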