I need to send data to Kafka from an application located outside of my Hadoop cluster.
The current implementation uses the Kafka client.
Is there a way to connect through the edge node (since Kafka is not directly reachable from outside) and have it forward messages to the cluster inside? Or is using a Kafka REST API over HTTP a must?
Furthermore, if I have to use the REST API, what are its limitations compared to the native Kafka client?
Have you tried connecting to port 6667 from outside? You can also configure additional listeners according to your needs and use those ports to connect to Kafka from outside the cluster.
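As a sketch of what an additional listener might look like, a broker's `server.properties` could declare an internal and an external listener (the hostnames, ports, and security protocols here are assumptions, not values from your cluster):

```properties
# Internal listener for clients inside the Hadoop network,
# external listener for clients outside it.
listeners=INTERNAL://0.0.0.0:6667,EXTERNAL://0.0.0.0:9093

# Hostnames the broker advertises back to clients; the external one
# must be resolvable/routable from outside the cluster.
advertised.listeners=INTERNAL://broker1.hadoop.internal:6667,EXTERNAL://broker1.example.com:9093

# Map each listener to a security protocol (external traffic encrypted).
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:SSL
inter.broker.listener.name=INTERNAL
```

External clients would then bootstrap against `broker1.example.com:9093` instead of the internal address.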
If you expose Kafka via HTTP, then I don't see the downside of exposing Kafka itself.
If you did enable HTTPS on the Kafka REST API (via Knox, for example: https://knox.apache.org/books/knox-1-1-0/user-guide.html#Kafka), then you should also be enabling TLS/SSL on Kafka itself, in which case certificates would be needed for external clients to connect securely.
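On the client side, connecting over TLS is just a few properties; a minimal sketch, assuming the brokers expose an SSL listener (the paths and passwords are placeholders):

```properties
# client.properties for an external producer/consumer
security.protocol=SSL

# Truststore containing the CA that signed the broker certificates.
ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=changeit

# Only needed if the brokers require client certificates
# (ssl.client.auth=required on the broker side).
ssl.keystore.location=/path/to/client.keystore.jks
ssl.keystore.password=changeit
```

The same file can be passed to the console tools, e.g. `kafka-console-producer.sh --producer.config client.properties`.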
Kafka should realistically not be treated as a "walled off" service behind the Hadoop network, and you cannot proxy requests through another server without manually setting up that TLS tunnel yourself. Kafka is a common access point for getting data into Hadoop, so it should be treated as a first-class "edge ingestion layer" in its own right. You should take similar care to set up authentication and access rules on every single broker, just as you have done for the Hadoop edge node.
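To give an idea of what per-broker authentication looks like, here is a minimal client-side sketch using SASL over TLS (the mechanism, username, and password are assumptions; the broker would need a matching SASL listener and credentials configured):

```properties
# Client properties for an authenticated, encrypted connection
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512

# Hypothetical credentials; created on the broker side, e.g. via
# kafka-configs.sh --alter --add-config 'SCRAM-SHA-512=[password=app-secret]' \
#   --entity-type users --entity-name app-user
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="app-user" \
  password="app-secret";

ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=changeit
```

Combined with ACLs (`kafka-acls.sh`), this lets you restrict which principals can produce to which topics, which is the kind of access control the Hadoop edge node normally provides.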
You could alternatively use NiFi to listen on some other arbitrary port and forward the data to a Kafka producer processor. Someone scanning open ports then wouldn't be able to detect that Kafka is responding (they would see NiFi instead), but you still have the same underlying problem: anyone can send arbitrary messages into that socket if you don't require authentication.