Hi, I'm trying to evaluate and understand Apache Knox's capabilities.
I've used the tutorial securing-hadoop-infrastructure-apache-knox to set up Knox, and I'm able to run the wordcount MapReduce program by accessing the cluster through the Knox gateway.
My question is: how do I restrict access to the cluster when a request does not come in through the Knox gateway?
For example, I'm still able to access HDFS directly, without going through the gateway, using the following request:
curl -iku guest:guest-password -X GET 'http://sandbox.hortonworks.com:50070/webhdfs/v1/?op=LISTSTATUS'
I'm new to Apache Knox, so I'd appreciate your help with this.
To secure your cluster effectively, you need to firewall it off from the rest of the world, so that direct requests such as the WebHDFS call on port 50070 above cannot reach it from outside. Knox then lives either on the border of the DMZ (multi-homed, with one interface on the cluster network and one facing the outside world) or inside the DMZ (as the only node with firewall access to the cluster).
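To make the DMZ layout concrete, here is a minimal sketch that models the perimeter rules as a Python predicate. It only expresses the policy; actual enforcement would be done by whatever firewall you use. The addresses, the network layout and the `allowed` function are hypothetical, and 8443 is assumed as Knox's default gateway port.

```python
from ipaddress import ip_address, ip_network

# Hypothetical addresses for illustration only; substitute your own layout.
CLUSTER_NET = ip_network("10.0.0.0/24")  # private network the Hadoop nodes live on
KNOX_HOST = ip_address("172.16.0.5")     # Knox node sitting in the DMZ
KNOX_PORT = 8443                         # assumed Knox gateway port (its usual default)


def allowed(src: str, dst: str, port: int) -> bool:
    """Return True if the modeled perimeter policy lets src open a TCP connection to dst:port."""
    s, d = ip_address(src), ip_address(dst)
    # Node-to-node traffic inside the cluster network is not filtered by this policy.
    if s in CLUSTER_NET and d in CLUSTER_NET:
        return True
    # Knox is the only host outside the cluster network allowed to reach cluster services.
    if s == KNOX_HOST and d in CLUSTER_NET:
        return True
    # Clients may reach Knox, but only on the gateway port.
    if d == KNOX_HOST and port == KNOX_PORT:
        return True
    # Everything else is dropped, including direct WebHDFS calls on port 50070.
    return False
```

With these rules, `allowed("203.0.113.7", "10.0.0.10", 50070)` is False (an outside client hitting WebHDFS directly), while `allowed("172.16.0.5", "10.0.0.10", 50070)` is True (Knox proxying the same request). The multi-homed variant is the same idea, except Knox's internal interface sits inside `CLUSTER_NET` and its external interface is the only address clients can reach.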
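Knox by itself does not secure your cluster. It needs to be combined with other measures, such as the network isolation described above and authentication on the cluster services themselves (typically Kerberos), to provide secure access.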
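Once the firewall is in place, one way to confirm it from a machine outside the cluster is to try opening plain TCP connections: the WebHDFS port from the question should be unreachable, while the Knox gateway port stays open. The sketch below is a minimal, hypothetical checker. `port_reachable`, `run_checks` and the `CHECKS` list are illustrative names, and it assumes Knox listens on port 8443 on the same host as the NameNode, as in a single-node sandbox.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


# Hypothetical expectations: (description, host, port, should_be_reachable).
# Assumes Knox runs on the sandbox host itself on port 8443; adjust for your layout.
CHECKS = [
    ("WebHDFS direct", "sandbox.hortonworks.com", 50070, False),
    ("Knox gateway", "sandbox.hortonworks.com", 8443, True),
]


def run_checks(checks) -> bool:
    """Print one line per check; return True only if every port behaves as expected."""
    all_ok = True
    for name, host, port, expected in checks:
        actual = port_reachable(host, port)
        status = "OK" if actual == expected else "UNEXPECTED"
        print(f"{status}: {name} {host}:{port} reachable={actual}")
        all_ok = all_ok and actual == expected
    return all_ok
```

Calling `run_checks(CHECKS)` from a machine outside the cluster prints one line per check and returns False if any port is more (or less) reachable than expected. A TCP check only shows that the port is filtered. It says nothing about authentication on the services themselves.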