Created 01-14-2017 01:40 AM
Hi, I'm trying to evaluate and understand Apache Knox's capabilities.
I've used the tutorial securing-hadoop-infrastructure-apache-knox to set up Knox, and I'm able to run the wordcount MapReduce program by accessing the cluster through the Knox gateway.
My question is: how do I restrict access to the cluster when a request does not come in through the Knox gateway?
For example, I'm still able to access HDFS without going through the gateway, using the following query:
curl -iku guest:guest-password -X GET 'http://sandbox.hortonworks.com:50070/webhdfs/v1/?op=LISTSTATUS'
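For comparison, the same LISTSTATUS request sent through the gateway targets the Knox port and carries the topology name in the path. This is a minimal sketch that assumes Knox listens on its default port 8443 on the sandbox host and that the topology is named `default`. Both values are assumptions, so check them against your own install. The script only builds and prints the two URLs, and the actual curl call is left commented out.

```shell
#!/usr/bin/env bash
# Hypothetical gateway settings; replace with the values from your own Knox install.
KNOX_HOST="sandbox.hortonworks.com"
KNOX_PORT=8443
TOPOLOGY="default"

# Direct WebHDFS call to the NameNode (the request from the question; bypasses Knox).
direct_url="http://sandbox.hortonworks.com:50070/webhdfs/v1/?op=LISTSTATUS"

# The same operation routed through Knox: /gateway/<topology>/webhdfs/v1/...
knox_url="https://${KNOX_HOST}:${KNOX_PORT}/gateway/${TOPOLOGY}/webhdfs/v1/?op=LISTSTATUS"

echo "direct: ${direct_url}"
echo "knox:   ${knox_url}"

# Uncomment to send the request through the gateway (-k skips TLS verification,
# which is only reasonable for a sandbox with a self-signed certificate):
# curl -iku guest:guest-password -X GET "${knox_url}"
```

Note that sending requests to the gateway URL does not by itself stop clients from using the direct URL. Both keep working until the NameNode port is made unreachable from outside.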
I'm new to Apache Knox, so I'd appreciate your help with this.
Created 01-14-2017 01:40 AM
@Neeraj Sabharwal, @Sunile Manjee, any ideas on this?
Created 01-14-2017 03:44 AM
You have to configure the service properly. More details are here:
https://knox.apache.org/books/knox-0-9-0/user-guide.html#WebHDFS
Once the service is configured, only access via Knox will be allowed.
Created 01-14-2017 05:19 AM
To effectively secure your cluster, you need to firewall it off from the rest of the world. Knox then lives either on the border of the DMZ (multi-homed to both the cluster network and the outside world) or in the DMZ (as the only node with firewall access to the cluster).
Knox by itself does not secure your cluster. It needs to be combined with other tools to provide secure access.
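As a rough illustration of the firewall part, the rules below allow the NameNode HTTP port from the question (50070) only from the Knox host and drop all other traffic to that port. This is a minimal sketch, not a complete policy. The Knox IP is a hypothetical placeholder, and the script only prints the `iptables` commands instead of applying them, since applying them requires root. A real cluster also has to permit traffic between its own nodes and cover every other exposed service port, such as the ResourceManager and DataNode web ports.

```shell
#!/usr/bin/env bash
# ASSUMPTION (hypothetical): IP address of the Knox gateway host.
KNOX_IP="10.0.0.5"
# NameNode HTTP/WebHDFS port from the question; add other service ports as needed.
PROTECTED_PORTS=(50070)

# Print (not apply) the iptables rules for each protected port.
firewall_rules() {
  local port
  for port in "${PROTECTED_PORTS[@]}"; do
    # Appended first, so it is matched first: allow the Knox gateway...
    echo "iptables -A INPUT -p tcp --dport ${port} -s ${KNOX_IP} -j ACCEPT"
    # ...then drop every other source on that port.
    echo "iptables -A INPUT -p tcp --dport ${port} -j DROP"
  done
}

rules="$(firewall_rules)"
printf '%s\n' "$rules"
```

Rule order matters because iptables evaluates INPUT rules top to bottom. The ACCEPT for the Knox host must come before the catch-all DROP, or the gateway itself would be blocked.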