
How to integrate Knox gateway in hadoop configuration

New Contributor

I have installed Knox with LDAP settings and Ranger for policy management. I am able to work with Hive using the Knox gateway and Beeline, and with HDFS using the Knox gateway API. How do I integrate the Knox URLs for Hive and HDFS into the Hadoop configuration?

The Hadoop development environment has configuration files such as core-site.xml, hdfs-site.xml, hive-site.xml, mapred-site.xml, and yarn-site.xml, with connection parameters for Hive, HDFS, ZooKeeper, etc. Knox is running on port 8443, and the topology file sandbox.xml has the following content for services:

<service>
  <role>NAMENODE</role>
  <url>hdfs://sandbox.hortonworks.com:8020</url>
</service>
<service>
  <role>JOBTRACKER</role>
  <url>rpc://sandbox.hortonworks.com:8050</url>
</service>
<service>
  <role>WEBHDFS</role>
  <url>http://sandbox.hortonworks.com:50070/webhdfs</url>
</service>
<service>
  <role>WEBHCAT</role>
  <url>http://sandbox.hortonworks.com:50111/templeton</url>
</service>
<service>
  <role>OOZIE</role>
  <url>http://sandbox.hortonworks.com:11000/oozie</url>
</service>
<service>
  <role>WEBHBASE</role>
  <url>http://sandbox.hortonworks.com:60080</url>
</service>
<service>
  <role>HIVE</role>
  <url>http://sandbox.hortonworks.com:10001/cliservice</url>
</service>
<service>
  <role>RESOURCEMANAGER</role>
  <url>http://sandbox.hortonworks.com:8088/ws</url>
</service>
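For illustration, the way Knox maps a backend service URL to a gateway URL can be sketched with a small hypothetical helper; the hostname knox.example.com, the default "gateway" context path, and the topology name "sandbox" are assumptions, not values from my cluster:

```shell
# Hypothetical sketch: build the Knox-proxied URL for a backend service call,
# assuming the default "gateway" context path and a topology named "sandbox".
knox_url() {
  local knox_host="$1"     # Knox gateway host:port, e.g. knox.example.com:8443
  local topology="$2"      # topology file name without .xml, e.g. sandbox
  local service_path="$3"  # service path, e.g. webhdfs/v1/tmp?op=LISTSTATUS
  echo "https://${knox_host}/gateway/${topology}/${service_path}"
}

# Direct (inside the cluster, bypasses Knox):
#   http://sandbox.hortonworks.com:50070/webhdfs/v1/tmp?op=LISTSTATUS
# Through the Knox gateway:
knox_url "knox.example.com:8443" "sandbox" "webhdfs/v1/tmp?op=LISTSTATUS"
# → https://knox.example.com:8443/gateway/sandbox/webhdfs/v1/tmp?op=LISTSTATUS
```

The point being that the Knox URL is something a client uses, not something written back into the cluster's *-site.xml files.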

What should the values of the following properties be?

core-site.xml: fs.defaultFS

hdfs-site.xml: dfs.namenode.http-address, dfs.namenode.https-address, dfs.namenode.rpc-address, dfs.namenode.secondary.http-address

hive-site.xml: hive.cluster.delegation.token.store.zookeeper.connectString, hive.metastore.uris, hive.server2.thrift.http.port

mapred-site.xml: mapreduce.jobhistory.address

yarn-site.xml: hadoop.registry.zk.quorum, yarn.resourcemanager.address, yarn.resourcemanager.admin.address, yarn.timeline-service.webapp.address, yarn.timeline-service.webapp.https.address

7 REPLIES

Re: How to integrate Knox gateway in hadoop configuration

Contributor

Hello @sharad vishe, to use Knox you do not have to make changes to the other Hadoop config files. Changes to your Knox topology (sandbox.xml) should be enough for the commonly used Hadoop services. (There are exceptions to this, but I do not believe that is the case here.)

Re: How to integrate Knox gateway in hadoop configuration

@sharad vishe

You do not need to change those properties in general.

The blog below should give you enough detail to solve your query:

https://hortonworks.com/hadoop-tutorial/manage-security-policy-hive-hbase-knox-ranger/

You can also look at this blog, which covers the basics:

https://hortonworks.com/hadoop-tutorial/securing-hadoop-infrastructure-apache-knox/

Remember that Knox provides perimeter-level security. Meaning, if you are already logged into the cluster and have access to services, Knox alone will not stop you. That is where Kerberos, Ranger, and OS-level security come into play.
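As a concrete illustration of the point above: to reach Hive through Knox you change only the JDBC URL the client uses, not hive-site.xml. A minimal sketch, where the host knox.example.com and the topology name "sandbox" are placeholders:

```shell
# Hypothetical sketch: assemble the Hive-over-Knox JDBC URL (HTTP transport
# mode through the gateway). Host and topology name are placeholders; a real
# connection would also need ssl truststore parameters for the Knox cert.
KNOX_HOST="knox.example.com:8443"
TOPOLOGY="sandbox"
JDBC_URL="jdbc:hive2://${KNOX_HOST}/;ssl=true;transportMode=http;httpPath=gateway/${TOPOLOGY}/hive"
echo "$JDBC_URL"
# → jdbc:hive2://knox.example.com:8443/;ssl=true;transportMode=http;httpPath=gateway/sandbox/hive
# Then connect with: beeline -u "$JDBC_URL" -n <user> -p <password>
```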

Thanks

Re: How to integrate Knox gateway in hadoop configuration

New Contributor

Thanks @rbiswas and @Sandeep More

We are using spark-submit with YARN. Referring to https://spark.apache.org/docs/latest/submitting-applications.html#master-urls, below is what we are doing:

# Run on a YARN cluster
# (--deploy-mode can be "client" for client mode)
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

HADOOP_CONF_DIR is the location of core-site.xml, hdfs-site.xml, hive-site.xml, mapred-site.xml, yarn-site.xml.

My knox topology with LDAP is working fine and I am able to use Ranger policies.

But how can I use Knox in this scenario, specifically to submit a Spark job on the YARN cluster?

Re: How to integrate Knox gateway in hadoop configuration

New Contributor

@rbiswas Any updates?

Re: How to integrate Knox gateway in hadoop configuration

Explorer

@sharad vishe, I do not think spark-submit is supported with Apache Knox; only Spark Web UI support is available.

Please check the link below for reference.

http://knox.apache.org/books/knox-0-12-0/user-guide.html#Quick+Start

Regards,

Fahim

Re: How to integrate Knox gateway in hadoop configuration

Explorer