
What are the best practices to secure spark on HDP 2.3

Re: What are the best practices to secure spark on HDP 2.3

Spark reads from HDFS and submits jobs to YARN, so the HDFS and YARN security policies managed by Ranger apply to Spark jobs as well. From a security point of view, this is very similar to running MapReduce jobs on YARN.
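
To illustrate why nothing Spark-specific is needed, here is a deliberately simplified toy model of path-based authorization. It is not Ranger's actual evaluation logic, which also handles groups, deny policies and fallback to native HDFS permissions. The point it shows is that the decision depends only on the authenticated user, the HDFS path and the requested access, so a read issued by a Spark executor and one issued by a MapReduce task as the same user get the same answer. `Policy` and `is_allowed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Toy HDFS policy: a path prefix, the users it covers, the accesses it grants."""
    path_prefix: str
    users: frozenset
    accesses: frozenset  # e.g. {"read", "write", "execute"}


def is_allowed(policies, user, path, access):
    """Allow if any policy covers this path, user and access; deny otherwise.

    There is deliberately no 'engine' parameter: the check is identical whether
    the request comes from a Spark executor or a MapReduce task.
    """
    for p in policies:
        prefix = p.path_prefix.rstrip("/")
        under_prefix = path == prefix or path.startswith(prefix + "/")
        if under_prefix and user in p.users and access in p.accesses:
            return True
    return False


policies = [Policy("/data/sales", frozenset({"alice"}), frozenset({"read"}))]

print(is_allowed(policies, "alice", "/data/sales/2015/part-0000", "read"))   # True
print(is_allowed(policies, "alice", "/data/sales/2015/part-0000", "write"))  # False
print(is_allowed(policies, "bob", "/data/sales/2015/part-0000", "read"))     # False
```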

Because Spark reads from HDFS through the HDFS client, HDFS transparent data encryption (TDE) is transparent to Spark: as long as the user running the Spark job has the right permissions on the encryption keys, there is nothing to configure in Spark.
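
What "the right key permissions" means can be made concrete with Hadoop KMS ACLs. When the HDFS client reads a file in an encryption zone, it asks the KMS to decrypt that file's encrypted data encryption key, which requires the DECRYPT_EEK permission on the zone's key. The sketch below is a simplified, hypothetical checker over kms-acls.xml-style entries (`key.acl.<key-name>.DECRYPT_EEK`, with values in the Hadoop ACL form `user1,user2 group1,group2` or `*`). It is not the KMS implementation: it ignores the KMS-wide, default and whitelist ACLs that a real KMS also evaluates. The key name `sales_key` and the users and groups are made up.

```python
def parse_acl(value):
    """Parse a Hadoop-style ACL string: 'user1,user2 group1,group2' or '*'.

    Returns None for '*' (everyone), otherwise a (users, groups) pair of sets.
    A leading space means "no users, only groups", so do not strip before splitting.
    """
    if value.strip() == "*":
        return None
    users_part, _, groups_part = value.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.strip().split(",") if g}
    return users, groups


def can_decrypt(acls, key_name, user, user_groups):
    """Return True if the per-key DECRYPT_EEK ACL admits this user.

    Simplified: only 'key.acl.<key>.DECRYPT_EEK' is consulted.
    """
    value = acls.get(f"key.acl.{key_name}.DECRYPT_EEK")
    if value is None:
        return False
    parsed = parse_acl(value)
    if parsed is None:
        return True
    users, groups = parsed
    return user in users or bool(groups & set(user_groups))


acls = {"key.acl.sales_key.DECRYPT_EEK": "alice etl"}  # user 'alice', group 'etl'

print(can_decrypt(acls, "sales_key", "alice", []))             # True
print(can_decrypt(acls, "sales_key", "bob", ["etl"]))          # True
print(can_decrypt(acls, "sales_key", "carol", ["analysts"]))   # False
```

If the Spark job's user fails this kind of check, reads from the encryption zone fail in the HDFS client; the fix belongs in the KMS ACLs, not in Spark configuration.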

Knox isn't relevant to Spark yet. When Spark gets a REST API in the future, we will integrate Knox with it.

Re: What are the best practices to secure spark on HDP 2.3

So the real problem is in our documentation: the Best Practices page gives a conflicting message. It says "use HiveContext (instead of SQLContext) whenever possible", but also "In YARN cluster mode with Kerberos, use SQLContext".

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_spark-guide/content/ch_practices-spark.ht...

Re: What are the best practices to secure spark on HDP 2.3

@eorgad@hortonworks.com That is because the guide says: "In YARN cluster mode with Kerberos, use SQLContext. HiveContext is not supported on YARN in a Kerberos-enabled cluster."

Re: What are the best practices to secure spark on HDP 2.3

@Neeraj The docs conflict. The Kerberos setup page recommends using HiveContext in secure mode (with Kerberos): http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_spark-guide/content/ch_installing-kerb-sp... while the Best Practices page states that HiveContext is not supported with Kerberos: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_spark-guide/content/ch_practices-spark.ht...

Re: What are the best practices to secure spark on HDP 2.3

This is a documentation bug; I filed BUG-47289.

Use HiveContext; it should work in a Kerberized cluster with HDP 2.3.2.
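
For the HiveContext-on-Kerberos case, one way to submit such a job in YARN cluster mode is sketched below as a small Python helper that assembles a `spark-submit` command line. It assumes Spark 1.4 or later, where `spark-submit` accepts `--principal` and `--keytab` on YARN so Spark can log in to the KDC and renew tickets and delegation tokens. It also assumes that shipping `hive-site.xml` with `--files` is needed so the driver, which runs inside the cluster in this mode, can locate the Hive metastore. This is a sketch, not the documented HDP procedure; the helper name, principal, keytab path, `hive-site.xml` location and script name are placeholders.

```python
import shlex


def build_spark_submit(app, principal, keytab, hive_site, app_args=()):
    """Build a spark-submit argv for YARN cluster mode on a Kerberized cluster.

    --principal/--keytab: credentials Spark uses to log in and renew tickets.
    --files: ships hive-site.xml so a HiveContext in the driver can find the metastore.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--principal", principal,
        "--keytab", keytab,
        "--files", hive_site,
        app,
        *app_args,
    ]


# Placeholder values: adjust to your cluster.
cmd = build_spark_submit(
    app="report.py",
    principal="etl@EXAMPLE.COM",
    keytab="/etc/security/keytabs/etl.keytab",
    hive_site="/usr/hdp/current/spark-client/conf/hive-site.xml",
)
print(shlex.join(cmd))  # shell-quoted command line (Python 3.8+)
```

Returning an argv list rather than a single string keeps arguments with spaces intact if the command is later run with `subprocess.run(cmd)`.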
