
What are the best practices to secure Spark on HDP 2.3?

Rising Star
 
1 ACCEPTED SOLUTION


Spark reads from HDFS and submits jobs to YARN, so the Ranger-managed security for both HDFS and YARN applies to Spark. From a security point of view, this is very similar to MapReduce jobs running on YARN.

Since Spark reads from HDFS through the standard HDFS client, HDFS transparent data encryption (TDE) is transparent to Spark: with the right key permissions for the user running the Spark job, there is nothing to configure in Spark itself.
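A minimal PySpark sketch of that point: reading from an encryption zone looks exactly like any other read. The path and app name below are hypothetical, and the try/except is only there so the sketch degrades gracefully on a machine without pyspark.

```python
# Hedged sketch: decryption happens inside the HDFS client, so Spark needs no
# TDE-specific configuration -- only the right key permissions in Ranger KMS
# for the user running the job. Path and app name are hypothetical.
try:
    from pyspark import SparkContext
    HAVE_SPARK = True
except ImportError:
    # pyspark is only available on the cluster / edge node
    HAVE_SPARK = False

# Hypothetical file inside an HDFS encryption zone
ENCRYPTED_PATH = "hdfs:///secure_zone/events.log"

if HAVE_SPARK:
    sc = SparkContext(appName="TdeTransparentRead")
    # Identical to reading an unencrypted file -- nothing TDE-specific here.
    print(sc.textFile(ENCRYPTED_PATH).count())
    sc.stop()
```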

Knox isn't yet relevant to Spark. In the future, when Spark has a REST API, we will integrate Knox with it.


6 REPLIES


Rising Star

So the real problem is in our documentation: the Best Practices page carries a conflicting message. It says "use HiveContext (instead of SQLContext) whenever possible," but also "In YARN cluster mode with Kerberos, use SQLContext"!

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_spark-guide/content/ch_practices-spark.ht...

Master Mentor

@eorgad@hortonworks.com because the guide states: "In YARN cluster mode with Kerberos, use SQLContext. HiveContext is not supported on YARN in a Kerberos-enabled cluster."

Expert Contributor

@Neeraj The docs are conflicting. The Kerberos installation section recommends using HiveContext in secure mode (with Kerberos): http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_spark-guide/content/ch_installing-kerb-sp... while the Best Practices section states that it is not supported with Kerberos: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_spark-guide/content/ch_practices-spark.ht...


This is a doc bug, filed as BUG-47289.

Use HiveContext; it should work in a Kerberized cluster with HDP 2.3.2.
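For reference, a minimal PySpark sketch of using HiveContext on a Kerberized cluster, assuming Spark 1.x as shipped with HDP 2.3.2, a valid ticket from `kinit`, `hive-site.xml` on the classpath, and submission via `spark-submit`. The app name is made up, and the try/except only lets the sketch degrade gracefully off-cluster.

```python
# Hedged sketch (Spark 1.x API). Assumes you ran kinit first and submit with
# spark-submit; Ranger Hive policies still govern what the query may touch.
try:
    from pyspark import SparkContext
    from pyspark.sql import HiveContext
    HAVE_SPARK = True
except ImportError:
    # pyspark only exists on the cluster
    HAVE_SPARK = False

QUERY = "SHOW TABLES"  # any Hive query works the same way

if HAVE_SPARK:
    sc = SparkContext(appName="HiveContextOnKerberos")  # hypothetical app name
    hc = HiveContext(sc)  # picks up the metastore from hive-site.xml
    hc.sql(QUERY).show()
    sc.stop()
```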
