What are the best practices to secure Spark on HDP 2.3?
Labels: Apache Spark
Created 10-31-2015 01:08 AM
Created 11-03-2015 08:57 PM
Spark reads from HDFS and submits its jobs to YARN, so the HDFS and YARN security managed by Ranger also applies to Spark. From a security point of view this is very similar to MapReduce jobs running on YARN.
Since Spark reads from HDFS through the standard HDFS client, the HDFS TDE feature is transparent to Spark; with the right key permissions for the user running the Spark job, there is nothing to configure in Spark itself.
Knox isn't yet relevant to Spark. In the future, once Spark has a REST API, Knox will be integrated with it.
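A minimal sketch of what that looks like in practice on a Kerberized HDP 2.3 cluster (Spark 1.x API). Assumptions: a valid Kerberos ticket exists before `spark-submit` (e.g. via `kinit`), Ranger HDFS/YARN policies grant the submitting user access, and the HDFS path below is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SecureHdfsRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SecureHdfsRead"))

    // Reads go through the standard HDFS client, so Ranger HDFS policies and
    // HDFS TDE apply transparently; nothing Spark-specific to configure here.
    val lines = sc.textFile("hdfs:///data/encrypted_zone/events.txt") // hypothetical path
    println(s"line count: ${lines.count()}")

    sc.stop()
  }
}
```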
Created 11-03-2015 07:59 PM
So the real problem is in our documentation: the Best Practices page carries a conflicting message, saying "use HiveContext (instead of SQLContext) whenever possible" but also "In YARN cluster mode with Kerberos, use SQLContext"!
Created 11-03-2015 08:05 PM
@eorgad@hortonworks.com because "In YARN cluster mode with Kerberos, use SQLContext. HiveContext is not supported on YARN in a Kerberos-enabled cluster."
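A sketch of the fallback that wording implies (Spark 1.4.x API, as shipped with HDP 2.3.2): use a plain SQLContext and query files registered as temporary tables instead of Hive tables. The file path and table name are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SqlContextFallback {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SqlContextFallback"))
    val sqlContext = new SQLContext(sc) // no Hive metastore dependency

    // Query a file registered as a temporary table instead of a Hive table.
    val df = sqlContext.read.json("hdfs:///data/events.json") // hypothetical path
    df.registerTempTable("events")
    sqlContext.sql("SELECT COUNT(*) AS n FROM events").show()

    sc.stop()
  }
}
```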
Created 11-03-2015 08:14 PM
@Neeraj The docs are conflicting. One page recommends "use HiveContext" in secure mode (with Kerberos): http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_spark-guide/content/ch_installing-kerb-sp... while the Best Practices page states that it is not supported with Kerberos: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_spark-guide/content/ch_practices-spark.ht...
Created 11-03-2015 08:56 PM
This is a doc bug; filed BUG-47289.
Use HiveContext and it should work in a Kerberized cluster with HDP 2.3.2.
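A minimal sketch of that, assuming Spark 1.4.x on HDP 2.3.2 and a hypothetical Hive table. For yarn-cluster mode on a Kerberized cluster the job would be submitted with the usual Kerberos options, for example `spark-submit --master yarn-cluster --principal user@EXAMPLE.COM --keytab /etc/security/keytabs/user.keytab ...`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveContextOnKerberos {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveContextOnKerberos"))
    val hiveContext = new HiveContext(sc) // connects to the Hive metastore

    // Hypothetical table name; replace with a table the submitting user can read.
    hiveContext.sql("SELECT COUNT(*) FROM default.sample_07").show()

    sc.stop()
  }
}
```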
