
What are the best practices to secure Spark on HDP 2.3?

Rising Star
 
1 ACCEPTED SOLUTION


Spark reads from HDFS and submits jobs to YARN, so the Ranger-managed security for both HDFS and YARN applies to Spark. From a security point of view, this is very similar to MapReduce jobs running on YARN.

Since Spark reads from HDFS through the standard HDFS client, HDFS transparent data encryption (TDE) is transparent to Spark: with the right key permissions for the user running the Spark job, there is nothing to configure in Spark itself.
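A minimal PySpark sketch of that point: reading from an encryption zone looks exactly like any other read. The path and app name below are hypothetical, and the try/except is only there so the sketch degrades gracefully on a machine without pyspark.

```python
# Hedged sketch: decryption happens inside the HDFS client, so Spark needs no
# TDE-specific configuration -- only the right key permissions in Ranger KMS
# for the user running the job. Path and app name are hypothetical.
try:
    from pyspark import SparkContext
    HAVE_SPARK = True
except ImportError:
    # pyspark is only available on the cluster / edge node
    HAVE_SPARK = False

# Hypothetical file inside an HDFS encryption zone
ENCRYPTED_PATH = "hdfs:///secure_zone/events.log"

if HAVE_SPARK:
    sc = SparkContext(appName="TdeTransparentRead")
    # Identical to reading an unencrypted file -- nothing TDE-specific here.
    print(sc.textFile(ENCRYPTED_PATH).count())
    sc.stop()
```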

Knox isn't yet relevant to Spark. In the future, when Spark has a REST API, we will integrate Knox with it.


6 REPLIES


Rising Star

So the real problem is in our documentation: the Best Practices page carries a conflicting message. It says "use HiveContext (instead of SQLContext) whenever possible," but also "In YARN cluster mode with Kerberos, use SQLContext"!

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_spark-guide/content/ch_practices-spark.ht...

Master Mentor

@eorgad@hortonworks.com because the guide states: "In YARN cluster mode with Kerberos, use SQLContext. HiveContext is not supported on YARN in a Kerberos-enabled cluster."

Expert Contributor

@Neeraj The docs are conflicting. The Kerberos installation section recommends using HiveContext in secure mode (with Kerberos): http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_spark-guide/content/ch_installing-kerb-sp... while the Best Practices section states that it is not supported with Kerberos: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_spark-guide/content/ch_practices-spark.ht...


This is a doc bug, filed as BUG-47289.

Use HiveContext; it should work in a Kerberized cluster with HDP 2.3.2.
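For reference, a minimal PySpark sketch of using HiveContext on a Kerberized cluster, assuming Spark 1.x as shipped with HDP 2.3.2, a valid ticket from `kinit`, `hive-site.xml` on the classpath, and submission via `spark-submit`. The app name is made up, and the try/except only lets the sketch degrade gracefully off-cluster.

```python
# Hedged sketch (Spark 1.x API). Assumes you ran kinit first and submit with
# spark-submit; Ranger Hive policies still govern what the query may touch.
try:
    from pyspark import SparkContext
    from pyspark.sql import HiveContext
    HAVE_SPARK = True
except ImportError:
    # pyspark only exists on the cluster
    HAVE_SPARK = False

QUERY = "SHOW TABLES"  # any Hive query works the same way

if HAVE_SPARK:
    sc = SparkContext(appName="HiveContextOnKerberos")  # hypothetical app name
    hc = HiveContext(sc)  # picks up the metastore from hive-site.xml
    hc.sql(QUERY).show()
    sc.stop()
```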
