Created 12-08-2015 05:14 AM
What security features are available for Spark, and how would the security aspects below work with it? For deployments where security is a concern, which mode of Spark should be used?
Created 12-08-2015 05:20 AM
1. Authentication
Spark supports running in a Kerberized cluster. Only Spark on YARN supports security (Kerberos); from the command line, run kinit before submitting Spark jobs.
For LDAP authentication: there is no authentication on the Spark UI out of the box, but the UI supports servlet filters, which can be used to hook in LDAP.
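A minimal sketch of the submission flow described above; the principal, jar, class name, and filter class are placeholders for illustration:

```shell
# Obtain a Kerberos TGT before submitting (prompts for a password).
kinit alice@EXAMPLE.COM
klist    # verify the ticket was granted

# Submit on YARN; optionally hook an authentication filter into the Spark UI.
# com.example.LdapAuthFilter is a hypothetical filter class you would provide.
spark-submit --master yarn \
  --conf spark.ui.filters=com.example.LdapAuthFilter \
  --class com.example.MyApp myapp.jar
```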
2. Authorization
Spark reads data from HDFS (ORC files, etc.), and access control at the HDFS level still applies; for example, HDFS file permissions (and Ranger integration) apply to Spark jobs.
Spark submits jobs to a YARN queue, so YARN queue ACLs (and Ranger integration) also apply to Spark jobs.
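To make the two layers above concrete, here is a sketch of where each control lives; the path, queue name, and ACL property value are assumptions for illustration:

```shell
# HDFS-level controls that govern what a Spark job can read:
hdfs dfs -ls /data/sales        # POSIX-style owner/group/mode per path
hdfs dfs -getfacl /data/sales   # extended ACLs, if configured

# YARN queue ACLs (Capacity Scheduler) restrict who may submit to a queue, e.g.
# in capacity-scheduler.xml:
#   yarn.scheduler.capacity.root.analytics.acl_submit_applications = alice,bob
spark-submit --master yarn --queue analytics \
  --class com.example.MyApp myapp.jar
```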
3. Audit
Spark jobs run on YARN and read from HDFS, HBase, etc., so the audit logs for YARN and HDFS access still apply, and you can use Ranger to view them.
4. Wire Encryption
Spark has partial coverage: some channels can be encrypted, but not all of them.
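A sketch of the encryption-related settings (spark-defaults.conf). Exact channel coverage varies by Spark version, and the keystore path and password below are placeholders, so verify against your release:

```
spark.authenticate                       true
spark.authenticate.enableSaslEncryption  true    # SASL encryption for block transfers
spark.ssl.enabled                        true    # SSL for UI / file-server channels
spark.ssl.keyStore                       /path/to/keystore.jks
spark.ssl.keyStorePassword               changeit
```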
Created 12-09-2015 02:51 PM
Spark also supports authentication via a shared secret. It is basically a handshake mechanism that verifies both sides hold the same secret, and it can be configured through spark.authenticate.
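A minimal config sketch for the shared-secret mechanism (spark-defaults.conf); the secret value is a placeholder:

```
spark.authenticate         true
# On YARN the secret is generated and distributed automatically;
# for other deployments it must be set explicitly:
spark.authenticate.secret  mySecretValue
```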
Created 12-09-2015 07:14 PM
Note that Spark 1.5+ is needed for Spark jobs running longer than 72 hours not to fail when their Kerberos tickets expire, and you will need to supply a keytab with which the Spark AM can renew tickets. For short-lived queries, this problem should not surface.
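A sketch of supplying that keytab at submit time so the Spark AM can renew tickets for a long-running job (Spark 1.5+); the principal, keytab path, class, and jar are placeholders:

```shell
spark-submit --master yarn \
  --principal alice@EXAMPLE.COM \
  --keytab /etc/security/keytabs/alice.keytab \
  --class com.example.LongRunningApp app.jar
```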