What security is available for Spark?

What security is available for Spark? How do the following security aspects work with Spark?

  1. authentication
  2. authorization
  3. audit
  4. encryption

For deployments where security is a concern, what mode of Spark should be used?

1 ACCEPTED SOLUTION

1. Authentication

Spark supports running in a Kerberized cluster.

Only Spark on YARN supports Kerberos. From the command line, run kinit before submitting Spark jobs.
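
The kinit-then-submit sequence above could be scripted roughly as follows. This is a minimal sketch, not a definitive recipe: the principal, keytab path, and application name are placeholders, and the default master string is an assumption (older Spark releases used yarn-client or yarn-cluster instead of yarn). The commands are only built and printed, not executed.

```python
import shlex


def kinit_command(principal, keytab=None):
    """Build the kinit argv. With a keytab there is no password prompt."""
    if keytab:
        return ["kinit", "-kt", keytab, principal]
    return ["kinit", principal]


def submit_command(app, master="yarn"):
    """Build a spark-submit argv that targets YARN (master value is an assumption)."""
    return ["spark-submit", "--master", master, app]


def render(argv):
    """Quote an argv list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


# Placeholder values for illustration only.
print(render(kinit_command("alice@EXAMPLE.COM")))
print(render(submit_command("my_job.py")))
```

To actually run the two steps, each argv list could be passed to subprocess.run(argv, check=True) in order, so the submit only happens if kinit succeeds.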

LDAP authentication: the Spark UI has no authentication out of the box, but it supports servlet filters, which can be used to hook in LDAP.
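
As a rough sketch of how such a filter might be wired in: Spark reads the filter class from spark.ui.filters and passes per-filter parameters through properties of the form spark.<filter class>.param.<name>. The filter class com.example.LdapAuthFilter and its ldapUrl parameter below are hypothetical; Spark does not ship an LDAP filter, so you would supply your own implementation.

```python
def ui_filter_conf(filter_class, params):
    """Return Spark properties that register a servlet filter on the UI."""
    conf = {"spark.ui.filters": filter_class}
    for name, value in params.items():
        # Per-filter parameters use the spark.<class>.param.<name> pattern.
        conf[f"spark.{filter_class}.param.{name}"] = value
    return conf


# Hypothetical filter class and parameter, for illustration only.
conf = ui_filter_conf(
    "com.example.LdapAuthFilter",
    {"ldapUrl": "ldap://ldap.example.com:389"},
)
for key, value in conf.items():
    print(f"--conf {key}={value}")
```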

2. Authorization

Spark reads data from HDFS (ORC files, etc.), and access control at the HDFS level still applies. For example, HDFS file permissions (and Ranger integration) apply to Spark jobs.

Spark submits jobs to a YARN queue, so YARN queue ACLs (and Ranger integration) apply to Spark jobs.

3. Audit

Spark jobs run on YARN and read from HDFS, HBase, etc., so the audit logs for YARN and HDFS access still apply, and you can use Ranger to view them.

4. Wire Encryption

Spark has some coverage, but not all channels are covered.

3 REPLIES 3

Spark also supports authentication via a shared secret. It is basically a handshake mechanism that validates that both sides hold the same secret, and it can be configured through spark.authenticate.
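
To illustrate the idea of a shared-secret handshake in general, here is a minimal challenge-response sketch using an HMAC. It is not Spark's wire protocol; it only shows how two parties can confirm they hold the same secret without sending the secret itself. All function names are made up for this example.

```python
import hashlib
import hmac
import os


def make_challenge():
    """Server side: generate a random nonce for the client to sign."""
    return os.urandom(16)


def respond(secret, challenge):
    """Client side: prove knowledge of the secret by keying an HMAC with it."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def verify(secret, challenge, response):
    """Server side: recompute the HMAC and compare in constant time."""
    expected = respond(secret, challenge)
    return hmac.compare_digest(expected, response)


challenge = make_challenge()
print(verify(b"shared-secret", challenge, respond(b"shared-secret", challenge)))
print(verify(b"shared-secret", challenge, respond(b"wrong-secret", challenge)))
```

The first check passes because both sides used the same secret; the second fails because the client's secret differs.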

Note that Spark 1.5+ is needed for Spark jobs running longer than 72 hours not to fail when their Kerberos tickets expire. You'll also need to supply a keytab that the Spark AM can use to renew tickets. For short-lived queries, this problem should not surface.
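
Supplying the keytab is typically done with the --principal and --keytab options of spark-submit on YARN. The sketch below only builds and prints such a command; the principal, keytab path, application name, and master string are placeholder assumptions.

```python
import shlex


def long_running_submit(app, principal, keytab, master="yarn"):
    """Build a spark-submit argv that hands a keytab to the YARN application."""
    return [
        "spark-submit",
        "--master", master,
        "--principal", principal,  # Kerberos principal to log in as
        "--keytab", keytab,        # keytab used to obtain fresh tickets
        app,
    ]


# Placeholder values for illustration only.
argv = long_running_submit(
    "streaming_job.py",
    "spark-user@EXAMPLE.COM",
    "/etc/security/keytabs/spark-user.keytab",
)
print(" ".join(shlex.quote(arg) for arg in argv))
```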