Community Articles

kbadani · 06-30-2017

1. Goal

This article is a continuation of the HCC article https://community.hortonworks.com/content/kbentry/101181/rowcolumn-level-security-in-sql-for-apache-....

Spark's Row/Column level security can be used through several Zeppelin interpreters, as summarized in the following table:

Interpreter	Row/Column security supported?	Reason for no support
%jdbc (with Spark 1.x Spark Thrift Server)	Yes	-
%jdbc (with Spark2 Spark Thrift Server)	Yes	-
%livy.sql	No	Zeppelin’s livy interpreter runs in yarn-cluster mode, where it needs delegation tokens to access HiveServer2; Spark 1.x does not support this.
%livy2.sql	Yes	-
%spark.sql	No	Zeppelin’s Spark interpreter group does not support user impersonation.
%spark2.sql	No	Zeppelin’s Spark interpreter group does not support user impersonation.

In this article, we show how to configure Zeppelin’s livy2 and jdbc interpreters to take advantage of the Row/Column level security feature provided by Spark in HDP 2.6.1.

2. Environment:

HDP-2.6.1 Kerberized cluster with Spark, Spark2, Ranger, Zeppelin and Hive installed.
Non-wire-encrypted cluster. Zeppelin’s livy interpreter has an issue in wire-encrypted environments (https://issues.apache.org/jira/browse/ZEPPELIN-2584), so this article uses a cluster without wire encryption.
Zeppelin authentication enabled via shiro.ini (refer to https://zeppelin.apache.org/docs/0.7.0/security/shiroauthentication.html for more information).

3. Setup:

3.1 Configure Zeppelin's livy2 interpreter

Download spark-llap_2.11-1.1.2-2.1.jar for Spark2 LLAP on HDP-2.6.1 (or spark-llap_2.11-1.1.1-2.1.jar on HDP-2.6.0.3) and store it in HDFS. In this article, we refer to this jar as the spark2-llap jar.
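Because the jar choice depends only on the HDP release, the staging step can be scripted. The Python sketch below is illustrative only: it builds, but does not run, the `hdfs dfs` commands for the two releases named above. It assumes the jar was already downloaded into the current directory, and the destination directory `/user/zeppelin/jars` is a hypothetical example, not a path this article prescribes.

```python
import shlex

# HDP release -> spark2-llap jar, as listed in the step above.
SPARK2_LLAP_JARS = {
    "2.6.1": "spark-llap_2.11-1.1.2-2.1.jar",
    "2.6.0.3": "spark-llap_2.11-1.1.1-2.1.jar",
}


def spark2_llap_jar(hdp_version: str) -> str:
    """Return the spark2-llap jar name for an HDP release covered by this article."""
    try:
        return SPARK2_LLAP_JARS[hdp_version]
    except KeyError:
        raise ValueError(f"no spark2-llap jar listed for HDP {hdp_version}") from None


def hdfs_upload_commands(hdp_version: str, hdfs_dir: str) -> list:
    """Build (without running) the commands that copy the local jar into hdfs_dir."""
    jar = spark2_llap_jar(hdp_version)
    target_dir = hdfs_dir.rstrip("/")
    return [
        ["hdfs", "dfs", "-mkdir", "-p", target_dir],
        ["hdfs", "dfs", "-put", "-f", jar, f"{target_dir}/{jar}"],
    ]


if __name__ == "__main__":
    # Hypothetical destination directory; pick one your Livy/Zeppelin users can read.
    for cmd in hdfs_upload_commands("2.6.1", "/user/zeppelin/jars"):
        print(shlex.join(cmd))
```

The resulting HDFS path (in this hypothetical layout, hdfs:///user/zeppelin/jars/spark-llap_2.11-1.1.2-2.1.jar) is what goes into livy.spark.jars below.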

For Zeppelin’s livy2 interpreter to support the Row/Column level security feature of Spark2-LLAP, the livy2 interpreter itself must be configured; there is no need to configure spark2-defaults as described in Section 5.4 of the HCC article. To do this, go to Zeppelin’s interpreter UI page and edit the livy2 interpreter to add the following properties:

livy.spark.sql.hive.llap = true

livy.spark.hadoop.hive.llap.daemon.service.hosts = <value of hive.llap.daemon.service.hosts>

livy.spark.jars = <HDFS path of spark2-llap jar>

livy.spark.sql.hive.hiveserver2.jdbc.url = <hiveserver2 jdbc URL>

livy.spark.sql.hive.hiveserver2.jdbc.url.principal = <value of hive.server2.authentication.kerberos.principal>
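The five properties above are ordinary Spark settings written with a `livy.` prefix; Zeppelin's Livy interpreter documentation asks for Spark properties to be entered as `livy.spark.*` instead of `spark.*`. The Python sketch below is illustrative only: it derives the interpreter properties from the plain Spark names, rejects empty values, and prints them in the `key = value` form used above. Every argument value in the usage block (LLAP service name, host, port, realm, HDFS path) is a hypothetical placeholder; take the real values from your cluster's Hive configuration.

```python
def livy2_interpreter_properties(
    llap_service_hosts: str,
    llap_jar_hdfs_path: str,
    hs2_jdbc_url: str,
    hs2_principal: str,
) -> dict:
    """Map the Spark settings needed for Spark2-LLAP to livy2 interpreter properties."""
    spark_conf = {
        "spark.sql.hive.llap": "true",
        "spark.hadoop.hive.llap.daemon.service.hosts": llap_service_hosts,
        "spark.jars": llap_jar_hdfs_path,
        "spark.sql.hive.hiveserver2.jdbc.url": hs2_jdbc_url,
        "spark.sql.hive.hiveserver2.jdbc.url.principal": hs2_principal,
    }
    missing = [key for key, value in spark_conf.items() if not value]
    if missing:
        raise ValueError(f"empty value(s) for: {', '.join(missing)}")
    # Spark properties are entered in the Zeppelin livy2 interpreter with a "livy." prefix.
    return {f"livy.{key}": value for key, value in spark_conf.items()}


if __name__ == "__main__":
    props = livy2_interpreter_properties(
        llap_service_hosts="@llap0",  # hypothetical: copy hive.llap.daemon.service.hosts
        llap_jar_hdfs_path="hdfs:///user/zeppelin/jars/spark-llap_2.11-1.1.2-2.1.jar",  # hypothetical
        hs2_jdbc_url="jdbc:hive2://hsi.example.com:10500/",  # hypothetical host and port
        hs2_principal="hive/_HOST@EXAMPLE.COM",  # hypothetical realm
    )
    for key, value in props.items():
        print(f"{key} = {value}")
```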

3.2 Configure Zeppelin's jdbc interpreter

Zeppelin’s jdbc interpreter can route SQL queries to Spark 1.x or Spark2: configure it to use the Spark 1.x Thrift Server when invoked with %jdbc(spark) and the Spark2 Thrift Server when invoked with %jdbc(spark2).

Follow Sections 4.2, 4.3, 5.1, 5.2 and 5.3 of the above HCC article, in that order, to enable Hive Interactive Query and the Ranger Hive plugin and to set up HDFS and Hive.

Additionally, follow Section 5.5 of the above HCC article to set up the Spark2 Thrift Server and the Spark 1.x Thrift Server, with the caveats listed in the appendix of this article.

Now go to Zeppelin’s interpreter UI page, edit the jdbc interpreter to add the following properties, and save the new configuration:

spark.driver : org.apache.hive.jdbc.HiveDriver

spark.url : <Spark1.x thrift server jdbc url>

spark2.driver : org.apache.hive.jdbc.HiveDriver

spark2.url : <Spark2 thrift server jdbc url>
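Zeppelin's jdbc interpreter groups connection settings by prefix, and the name in parentheses after `%jdbc` selects the group (a bare `%jdbc` uses the `default` prefix). The Python sketch below is a hypothetical illustration of that lookup, not Zeppelin's own code: it builds the four properties above from Kerberized HiveServer2-style JDBC URLs and resolves the driver and URL a paragraph would use. The host names, the ports 10015/10016 and the Kerberos principal in the usage block are assumptions; substitute your Thrift Servers' actual values.

```python
import re

HIVE_DRIVER = "org.apache.hive.jdbc.HiveDriver"

# Matches "%jdbc" or "%jdbc(<prefix>)" at the start of a paragraph.
_MAGIC = re.compile(r"%jdbc(?:\((\w+)\))?(?=\s|$)")


def hive2_url(host: str, port: int, principal: str, database: str = "default") -> str:
    """Kerberized HiveServer2-style JDBC URL, the form a Spark Thrift Server accepts."""
    return f"jdbc:hive2://{host}:{port}/{database};principal={principal}"


def jdbc_interpreter_properties(spark1_url: str, spark2_url: str) -> dict:
    """The four jdbc interpreter properties from this section."""
    return {
        "spark.driver": HIVE_DRIVER,
        "spark.url": spark1_url,
        "spark2.driver": HIVE_DRIVER,
        "spark2.url": spark2_url,
    }


def resolve_connection(paragraph: str, props: dict) -> tuple:
    """Return (driver, url) for the interpreter prefix named in a paragraph header."""
    match = _MAGIC.match(paragraph.strip())
    if match is None:
        raise ValueError("paragraph does not start with %jdbc")
    prefix = match.group(1) or "default"
    try:
        return props[f"{prefix}.driver"], props[f"{prefix}.url"]
    except KeyError:
        raise ValueError(f"no jdbc properties configured for prefix {prefix!r}") from None


if __name__ == "__main__":
    principal = "hive/_HOST@EXAMPLE.COM"  # hypothetical principal
    props = jdbc_interpreter_properties(
        hive2_url("sts1.example.com", 10015, principal),  # hypothetical Spark 1.x Thrift Server
        hive2_url("sts2.example.com", 10016, principal),  # hypothetical Spark2 Thrift Server
    )
    print(resolve_connection("%jdbc(spark2)\nSELECT * FROM db_spark.t_spark", props))
```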

3.3 Running Example

Follow Sections 6 and 7 of the above HCC article to set up the database, table and Ranger policies for the example. In this article, the ‘hrt_1’ user replaces the ‘billing’ user and the ‘hrt_2’ user replaces the ‘datascience’ user.

Log in to the Zeppelin UI as the ‘hrt_1’ user and run the paragraph ‘SELECT * FROM db_spark.t_spark’ with the %jdbc(spark), %jdbc(spark2) and %livy2.sql interpreters. You should see unfiltered and unmasked results, as defined by the Ranger policies you set.

Log in to the Zeppelin UI as the ‘hrt_2’ user and run the same paragraph with the %jdbc(spark), %jdbc(spark2) and %livy2.sql interpreters. You should now see filtered and masked results.

Appendix

For Spark2 with the jdbc interpreter

For an HDP-2.6.1 cluster, configure spark_thrift_cmd_opts in spark2-env as:

--packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.2-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true

(The above HCC article is written for HDP-2.6.0.3 and suggests setting spark_thrift_cmd_opts in spark2-env to --packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.1-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true)

For Spark 1.x with the jdbc interpreter

For an HDP-2.6.1 cluster, configure spark_thrift_cmd_opts in spark-env as:

--packages com.hortonworks.spark:spark-llap-assembly_2.10:1.0.6-1.6 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true
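The three option strings in this appendix differ only in the spark-llap-assembly coordinates and the env config they belong in. As a hypothetical convenience, the Python sketch below regenerates them from a small lookup table that holds exactly the combinations listed in this article; any other HDP or Spark version is rejected rather than guessed.

```python
REPOSITORY = "http://repo.hortonworks.com/content/groups/public"

# (HDP release, Spark line) -> (env config to edit, spark-llap-assembly Maven coordinates).
# Only the combinations listed in this appendix are included.
LLAP_ASSEMBLIES = {
    ("2.6.1", "spark2"): ("spark2-env", "com.hortonworks.spark:spark-llap-assembly_2.11:1.1.2-2.1"),
    ("2.6.0.3", "spark2"): ("spark2-env", "com.hortonworks.spark:spark-llap-assembly_2.11:1.1.1-2.1"),
    ("2.6.1", "spark1"): ("spark-env", "com.hortonworks.spark:spark-llap-assembly_2.10:1.0.6-1.6"),
}


def spark_thrift_cmd_opts(hdp_version: str, spark_line: str) -> tuple:
    """Return (env config name, spark_thrift_cmd_opts value) for a listed combination."""
    key = (hdp_version, spark_line)
    if key not in LLAP_ASSEMBLIES:
        raise ValueError(f"no spark-llap assembly listed for HDP {hdp_version} / {spark_line}")
    env_config, coordinates = LLAP_ASSEMBLIES[key]
    opts = (
        f"--packages {coordinates} "
        f"--repositories {REPOSITORY} "
        "--conf spark.sql.hive.llap=true"
    )
    return env_config, opts


if __name__ == "__main__":
    for hdp, spark in LLAP_ASSEMBLIES:
        env_config, opts = spark_thrift_cmd_opts(hdp, spark)
        print(f"HDP {hdp}, {env_config}: {opts}")
```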

berry_osterlund · 08-23-2017

After following this guide, I still got Kerberos errors when trying to communicate with LLAP. It turns out that you also need to set livy.spark.yarn.security.credentials.hiveserver2.enabled=true in the Livy interpreter in Zeppelin to make it work.

matt_andruff · 03-14-2018

This doesn't work for HDP 2.6.3.

Cloudera Community

Using Row/Column level security of Spark with Zeppelin's jdbc and livy interpreters
