Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
Labels (2)
avatar

1. Goal

This article is in continuation of this HCC article https://community.hortonworks.com/content/kbentry/101181/rowcolumn-level-security-in-sql-for-apache-....

One can take advantage of Row/Column level security of Spark via various Zeppelin interpreters as explained in the following table:

Interpreter nameRow/Column security feature supported?Reason for no support
% jdbc (with spark1.x STS)Yes
% jdbc (with spark2 STS)Yes
% livy.sqlNoZeppelin’s livy interpreter won’t support Row/Column level security because it uses yarn-cluster mode and it needs delegation tokens to access HiveServer2 in yarn-cluster mode. This support is not present in Spark1.x
% livy2.sqlYes
% spark.sqlNoZeppelin’s Spark interpreter group does not support user impersonation
% spark2.sqlNoZeppelin’s Spark interpreter group does not support user impersonation

In this article, we will show how to configure Zeppelin’s livy2 and jdbc interpreters to take advantage of Row/Column level security feature provided by Spark in HDP 2.6.1.

2. Environment:

  1. HDP-2.6.1 Kerberized cluster with Spark, Spark2, Ranger, Zeppelin and Hive installed.
  2. Non wire-encrypted cluster. (There is an issue with Zeppelin’s livy interpreter in wire encrypted environment, https://issues.apache.org/jira/browse/ZEPPELIN-2584 and hence for the purpose of this article, we have used a non wire-encrypted cluster)
  3. Zeppelin’s authentication enabled via shiro.ini ( refer this document : https://zeppelin.apache.org/docs/0.7.0/security/shiroauthentication.html for more information)

3. Setup:

3.1 Configure zeppelin's livy2 interpreter

  • For Zeppelin’s livy2 interpreter to support Row/Column level security feature of Spark2-LLAP, we need to configure livy2 interpreter. There is no need of configuring spark2-default as mentioned in section 5.4 of HCC article . In order to do this, go to Zeppelin’s interpreter UI page and edit livy2 interpreter to add following properties
livy.spark.sql.hive.llap = true

livy.spark.hadoop.hive.llap.daemon.service.hosts = <value of hive.llap.daemon.service.hosts>

livy.spark.jars = <HDFS path of spark2-llap jar>

livy.spark.sql.hive.hiveserver2.jdbc.url = <hiveserver2 jdbc URL>

livy.spark.sql.hive.hiveserver2.jdbc.url.principal = <value of hive.server2.authentication.kerberos.principal>

Screen Shot 2017-06-22 at 12.11.53 PM.png

3.2 Configure zeppelin's jdbc interpreter

We can use Zeppelin’s jdbc interpreter to route sql queries to Spark1.x or Spark2 by configuring it to use Spark1.x thrift server when invoked with %jdbc(spark) and to use Spark2 thrift server when invoked with %jdbc(spark2)

  • Follow steps mentioned in Section 4.2 ,Section 4.3, Section 5.1, Section 5.2 and Section 5.3 sequentially of above HCC article in order to enable Hive Interactive Query and Ranger Hive Plugin Setup HDFS and Hive
  • Additionally, follow steps mentioned in section 5.5 of above HCC article to setup Spark2 Thrift Server and Spark1.x thrift server with caveats mentioned in the appendix of this article
  • Now, go on Zeppelin’s interpreter UI page and edit jdbc interpreter to add following properties and then save the new configurations
spark.driver : org.apache.hive.jdbc.HiveDriver

spark.url : <Spark1.x thrift server jdbc url>

spark2.driver : org.apache.hive.jdbc.HiveDriver

spark2.url : <Spark2 thrift server jdbc url>

Screen Shot 2017-06-22 at 12.08.05 PM.png

3.3 Running Example

  • Follow steps from Section 6 and Section 7 of the above HCC article to setup database, table and ranger policies to run the example. For the purpose of this article I am using ‘hrt_1’ user in place of ‘billing’ user and ‘hrt_2’ user in place of ‘datascience’ user
  • Login to Zeppelin UI as ‘hrt_1’ user and run the paragraph ‘SELECT * FROM db_spark.t_spark’ as %jdbc(spark), %jdbc(spark2) and %livy2.sql interpreters . You should see unfiltered and unmasked results as per the set ranger policies

    Screen Shot 2017-06-23 at 12.13.07 PM.png

  • Login to Zeppelin UI as ‘hrt_2’ user and run the paragraph ‘SELECT * FROM db_spark.t_spark’ as %jdbc(spark), %jdbc(spark2) and %livy2.sql interpreters. You should see filtered and masked results now.

    Screen Shot 2017-06-22 at 12.05.23 PM.png

Appendix

For Spark2 with jdbc interpreter

For HDP 2.6.1 cluster, configure spark_thrift_cmd_opts in spark2-env as

--packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.2-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true

(The above HCC article is written for HDP-2.6.0.3 and it suggests to set spark_thrift_cmd_opts in spark2-env as --packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.1-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true)

For Spark 1.x with jdbc interpreter

For HDP-2.6.1 cluster, configure spark_thrift_cmd_opts in spark-env as

--packages com.hortonworks.spark:spark-llap-assembly_2.10:1.0.6-1.6 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true

2,929 Views
Comments

After following this quide, I still got Kerberos errors when trying to communicate with LLAP. Turns out that you also need to set livy.spark.yarn.security.credentials.hiveserver2.enabled=true in the Livy interpreter in Zeppelin to make it work.

This doesn't work for hdp 2.6.3