Created on 06-30-2017 10:19 PM
1. Goal
This article is a continuation of the HCC article https://community.hortonworks.com/content/kbentry/101181/rowcolumn-level-security-in-sql-for-apache-....
One can take advantage of Spark's Row/Column level security via various Zeppelin interpreters, as summarized in the following table:

| Interpreter name | Row/Column security supported? | Reason for no support |
| --- | --- | --- |
| %jdbc (with Spark 1.x STS) | Yes | |
| %jdbc (with Spark2 STS) | Yes | |
| %livy.sql | No | Zeppelin's livy interpreter runs in yarn-cluster mode and needs delegation tokens to access HiveServer2 in that mode; Spark 1.x does not provide this support. |
| %livy2.sql | Yes | |
| %spark.sql | No | Zeppelin's Spark interpreter group does not support user impersonation. |
| %spark2.sql | No | Zeppelin's Spark interpreter group does not support user impersonation. |
In this article, we will show how to configure Zeppelin’s livy2 and jdbc interpreters to take advantage of Row/Column level security feature provided by Spark in HDP 2.6.1.
2. Environment:
- HDP-2.6.1 Kerberized cluster with Spark, Spark2, Ranger, Zeppelin and Hive installed.
- Non-wire-encrypted cluster. (There is a known issue with Zeppelin's livy interpreter in wire-encrypted environments, https://issues.apache.org/jira/browse/ZEPPELIN-2584, hence for the purpose of this article we have used a non-wire-encrypted cluster.)
- Zeppelin authentication enabled via shiro.ini (see https://zeppelin.apache.org/docs/0.7.0/security/shiroauthentication.html for more information).
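As a minimal sketch of the shiro.ini setup (the passwords and role name below are placeholder assumptions, not values from the referenced article; the hrt_1/hrt_2 users are the ones used later in this article):

```
[users]
# username = password, role  (placeholder credentials -- replace with your own)
hrt_1 = password1, role1
hrt_2 = password2, role1

[urls]
# require authentication for all Zeppelin pages
/** = authc
```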
3. Setup:
3.1 Configure zeppelin's livy2 interpreter
- Download spark-llap_2.11-1.1.2-2.1.jar for Spark2 LLAP on HDP-2.6.1 (or spark-llap_2.11-1.1.1-2.1.jar on HDP-2.6.0.3) and store this jar in HDFS. For the purpose of this article, we will refer to it as the spark2-llap jar.
- For Zeppelin’s livy2 interpreter to support the Row/Column level security feature of Spark2-LLAP, we need to configure the livy2 interpreter; there is no need to configure spark2-defaults as mentioned in section 5.4 of the above HCC article. Go to Zeppelin’s interpreter UI page and edit the livy2 interpreter to add the following properties:
```
livy.spark.sql.hive.llap = true
livy.spark.hadoop.hive.llap.daemon.service.hosts = <value of hive.llap.daemon.service.hosts>
livy.spark.jars = <HDFS path of spark2-llap jar>
livy.spark.sql.hive.hiveserver2.jdbc.url = <hiveserver2 jdbc URL>
livy.spark.sql.hive.hiveserver2.jdbc.url.principal = <value of hive.server2.authentication.kerberos.principal>
```
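For illustration, the filled-in properties might look like the following. All values here are assumptions (placeholder host names, realm, and HDFS path) to be replaced with your cluster's actual settings:

```
livy.spark.sql.hive.llap = true
# placeholder: value of hive.llap.daemon.service.hosts on your cluster
livy.spark.hadoop.hive.llap.daemon.service.hosts = @llap0
# placeholder: HDFS location where you uploaded the spark2-llap jar
livy.spark.jars = hdfs:///user/zeppelin/spark-llap_2.11-1.1.2-2.1.jar
# placeholder: your HiveServer2 Interactive JDBC URL and Kerberos principal
livy.spark.sql.hive.hiveserver2.jdbc.url = jdbc:hive2://hs2host.example.com:10500/
livy.spark.sql.hive.hiveserver2.jdbc.url.principal = hive/_HOST@EXAMPLE.COM
```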
3.2 Configure zeppelin's jdbc interpreter
We can use Zeppelin’s jdbc interpreter to route SQL queries to Spark 1.x or Spark2 by configuring it to use the Spark 1.x thrift server when invoked as %jdbc(spark) and the Spark2 thrift server when invoked as %jdbc(spark2).
- Follow the steps in Section 4.2, Section 4.3, Section 5.1, Section 5.2 and Section 5.3 of the above HCC article, in that order, to enable Hive Interactive Query, set up the Ranger Hive plugin, and configure HDFS and Hive.
- Additionally, follow the steps in Section 5.5 of the above HCC article to set up the Spark2 thrift server and the Spark 1.x thrift server, with the caveats mentioned in the appendix of this article.
- Now go to Zeppelin’s interpreter UI page, edit the jdbc interpreter to add the following properties, and save the new configuration:
```
spark.driver  : org.apache.hive.jdbc.HiveDriver
spark.url     : <Spark 1.x thrift server jdbc url>
spark2.driver : org.apache.hive.jdbc.HiveDriver
spark2.url    : <Spark2 thrift server jdbc url>
```
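As a hypothetical example, on a kerberized cluster the filled-in properties might look like the following. The host name, ports, and realm are assumptions, not values from the article; check the actual ports your thrift servers listen on:

```
spark.driver  : org.apache.hive.jdbc.HiveDriver
# placeholder Spark 1.x thrift server URL (host, port, and realm are assumptions)
spark.url     : jdbc:hive2://stshost.example.com:10015/;principal=hive/_HOST@EXAMPLE.COM
spark2.driver : org.apache.hive.jdbc.HiveDriver
# placeholder Spark2 thrift server URL (host, port, and realm are assumptions)
spark2.url    : jdbc:hive2://stshost.example.com:10016/;principal=hive/_HOST@EXAMPLE.COM
```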
3.3 Running Example
- Follow the steps in Section 6 and Section 7 of the above HCC article to set up the database, table and Ranger policies for the example. For the purpose of this article, the ‘hrt_1’ user takes the place of the ‘billing’ user and the ‘hrt_2’ user takes the place of the ‘datascience’ user.
- Log in to the Zeppelin UI as the ‘hrt_1’ user and run the paragraph ‘SELECT * FROM db_spark.t_spark’ with the %jdbc(spark), %jdbc(spark2) and %livy2.sql interpreters. You should see unfiltered and unmasked results, as per the configured Ranger policies.
- Log in to the Zeppelin UI as the ‘hrt_2’ user and run the paragraph ‘SELECT * FROM db_spark.t_spark’ with the %jdbc(spark), %jdbc(spark2) and %livy2.sql interpreters. You should now see filtered and masked results.
Appendix
For Spark2 with jdbc interpreter
For an HDP-2.6.1 cluster, configure spark_thrift_cmd_opts in spark2-env as:

```
--packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.2-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true
```

(The above HCC article is written for HDP-2.6.0.3 and accordingly suggests setting spark_thrift_cmd_opts in spark2-env to `--packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.1-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true`.)
For Spark 1.x with jdbc interpreter
For an HDP-2.6.1 cluster, configure spark_thrift_cmd_opts in spark-env as:

```
--packages com.hortonworks.spark:spark-llap-assembly_2.10:1.0.6-1.6 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true
```
Created on 08-23-2017 07:16 AM
After following this guide, I still got Kerberos errors when trying to communicate with LLAP. Turns out that you also need to set livy.spark.yarn.security.credentials.hiveserver2.enabled=true in the Livy interpreter in Zeppelin to make it work.
Created on 03-14-2018 05:36 PM
This doesn't work for HDP 2.6.3.