Member since: 08-13-2019
Posts: 84
Kudos Received: 233
Solutions: 15
06-30-2017
10:19 PM
9 Kudos
1. Goal

This article is a continuation of this HCC article: https://community.hortonworks.com/content/kbentry/101181/rowcolumn-level-security-in-sql-for-apache-spark-2.html. One can take advantage of Spark's Row/Column level security via various Zeppelin interpreters, as summarized in the following table:

Interpreter name | Row/Column security supported? | Reason for no support
---|---|---
%jdbc (with Spark 1.x STS) | Yes |
%jdbc (with Spark2 STS) | Yes |
%livy.sql | No | Zeppelin's livy interpreter does not support Row/Column level security because it uses yarn-cluster mode and needs delegation tokens to access HiveServer2 in yarn-cluster mode; this support is not present in Spark 1.x
%livy2.sql | Yes |
%spark.sql | No | Zeppelin's Spark interpreter group does not support user impersonation
%spark2.sql | No | Zeppelin's Spark interpreter group does not support user impersonation

In this article, we will show how to configure Zeppelin's livy2 and jdbc interpreters to take advantage of the Row/Column level security feature provided by Spark in HDP 2.6.1.

2. Environment:
HDP-2.6.1 Kerberized cluster with Spark, Spark2, Ranger, Zeppelin and Hive installed. Non wire-encrypted cluster.
(There is an issue with Zeppelin's livy interpreter in wire-encrypted environments, https://issues.apache.org/jira/browse/ZEPPELIN-2584, hence for the purpose of this article we have used a non wire-encrypted cluster.) Zeppelin's authentication is enabled via shiro.ini (refer to this document for more information: https://zeppelin.apache.org/docs/0.7.0/security/shiroauthentication.html).

3. Setup:

3.1 Configure Zeppelin's livy2 interpreter

Download spark-llap_2.11-1.1.2-2.1.jar for Spark2 LLAP in the case of HDP-2.6.1 (or spark-llap_2.11-1.1.1-2.1.jar in the case of HDP-2.6.0.3) and store this jar in HDFS. For the purpose of this article, we will refer to this jar as the spark2-llap jar. For Zeppelin's livy2 interpreter to support the Row/Column level security feature of Spark2-LLAP, we need to configure the livy2 interpreter; there is no need to configure spark2-defaults as mentioned in section 5.4 of the HCC article. To do this, go to Zeppelin's interpreter UI page and edit the livy2 interpreter to add the following properties (an illustrative filled-in example follows the list):

livy.spark.sql.hive.llap = true
livy.spark.hadoop.hive.llap.daemon.service.hosts = <value of hive.llap.daemon.service.hosts>
livy.spark.jars = <HDFS path of spark2-llap jar>
livy.spark.sql.hive.hiveserver2.jdbc.url = <hiveserver2 jdbc URL>
livy.spark.sql.hive.hiveserver2.jdbc.url.principal = <value of hive.server2.authentication.kerberos.principal>
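For illustration only, a filled-in configuration might look like the following; every value below is a placeholder and must be replaced with values from your own cluster (hive.llap.daemon.service.hosts and hive.server2.authentication.kerberos.principal from your Hive configs, the HiveServer2 Interactive JDBC URL, and the HDFS path where you stored the spark2-llap jar):

livy.spark.sql.hive.llap = true
livy.spark.hadoop.hive.llap.daemon.service.hosts = @llap0
livy.spark.jars = hdfs:///tmp/spark-llap_2.11-1.1.2-2.1.jar
livy.spark.sql.hive.hiveserver2.jdbc.url = jdbc:hive2://hive-host.example.com:10500/
livy.spark.sql.hive.hiveserver2.jdbc.url.principal = hive/_HOST@EXAMPLE.COM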
3.2 Configure Zeppelin's jdbc interpreter

We can use Zeppelin's jdbc interpreter to route SQL queries to Spark 1.x or Spark2 by configuring it to use the Spark 1.x thrift server when invoked with %jdbc(spark) and the Spark2 thrift server when invoked with %jdbc(spark2). Follow the steps in Section 4.2, Section 4.3, Section 5.1, Section 5.2 and Section 5.3 of the above HCC article, in order, to:

enable Hive Interactive Query and the Ranger Hive plugin
set up HDFS and Hive

Additionally, follow the steps in section 5.5 of the above HCC article to set up the Spark2 thrift server and the Spark 1.x thrift server, with the caveats mentioned in the appendix of this article. Now, go to Zeppelin's interpreter UI page, edit the jdbc interpreter to add the following properties, and save the new configuration (an illustrative example with concrete values follows the list):

spark.driver : org.apache.hive.jdbc.HiveDriver
spark.url : <Spark1.x thrift server jdbc url>
spark2.driver : org.apache.hive.jdbc.HiveDriver
spark2.url : <Spark2 thrift server jdbc url>
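For illustration only, the host names, ports and principal below are hypothetical; use the actual JDBC URLs of your own Spark 1.x and Spark2 thrift servers (from their hive.server2.thrift.port settings and your Kerberos realm):

spark.driver : org.apache.hive.jdbc.HiveDriver
spark.url : jdbc:hive2://sts-host.example.com:10015/;principal=hive/_HOST@EXAMPLE.COM
spark2.driver : org.apache.hive.jdbc.HiveDriver
spark2.url : jdbc:hive2://sts-host.example.com:10016/;principal=hive/_HOST@EXAMPLE.COM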
3.3 Running Example

Follow the steps from Section 6 and Section 7 of the above HCC article to set up the database, table and Ranger policies needed to run the example. For the purpose of this article, I am using the 'hrt_1' user in place of the 'billing' user and the 'hrt_2' user in place of the 'datascience' user.

Log in to the Zeppelin UI as the 'hrt_1' user and run the paragraph 'SELECT * FROM db_spark.t_spark' with the %jdbc(spark), %jdbc(spark2) and %livy2.sql interpreters (the exact paragraphs are shown after these steps). You should see unfiltered and unmasked results, as per the Ranger policies you set.
Log in to the Zeppelin UI as the 'hrt_2' user and run the paragraph 'SELECT * FROM db_spark.t_spark' with the %jdbc(spark), %jdbc(spark2) and %livy2.sql interpreters. You should now see filtered and masked results.
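For reference, the paragraphs look like this in Zeppelin (the first line of each paragraph selects the interpreter, the rest is the query used in the HCC article):

%jdbc(spark)
SELECT * FROM db_spark.t_spark

%jdbc(spark2)
SELECT * FROM db_spark.t_spark

%livy2.sql
SELECT * FROM db_spark.t_spark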
Appendix

For Spark2 with the jdbc interpreter: for an HDP-2.6.1 cluster, configure spark_thrift_cmd_opts in spark2-env as:

--packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.2-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true

(The above HCC article was written for HDP-2.6.0.3 and suggests setting spark_thrift_cmd_opts in spark2-env as --packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.1-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true)

For Spark 1.x with the jdbc interpreter: for an HDP-2.6.1 cluster, configure spark_thrift_cmd_opts in spark-env as:

--packages com.hortonworks.spark:spark-llap-assembly_2.10:1.0.6-1.6 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true
06-30-2017
05:36 PM
@Ramon Wartala Please attach a screenshot of the livy2 interpreter config as well. Also, as in this article, https://discuss.pivotal.io/hc/en-us/articles/201914097-Hadoop-daemons-in-a-secured-cluster-fails-to-start-with-Unable-to-obtain-password-from-user-, are you seeing any statement like the following in your Zeppelin logs?

java.io.IOException: Login failure for hdfs/dev6ha@SATURN.LOCAL from keytab /etc/security/phd/keytab/hdfs.service.keytab
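As a quick sanity check, you can also verify on the host that the principal and keytab from such a message actually match; the keytab path and principal below are copied from the example error above, so substitute your own:

klist -kt /etc/security/phd/keytab/hdfs.service.keytab
kinit -kt /etc/security/phd/keytab/hdfs.service.keytab hdfs/dev6ha@SATURN.LOCAL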
06-30-2017
04:34 PM
@Ramon Wartala Please paste a screenshot of the livy2 interpreter configs and also the full /etc/livy2/conf/livy.conf file from your livy2 server host.
06-29-2017
08:33 PM
1 Kudo
@Ramon Wartala Please check this article to see if you are missing any of these configs: https://community.hortonworks.com/articles/80059/how-to-configure-zeppelin-livy-interpreter-for-sec.html
06-29-2017
08:27 PM
3 Kudos
@aswathy Check Zeppelin's shiro.ini config through Ambari. You should see a [users] section in there:

[users]
# List of users with their password allowed to access Zeppelin.
# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html#Configuration-INISections
admin = admin, admin
user1 = user1, role1, role2
user2 = user2, role3
user3 = user3, role2
So you can use admin/admin, user1/user1, user2/user2 or user3/user3 as your default logins. But your Spark queries won't necessarily run after logging in as one of these: for Spark queries to run, the user needs to exist on your Linux machines. Hence these are just default logins, which you can change yourself. For simple configs, you can add more username/password entries in plain text in the [users] section (for example, see below), or better, you can integrate AD/LDAP as well.
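For example, to add one more simple login you could append a line like this to the [users] section; the username, password and role here are placeholders:

user4 = user4password, role2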
06-28-2017
06:38 PM
3 Kudos
@Ramon Wartala I would suggest checking whether Livy and Livy2 are present under the Spark and Spark2 services respectively. If the Livy and Livy2 servers are not installed on the cluster, then the corresponding interpreters won't be present in Zeppelin. Check this out: https://issues.apache.org/jira/browse/AMBARI-19919
06-27-2017
05:13 PM
6 Kudos
@Ramon Wartala By design, Zeppelin's spark and spark2 interpreters always execute your query as the 'zeppelin' user and do not support user impersonation. Hence the query is bound to fail if the 'zeppelin' user doesn't have permission to decrypt the key. The jdbc, livy and livy2 interpreters do support user impersonation, so your scenario would pass with any of these: %livy.sql, %livy2.sql and %jdbc(hive)
06-23-2017
06:12 PM
1 Kudo
@suyash soni Not that I am aware of. You can try running this Hive query in beeline and/or the Ambari Hive view and see if it works for you. If it works there and not via Zeppelin, then it's a potential bug.
06-23-2017
05:52 PM
4 Kudos
@suyash soni Currently the Zeppelin UI does not have this feature. You will have to find the text manually using the browser's search function and replace every instance.
06-08-2017
11:01 PM
7 Kudos
This article describes how to enable Knox proxying for the Zeppelin Notebook for:
Wire-encrypted environments (i.e. SSL is enabled for Knox and Zeppelin)
Non wire-encrypted environments
Configuring Knox Proxying for Zeppelin Notebook in Wire-encrypted Environments
If you have already configured SSL for zeppelin, please proceed to section 2. If not, please read through section 1.
Section 1 : Configuring SSL for Zeppelin
Note: The steps mentioned in section 1 are just for example purposes, and a production setup may differ. The steps also assume no client-side authentication; for client-side authentication, please follow the Zeppelin Component Guide in the HDP release documents
(HDP 2.6.1 Zeppelin Component Guide : https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_zeppelin-component-guide/content/config-ssl-zepp.html )
Create a keystore file, truststore file and certificates for each host on the cluster by following these steps:
Navigate to the directory where you want to create zeppelin keystore, certificate and truststore files
Create a keystore file on the Zeppelin server host:

keytool -genkey -alias $ZeppelinHostFqdn -keyalg RSA -keysize 1024 -dname "CN=$ZeppelinHostFqdn,OU=$OrganizationalUnit,O=$OrganizationName,L=$City,ST=$State,C=$Country" -keypass $KeyPassword -keystore $KeyStoreFile -storepass $KeyStorePassword
Create a certificate file on the Zeppelin server host by exporting the key info from the keystore file:

keytool -export -alias $ZeppelinHostFqdn -keystore $KeyStoreFile -rfc -file $CertificateFile -storepass $KeyStorePassword
Create a truststore file on the Zeppelin server host:

keytool -import -noprompt -alias $ZeppelinHostFqdn -file $CertificateFile -keystore $TrustStoreFile -storepass $TrustStorePassword
Change the permissions of the keystore and truststore files to 444 and change their owner to the 'zeppelin' user (for example, as shown below)
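For illustration, assuming the files were created as /etc/zeppelin/conf/keystore.jks and /etc/zeppelin/conf/truststore.jks (hypothetical paths) and that a 'zeppelin' group exists:

chmod 444 /etc/zeppelin/conf/keystore.jks /etc/zeppelin/conf/truststore.jks
chown zeppelin:zeppelin /etc/zeppelin/conf/keystore.jks /etc/zeppelin/conf/truststore.jks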
Now configure the following through Ambari in the zeppelin-config section:
zeppelin.ssl : true
zeppelin.server.ssl.port : $ZeppelinSSLPort
zeppelin.ssl.client.auth: false (true in case of client-side authentication enabled)
zeppelin.ssl.truststore.type : JKS
zeppelin.ssl.truststore.path : $TrustStoreFile
zeppelin.ssl.truststore.password : $TrustStorePassword
zeppelin.ssl.keystore.path : $KeyStoreFile
zeppelin.ssl.keystore.password : $KeyStorePassword
zeppelin.ssl.key.manager.password : $KeyPassword
Section 2: Configuring Knox Proxying
Copy the zeppelin certificate file onto Knox gateway host
Add the Zeppelin certificate file to the Java cacerts store of the Knox gateway host using the following command on the Knox gateway host (this configures the Knox gateway to trust the Zeppelin server's certificate):

keytool -import -file $CertificateFile -alias $ZeppelinHostFqdn -keystore $JavaCacertPath
where $JavaCacertPath is typically <path to your Java installation>/jre/lib/security/cacerts
You will get a prompt asking for the keystore password (i.e. the Java cacerts store password); the default value is 'changeit'.
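To verify that the import succeeded, you can list the alias from the cacerts store (this assumes the default 'changeit' password):

keytool -list -keystore $JavaCacertPath -storepass changeit -alias $ZeppelinHostFqdn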
Create a topology ui.xml file in the $KnoxConfDir/topologies directory on the Knox gateway host, and configure the Zeppelin UI using the following snippet (a minimal full-file skeleton showing where these entries sit is sketched after the note below):
<service>
<role>ZEPPELIN</role>
<url>https://$ZeppelinHostFqdn:$ZeppelinSSLPort</url>
</service>
<service>
<role>ZEPPELINUI</role>
<url>https://$ZeppelinHostFqdn:$ZeppelinSSLPort</url>
</service>
<service>
<role>ZEPPELINWS</role>
<url>wss://$ZeppelinHostFqdn:$ZeppelinSSLPort/ws</url>
</service>
Note: make sure to use the FQDN of the Zeppelin host, as that is the 'key' (or alias) that we used in Section 1 to create the Zeppelin certificate.
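For orientation, a minimal ui.xml skeleton might look like the following; the provider entries are only placeholders for whatever authentication and identity-assertion providers your existing Knox topologies already use, and are not prescribed by this article:

<topology>
  <gateway>
    <provider>
      <role>authentication</role>
      <name>ShiroProvider</name>
      <enabled>true</enabled>
      <!-- authentication params (e.g. LDAP settings) go here -->
    </provider>
    <provider>
      <role>identity-assertion</role>
      <name>Default</name>
      <enabled>true</enabled>
    </provider>
  </gateway>
  <service>
    <role>ZEPPELIN</role>
    <url>https://$ZeppelinHostFqdn:$ZeppelinSSLPort</url>
  </service>
  <service>
    <role>ZEPPELINUI</role>
    <url>https://$ZeppelinHostFqdn:$ZeppelinSSLPort</url>
  </service>
  <service>
    <role>ZEPPELINWS</role>
    <url>wss://$ZeppelinHostFqdn:$ZeppelinSSLPort/ws</url>
  </service>
</topology>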
There is no need to restart either the Knox gateway or Zeppelin.
Configuring Knox Proxying for Zeppelin Notebook in Non Wire-encrypted Environments
Create a topology ui.xml file in the $KnoxConfDir/topologies directory on the Knox gateway host, and configure the Zeppelin UI using the following snippet:
<service>
<role>ZEPPELIN</role>
<url>http://$ZeppelinHostFqdn:$ZeppelinPort</url>
</service>
<service>
<role>ZEPPELINUI</role>
<url>http://$ZeppelinHostFqdn:$ZeppelinPort</url>
</service>
There is no need to restart either the Knox gateway or Zeppelin.
Using Knox Proxying for Zeppelin, and Currently Known Issues with the HDP-2.6.1 Release
Once the configuration is finished, you can access the Zeppelin UI via the Knox proxy using the following URL:
https://<knox gateway host>:8443/gateway/ui/zeppelin/
(Note: please don't forget to append the trailing '/' to the URL; the fix for this bug is still a work in progress.)
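As a quick smoke test from the command line, the following should return the Zeppelin login page; the credentials are placeholders (use a user defined by your topology's authentication provider), and -k is only needed because the Knox certificate may be self-signed:

curl -k -u admin:admin 'https://<knox gateway host>:8443/gateway/ui/zeppelin/'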
The other bug that is still a work in progress is that if a user logs out of Zeppelin while using Knox's proxy URL, they no longer land back on Zeppelin's login page (https://issues.apache.org/jira/browse/ZEPPELIN-2601); the user needs to type the URL again in the browser to go back to the login page.