Member since: 08-13-2019
Posts: 84
Kudos Received: 232
Solutions: 15
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1072 | 02-28-2018 09:27 PM
 | 2049 | 01-25-2018 09:44 PM
 | 3880 | 09-21-2017 08:17 PM
 | 2296 | 09-11-2017 05:21 PM
 | 2377 | 07-13-2017 04:56 PM
03-14-2018
06:38 PM
@Patrick Young Can you check your Python version on the cluster nodes? Is it 2.6 by any chance?
02-28-2018
09:27 PM
1 Kudo
@Matt Andruff The operation you are trying to do is essentially saving a temporary Spark table into Hive via Livy (i.e., a Spark app). If you check the second table in this support matrix, that is not a supported operation via the spark-llap connector: https://github.com/hortonworks-spark/spark-llap/wiki/7.-Support-Matrix#spark-shells-and-spark-apps. Such operations (i.e., creating a table) should, however, be supported by the jdbc(spark1) interpreter, as mentioned in table 1 at the same link. jdbc(spark1) directs the query through the Spark Thrift Server, which runs as the 'hive' principal, as noted in the same wiki. If you want the above operation to succeed, the logged-in Zeppelin user must have proper authorizations on the Hive warehouse directory; only then will Spark be able to save the table into the Hive warehouse for you. Hope that helps
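A minimal sketch of that route, assuming the jdbc interpreter has a 'spark' prefix pointing at the Spark 1.x Thrift Server and using hypothetical table names:

%jdbc(spark)
CREATE TABLE db_demo.t_saved AS SELECT * FROM db_demo.t_source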
01-25-2018
09:44 PM
1 Kudo
@Sridhar Reddy Since the Spark2 interpreter is in globally shared mode, there is only one Spark2 session (i.e., one Spark2 context) shared among all users and all notebooks in Zeppelin. A variable defined in one paragraph of one notebook may be accessed freely in other paragraphs of the same notebook, and for that matter in paragraphs of other notebooks as well. Attaching screenshots: screen-shot-2018-01-25-at-14317-pm.png screen-shot-2018-01-25-at-14344-pm.png
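A quick illustration of the shared context, assuming the default Scala flavor of the %spark2 interpreter; the two paragraphs can live in two different notebooks:

// Paragraph in notebook A: define a variable in the shared Spark2 context
%spark2
val sharedCount = 42

// Paragraph in notebook B: the same globally shared context sees it
%spark2
println(sharedCount)  // prints 42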
10-12-2017
10:06 PM
1 Kudo
@Sergey Sheypak I think the issue is in this line: with serdeproperties ('serialization.class'='com.my.ContainerProto'). You are trying to create a table with an external SerDe class specified, which is resulting in the class-not-found error. The way around this is to add any external class that you use in your code to the dependencies list of the Spark interpreter; follow the steps here to do this: https://zeppelin.apache.org/docs/latest/manual/dependencymanagement.html. Once you do this, restart the interpreter and try to run the query with %spark.sql. Hope this helps! For more background and information, read through these:
- https://mail-archives.apache.org/mod_mbox/incubator-zeppelin-users/201601.mbox/%3CCACcq8R74eTEhKu_j7nMyUvvGmrEt55iYAocXw2UwSh+gvAH1xw@mail.gmail.com%3E
- https://issues.apache.org/jira/browse/ZEPPELIN-648 (resolved now)
- https://issues.apache.org/jira/browse/ZEPPELIN-381 (resolved now too)
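As an alternative to the interpreter GUI, the dependency-loading interpreter described in that document can pull the jar dynamically; a hedged sketch with hypothetical artifact coordinates for the SerDe jar:

%spark.dep
// must run before the Spark context is first used in the session
z.reset()
z.load("com.my:container-proto-serde:1.0")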
10-12-2017
09:47 PM
3 Kudos
Thanks @dbalasundaran for pointing to the article. This works for me. There is one caveat, though: if your cluster is Kerberos-enabled, one more step is required before installing the service in the last step. Send a POST request to "/credentials/kdc.admin.credential" with the data '{ "Credential" : { "principal" : "user@EXAMPLE.COM", "key" : "password", "type" : "temporary" } }'
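A sketch of that request with curl, assuming an Ambari server at ambari-host:8080, a cluster named MyCluster, and admin/admin credentials:

# Register the KDC admin credential before installing the service
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  -d '{ "Credential" : { "principal" : "user@EXAMPLE.COM", "key" : "password", "type" : "temporary" } }' \
  http://ambari-host:8080/api/v1/clusters/MyCluster/credentials/kdc.admin.credential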
10-12-2017
08:14 PM
5 Kudos
I want to install the 'Zeppelin' service via the Ambari REST API and have the Zeppelin server running on one particular node. How do I do it?
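For reference, a hedged sketch of the usual Ambari REST sequence (assuming Ambari at ambari-host:8080, cluster MyCluster, and target node node1.example.com; the reply above links a full step-by-step article):

# Add the service and its master component
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  http://ambari-host:8080/api/v1/clusters/MyCluster/services/ZEPPELIN
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  http://ambari-host:8080/api/v1/clusters/MyCluster/services/ZEPPELIN/components/ZEPPELIN_MASTER
# Map the component to the chosen host
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  http://ambari-host:8080/api/v1/clusters/MyCluster/hosts/node1.example.com/host_components/ZEPPELIN_MASTER
# Install, then start
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Install Zeppelin"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://ambari-host:8080/api/v1/clusters/MyCluster/services/ZEPPELIN
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start Zeppelin"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  http://ambari-host:8080/api/v1/clusters/MyCluster/services/ZEPPELIN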
Tags: Ambari, Hadoop Core
Labels: Apache Ambari
09-22-2017
05:53 PM
2 Kudos
@Shota Akhalaia My guess is that when you have /** = authc before /api/interpreter/** = authc, roles[admin], the authorization you grant only to 'admin' users for /api/interpreter/** is overridden by /** = authc, which allows all APIs to be accessible to all roles. I tried it on my instance: placing /** = authc as the first line really does make the interpreters page accessible to all users, whereas making it the last line restricts it to the 'admin' users. The linked document also suggests making it the last line. So please try this and let me know if it works:

[urls]
/api/interpreter/** = authc, roles[admin]
/api/configuration/** = authc, roles[admin]
/api/credential/** = authc, roles[admin]
/** = authc
#/** = anon
09-21-2017
08:17 PM
6 Kudos
@Shota Akhalaia Can you try configuring the [urls] section as in this example document: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_zeppelin-component-guide/content/config-example.html ? I am just wondering whether the order of this line in shiro.ini matters: /** = authc
09-21-2017
06:35 PM
5 Kudos
@Sudheer Velagapudi If you look at the Zeppelin jdbc interpreter configuration, you will see these four properties:
- default.driver (your driver, e.g. org.postgresql.Driver)
- default.url (JDBC connection URL)
- default.user
- default.password
Configure these four properties and then use %jdbc to run SQL queries. Please go through this page for more information: https://zeppelin.apache.org/docs/0.6.1/interpreter/jdbc.html
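A minimal sketch, assuming a hypothetical PostgreSQL database at db-host:5432; set these values in the jdbc interpreter settings, then query:

default.driver = org.postgresql.Driver
default.url = jdbc:postgresql://db-host:5432/mydb
default.user = dbuser
default.password = dbpass

%jdbc
SELECT * FROM my_table LIMIT 10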
09-13-2017
12:01 AM
2 Kudos
@William Brooks Please refer to these JIRAs: https://issues.cloudera.org/browse/LIVY-194 and https://issues.apache.org/jira/browse/LIVY-325. The feature you are requesting will be available with the %livy interpreter (not the %spark interpreter) once these two JIRAs are resolved.
09-11-2017
05:21 PM
8 Kudos
@anjul tiwari
1. When I provide only read permission to a user and share the notebook in report mode, the user is able to view the notebook but is not allowed to run the paragraphs.
That is expected behavior. A person who has 'Read Only' permission will be able to view the notebook and its visualizations/tables but will not be able to run the paragraphs or change the code.
2. When I provide both read and write permissions, the user is not only allowed to run the code but is also able to view the code and change the mode as well.
When you provide 'read' and 'write' permissions, the user will be able to change the code as well as run it, but he should not be allowed to change the 'mode'/'permissions'. Can you confirm that the zeppelin.anonymous.allowed and zeppelin.notebook.public properties are set to 'false'? In any case, if I understand correctly, you want a mode where a person cannot read or modify code but can still run it and visualize the results. That mode is not currently supported in Zeppelin.
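For reference, those two flags live in zeppelin-site.xml and would look like this when disabled (a sketch; only the property names come from the discussion above):

<property>
  <name>zeppelin.anonymous.allowed</name>
  <value>false</value>
</property>
<property>
  <name>zeppelin.notebook.public</name>
  <value>false</value>
</property>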
08-15-2017
09:50 PM
8 Kudos
@Akash Mendiratta No, you don't need passwordless SSH. I think you are missing step 4 and step 5 in the following description. It works for me with the following steps:
1) Enable user impersonation for the sh interpreter via the Zeppelin interpreter GUI
2) Enable shiro.ini authentication (in my case, for trial purposes, I simply added a few users that are present on the cluster to the [users] section)
3) Add export ZEPPELIN_IMPERSONATE_CMD='sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c ' to zeppelin-env.sh
4) Add the 'zeppelin' user to the sudoers list. For CentOS 6, on the Zeppelin server node, run sudo visudo and add the line: zeppelin ALL=(ALL) NOPASSWD: ALL
5) I also had a Kerberos-enabled cluster, so make sure that zeppelin.server.kerberos.keytab has permissions for the users you created in step 2
Now log into Zeppelin as the user created in step 2 and run %sh whoami; it should show the currently logged-in user as the result of the paragraph.
08-14-2017
09:42 PM
7 Kudos
@William Brooks It is a work in progress, and currently you might not be able to share context between sparkr and pyspark. You can, however, share context between livy.spark and livy.sql.
07-13-2017
04:56 PM
6 Kudos
@shivanand khobanna Are you defining those variables with the %spark interpreter? The default mode of the %spark interpreter is 'Globally Shared'. In shared mode, a single JVM process and a single interpreter group serve all notes, so variables defined in one note are available to all users and all notebooks. The behavior you are seeing is by design. You can change your interpreter modes through the interpreters page, but it is better to use the 'livy' interpreter, which uses 'Per user, scoped' mode by default on HDP-installed Zeppelin. That means you will see a different YARN application for each user of the %livy interpreter, and hence a different Spark context per user, which prevents variables defined by one user from being visible to another. Please check out this article for more information on the various Zeppelin interpreter modes and what each of them means: https://medium.com/@leemoonsoo/apache-zeppelin-interpreter-mode-explained-bae0525d0555
07-07-2017
10:23 AM
2 Kudos
Yes, you need to install R on all cluster nodes
06-30-2017
10:19 PM
9 Kudos
1. Goal

This article is a continuation of this HCC article: https://community.hortonworks.com/content/kbentry/101181/rowcolumn-level-security-in-sql-for-apache-spark-2.html. One can take advantage of Spark's Row/Column level security via various Zeppelin interpreters, as explained in the following table:

Interpreter name | Row/Column security supported? | Reason for no support
---|---|---
%jdbc (with Spark 1.x STS) | Yes |
%jdbc (with Spark2 STS) | Yes |
%livy.sql | No | Zeppelin's livy interpreter won't support Row/Column level security because it uses yarn-cluster mode and needs delegation tokens to access HiveServer2 in that mode; this support is not present in Spark 1.x
%livy2.sql | Yes |
%spark.sql | No | Zeppelin's Spark interpreter group does not support user impersonation
%spark2.sql | No | Zeppelin's Spark interpreter group does not support user impersonation

In this article, we will show how to configure Zeppelin's livy2 and jdbc interpreters to take advantage of the Row/Column level security feature provided by Spark in HDP 2.6.1.

2. Environment
- HDP-2.6.1 Kerberized cluster with Spark, Spark2, Ranger, Zeppelin and Hive installed
- Non wire-encrypted cluster (there is an issue with Zeppelin's livy interpreter in wire-encrypted environments, https://issues.apache.org/jira/browse/ZEPPELIN-2584, hence a non wire-encrypted cluster is used for the purposes of this article)
- Zeppelin authentication enabled via shiro.ini (refer to https://zeppelin.apache.org/docs/0.7.0/security/shiroauthentication.html for more information)

3. Setup

3.1 Configure Zeppelin's livy2 interpreter

Download spark-llap_2.11-1.1.2-2.1.jar for Spark2 LLAP in the case of HDP-2.6.1 (or spark-llap_2.11-1.1.1-2.1.jar in the case of HDP-2.6.0.3) and store this jar in HDFS. For the purposes of this article, we will refer to this jar as the spark2-llap jar. For Zeppelin's livy2 interpreter to support the Row/Column level security feature of Spark2-LLAP, we need to configure the livy2 interpreter; there is no need to configure spark2-default as mentioned in section 5.4 of the HCC article. To do this, go to Zeppelin's interpreter UI page and edit the livy2 interpreter to add the following properties:

livy.spark.sql.hive.llap = true
livy.spark.hadoop.hive.llap.daemon.service.hosts = <value of hive.llap.daemon.service.hosts>
livy.spark.jars = <HDFS path of spark2-llap jar>
livy.spark.sql.hive.hiveserver2.jdbc.url = <hiveserver2 jdbc URL>
livy.spark.sql.hive.hiveserver2.jdbc.url.principal = <value of hive.server2.authentication.kerberos.principal>

3.2 Configure Zeppelin's jdbc interpreter

We can use Zeppelin's jdbc interpreter to route SQL queries to Spark 1.x or Spark2 by configuring it to use the Spark 1.x Thrift Server when invoked as %jdbc(spark) and the Spark2 Thrift Server when invoked as %jdbc(spark2). Follow the steps in Section 4.2, Section 4.3, Section 5.1, Section 5.2 and Section 5.3 (in that order) of the above HCC article to:
- enable Hive Interactive Query and the Ranger Hive plugin
- set up HDFS and Hive

Additionally, follow the steps in section 5.5 of the above HCC article to set up the Spark2 Thrift Server and the Spark 1.x Thrift Server, with the caveats mentioned in the appendix of this article. Now go to Zeppelin's interpreter UI page, edit the jdbc interpreter to add the following properties, and save the new configuration:

spark.driver : org.apache.hive.jdbc.HiveDriver
spark.url : <Spark1.x thrift server jdbc url>
spark2.driver : org.apache.hive.jdbc.HiveDriver
spark2.url : <Spark2 thrift server jdbc url>

3.3 Running example

Follow the steps from Section 6 and Section 7 of the above HCC article to set up the database, table and Ranger policies for the example. For the purposes of this article, I am using the 'hrt_1' user in place of the 'billing' user and the 'hrt_2' user in place of the 'datascience' user.
- Log into the Zeppelin UI as the 'hrt_1' user and run the paragraph SELECT * FROM db_spark.t_spark with the %jdbc(spark), %jdbc(spark2) and %livy2.sql interpreters. You should see unfiltered and unmasked results, as per the configured Ranger policies.
- Log into the Zeppelin UI as the 'hrt_2' user and run the same paragraph with the %jdbc(spark), %jdbc(spark2) and %livy2.sql interpreters. You should now see filtered and masked results.
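For instance, the test paragraph looks like this in Zeppelin (swap the interpreter prefix between %jdbc(spark), %jdbc(spark2) and %livy2.sql for each run):

%livy2.sql
SELECT * FROM db_spark.t_spark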
Appendix

For Spark2 with the jdbc interpreter: for an HDP-2.6.1 cluster, configure spark_thrift_cmd_opts in spark2-env as:

--packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.2-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true

(The above HCC article is written for HDP-2.6.0.3 and suggests setting spark_thrift_cmd_opts in spark2-env as --packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.1-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true)

For Spark 1.x with the jdbc interpreter: for an HDP-2.6.1 cluster, configure spark_thrift_cmd_opts in spark-env as:

--packages com.hortonworks.spark:spark-llap-assembly_2.10:1.0.6-1.6 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true
Tags: Data Science & Advanced Analytics, How-To/Tutorial, row-level-filtering, Spark, spark-security, zeppelin, zeppelin-notebook
06-30-2017
05:36 PM
@Ramon Wartala Please attach a screenshot of the livy2 interpreter config as well. Also, as in this article, https://discuss.pivotal.io/hc/en-us/articles/201914097-Hadoop-daemons-in-a-secured-cluster-fails-to-start-with-Unable-to-obtain-password-from-user-, are you seeing any statement like this in your Zeppelin logs? java.io.IOException: Login failure for hdfs/dev6ha@SATURN.LOCAL from keytab /etc/security/phd/keytab/hdfs.service.keytab
06-30-2017
04:34 PM
@Ramon Wartala Please paste a screenshot of the livy2 interpreter configs and also the full /etc/livy2/conf/livy.conf file from your livy2 server host.
06-29-2017
10:53 PM
1 Kudo
@Vinuraj M You can use the jdbc(hive) interpreter; it will automatically pass your currently logged-in user downstream.
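A quick way to verify the identity propagation, assuming a reasonably recent Hive that provides the current_user() function:

%jdbc(hive)
SELECT current_user()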
06-29-2017
08:33 PM
1 Kudo
@Ramon Wartala Please check this article to see if you are missing any of these configs: https://community.hortonworks.com/articles/80059/how-to-configure-zeppelin-livy-interpreter-for-sec.html
06-29-2017
08:27 PM
3 Kudos
@aswathy Check Zeppelin's shiro.ini config through Ambari. You should see a [users] section in there:

[users]
# List of users with their password allowed to access Zeppelin.
# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html#Configuration-INISections
admin = admin, admin
user1 = user1, role1, role2
user2 = user2, role3
user3 = user3, role2
So you can use admin/admin, or user1/user1, user2/user2 and user3/user3 as your default logins. But your Spark queries won't necessarily run after logging in as one of these: for Spark queries to run, the user needs to be present on your Linux machines. These are just default logins, which you can change yourself. For simple configs, you can add more username/password entries in text format in the [users] section. Or, better, you can integrate AD/LDAP as well.
06-29-2017
06:24 PM
1 Kudo
@Sanjeev Rao Not sure, but it looks like this: somewhere in either the livy interpreter configs or the Spark configs in Ambari, a boolean value (true or false) is required, and you might have configured it as "yes". Can you check whether you have used the string "yes" anywhere in your configs?
06-29-2017
04:44 PM
1 Kudo
@Sanjeev Rao I see that Livy is not able to launch the YARN application. Can you paste your Livy server log? Launching a Livy Spark app requires 3 YARN containers, so please also check whether your cluster is busy.
06-28-2017
06:38 PM
3 Kudos
@Ramon Wartala I would suggest checking whether Livy and Livy2 are present under the Spark and Spark2 services, respectively. If the Livy and Livy2 servers are not installed on the cluster, the corresponding interpreters won't be present in Zeppelin. Check this out: https://issues.apache.org/jira/browse/AMBARI-19919
06-27-2017
05:13 PM
6 Kudos
@Ramon Wartala By design, Zeppelin's spark and spark2 interpreters always execute your query as the 'zeppelin' user and do not support user impersonation. Hence the query is bound to fail if the 'zeppelin' user doesn't have permission to decrypt the key. The jdbc, livy and livy2 interpreters do support user impersonation, so your scenario would pass with any of %livy.sql, %livy2.sql and %jdbc(hive).
06-23-2017
06:27 PM
1 Kudo
@Jayadeep Jayaraman A few things to check:
1) Just making sure: by 'hadoop.proxyuser.zeppelin-clustername.hosts = *' in your description, you mean 'hadoop.proxyuser.zeppelin-gettranshdpdev.hosts = *', correct?
2) Have you enabled Zeppelin authentication? If so, is the user you are logged in as in Zeppelin's UI present on your cluster as an actual Linux user?
3) That user's HDFS directory should be present, i.e. /user/xyz in HDFS
4) It would be helpful to post the paragraph you are trying to run, just to make sure it is not an erroneous paragraph
5) If everything above is correct, follow these steps to send some curl requests to the Livy server (see the sketch below): http://gethue.com/how-to-use-the-livy-spark-rest-job-server-for-interactive-spark-2-2/ This will help isolate whether the Livy server or Zeppelin's livy interpreter is having the issue.
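A minimal sketch of such a request, assuming a Livy server at livy-host:8998 (on a Kerberized cluster, add --negotiate -u : to the curl calls):

# Create an interactive Spark session
curl -X POST -H 'Content-Type: application/json' \
  -d '{"kind": "spark"}' \
  http://livy-host:8998/sessions
# Poll the session list to see its state
curl http://livy-host:8998/sessions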
06-23-2017
06:12 PM
1 Kudo
@suyash soni Not that I am aware of. You can try running this Hive query in beeline and/or the Ambari Hive view and see if it works there. If it works there but not via Zeppelin, then it is a potential bug.
06-23-2017
05:52 PM
4 Kudos
@suyash soni Currently the Zeppelin UI does not have this feature. You will have to find the text manually using the browser's search function and replace every instance.
06-23-2017
05:46 PM
@sysadmin CreditVidya I am not exactly sure when Zeppelin 0.7.2 will be available in an HDP release. Regarding your note.json issue, if I understand correctly: you have a notebook containing a large number of paragraphs that have resulted in errors, all of these errors are present in the paragraph output, and you are unable to open the notebook; but when you remove the errors manually from the JSON file, it works. Is that a correct understanding of the issue you are seeing? If possible, could you attach your notebook JSON file with the errors here?