Member since: 09-29-2015
Posts: 122
Kudos Received: 159
Solutions: 26
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3696 | 11-12-2016 12:32 AM
 | 906 | 10-05-2016 08:08 PM
 | 1176 | 08-02-2016 11:29 PM
 | 17727 | 06-24-2016 11:46 PM
 | 1014 | 05-25-2016 11:12 PM
11-22-2017
06:46 PM
Can you turn up the Livy log level and post the log from the Livy server?
11-17-2017
06:16 PM
Can you share the details of how the Livy job is launched? This exception appears to come from the Livy side. Did the Spark job actually get launched? What are the errors on the Spark executor side?
10-13-2017
07:14 PM
7 Kudos
Zeppelin Best Practices

Install & Versions
Leverage Ambari to install Zeppelin, and always use the latest version of Zeppelin. With HDP 2.6.2, Zeppelin 0.7.2 is available, and it contains many useful stability and security fixes that will improve your experience. Zeppelin in HDP 2.5.x has many known issues that were resolved in 2.6.2.

Deployment Choices
While you can select any node type to install Zeppelin on, the best place is a gateway node. A gateway node makes the most sense because, when the cluster is firewalled off and protected from the outside, users can still reach the gateway node.

Hardware Requirements
More memory and more cores are better.
- Memory: minimum of 64 GB per node.
- Cores: minimum of 8 cores.
- Number of users: a given Zeppelin node can support 8-10 users. If you want more users, you can set up multiple Zeppelin instances; more details in the MT section.

Security
Like any software, security depends on the threat matrix and deployment choices. This section assumes a multi-tenant (MT) Zeppelin deployment.
Authentication
- Kerberize the HDP cluster using Ambari.
- Configure Zeppelin to leverage corporate LDAP for authentication.
- Don’t use Zeppelin’s local user-based authentication, except for a demo setup.

Authorization
- Limit end-user access to interpreter configuration. Interpreter configuration is shared, and only admins should have access to configure interpreters. Leverage Zeppelin’s shiro configuration to achieve this.
- With the Livy interpreter, Spark jobs are sent to the HDP cluster under the end user's identity; all Ranger-based policy controls apply.
- With the JDBC interpreter, Hive and Spark access is done under the end user's identity; all Ranger-based policy controls apply.

Passwords
- Leverage Zeppelin’s support for hiding the LDAP and JDBC passwords in a Hadoop credential store.
- Don’t put passwords in the clear in shiro.ini.

Multi-Tenancy & HA
- In an MT environment, only allow the admin role access to interpreter configuration.
- A given Zeppelin instance should support fewer than 10 users. To support more users, set up multiple Zeppelin instances and put an HTTP proxy such as NGinx with sticky sessions in front, to route the same user to the same Zeppelin instance. Sticky sessions are needed because Zeppelin stores notebooks under a given instance's directory. If you use a network storage system, the Zeppelin notebook directory can be stored on the network storage, and in that case sticky sessions are not needed. With the upcoming HDP 2.6.3, Zeppelin will store notebooks in HDFS and this requirement will no longer be necessary.

Interpreters
- Leverage the Livy interpreter for Spark jobs against the HDP cluster. Don’t use the Spark interpreter, since it does not provide ideal identity propagation.
- Avoid the Shell interpreter, since its security isolation isn’t ideal.
- Don’t use the interpreter UI for impersonation. It works for Livy and JDBC (Hive); for all others it is not officially supported.
- Users should restart their own interpreter session from the button on the notebook page, instead of from the interpreter page, which would restart sessions for all users.

Also see Jianfeng Zhang's Zeppelin Best Practices notebook.
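As a sketch of the shiro-based lockdown of interpreter configuration mentioned above, the [urls] section of Zeppelin's conf/shiro.ini can restrict the interpreter, configuration, and credential APIs to an admin role (the role name and exact URL list here are illustrative; check the shiro.ini shipped with your Zeppelin version):

```ini
[urls]
# Anonymous version check is harmless
/api/version = anon
# Only users in the admin role may touch interpreter settings,
# configurations, and stored credentials
/api/interpreter/** = authc, roles[admin]
/api/configurations/** = authc, roles[admin]
/api/credential/** = authc, roles[admin]
# Everything else requires a login
/** = authc
```

After editing shiro.ini, restart Zeppelin for the change to take effect.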
Tags: best-practices, Data Science & Advanced Analytics, FAQ, zeppelin, zeppelin-multi-user, zeppelin-notebook
09-15-2017
04:18 AM
To reduce the complexity, would it be possible to run the code in spark-shell with the same executor memory configuration that is allocated via Zeppelin? That way we can rule out any Zeppelin issue and better isolate where the problem lies.
09-14-2017
03:53 PM
1 Kudo
GraphX is not ready for prime time; it is in technical preview (please see table 1.1 at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_spark-component-guide/content/ch_introduction-spark.html) since it is in an alpha state in the community. We should start by looking at how much data is being used to compute the graph. If the graph is bigger than what is allocated to the executors, an OOM is expected. If you can find out how many vertices there are and what kind of graph computation is being done, we can try to dig deeper.
09-12-2017
10:00 PM
This is coming very late, but it appears you are launching a Spark Standalone cluster and not actually submitting Spark on YARN. For Spark on YARN, as per http://spark.apache.org/docs/latest/running-on-yarn.html, spark-submit needs to say --master yarn; the actual YARN cluster details are picked up from the YARN conf dir.

./bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]
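As a concrete sketch (the config path, jar location, version, and resource numbers below are illustrative and vary by install), submitting the bundled SparkPi example in YARN cluster mode looks like:

```
# YARN and HDFS details are read from the Hadoop config dir,
# not passed as spark-submit flags
export HADOOP_CONF_DIR=/etc/hadoop/conf

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 2 \
  --executor-memory 2g \
  examples/jars/spark-examples_2.11-2.1.1.jar 100
```

In cluster mode the driver runs inside YARN, so the job's output appears in the YARN application logs rather than on the submitting terminal.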
08-30-2017
08:14 PM
Thanks @Sebastian Torres Brown. What will the interaction flow be? Will it be routed through Spark Thrift Server?
08-30-2017
07:49 PM
We are looking at this. Can you please detail your use case? Is it ETL or BI?
06-29-2017
06:24 PM
You may want to look at the Spark HBase connector, which is based on DataFrames and should give better performance: https://github.com/hortonworks-spark/shc
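As a minimal sketch of how the shc connector is used (the table name, column family, and schema below are made up, and `df`/`spark` stand in for an existing DataFrame and SparkSession; see the shc README for the exact API in your version), you define a JSON catalog mapping DataFrame columns to HBase columns and go through the connector's data source:

```scala
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Hypothetical catalog: a string row key plus one column in family "cf1"
val catalog = s"""{
  "table":{"namespace":"default", "name":"demo_table"},
  "rowkey":"key",
  "columns":{
    "id":{"cf":"rowkey", "col":"key", "type":"string"},
    "value":{"cf":"cf1", "col":"value", "type":"string"}
  }
}"""

// Write a DataFrame to HBase (creating the table with 5 regions)
df.write
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog,
               HBaseTableCatalog.newTable -> "5"))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()

// Read it back as a DataFrame; filters on mapped columns
// can be pushed down to HBase by the connector
val readDF = spark.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()
```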
06-29-2017
06:19 PM
What is the use case for having a user-specific YARN queue for STS jobs? One workaround is to run multiple STS instances, each with a different queue. How many users are you thinking about?
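As a sketch of the multiple-STS workaround (the queue names and ports below are examples), each Spark Thrift Server instance can be started against its own YARN queue on its own port:

```
# STS for the "bi" queue, listening on port 10015
./sbin/start-thriftserver.sh \
  --master yarn \
  --conf spark.yarn.queue=bi \
  --hiveconf hive.server2.thrift.port=10015

# A second STS for the "etl" queue on a different port
./sbin/start-thriftserver.sh \
  --master yarn \
  --conf spark.yarn.queue=etl \
  --hiveconf hive.server2.thrift.port=10016
```

Clients then pick their queue implicitly by choosing which port they connect to.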
05-25-2017
09:46 PM
This is applicable to a Kerberos-enabled HDP 2.5.x cluster with Zeppelin, Livy & Spark. After a successful Kerberos setup, logging in to Zeppelin and running a Spark note works fine, but running a simple sc.version from the Livy interpreter gives "Cannot start spark" in the Zeppelin UI. In the Livy log at /var/log/livy/livy-livy-server.out you may find a message similar to the following:

INFO: 17/05/25 21:24:12 INFO metastore: Trying to connect to metastore with URI thrift://vinay-hdp25-2.field.hortonworks.com:9083
May 25, 2017 9:24:12 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: 17/05/25 21:24:12 ERROR TSaslTransport: SASL negotiation failure
May 25, 2017 9:24:12 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

This happens when Livy tries to connect to the Hive Metastore and fails with the above message. The fix is to configure Zeppelin's Livy interpreter to run in yarn-cluster mode instead of the default yarn-client mode. After you change any interpreter configuration, you will need to restart the interpreter. The following setting works:

livy.spark.master yarn-cluster

Starting with HDP 2.6.x, this configuration is changed out of the box to yarn-cluster.
Tags: Data Science & Advanced Analytics, FAQ, livy-kerberos, livy-spark, zeppelin-kerberos
04-24-2017
08:37 PM
Livy is not supported in HDP 2.5; it is supported in HDP 2.6. Can you please retry your scenario with the 2.6 Sandbox? https://hortonworks.com/downloads/#sandbox
03-14-2017
06:53 PM
It is likely due to insufficient memory. You can try bumping up the memory allocated to the Sandbox, and also shutting down unneeded services in the Sandbox. Another option is to try out Spark 2.1 in HDC: https://hortonworks.com/blog/try-apache-spark-2-1-zeppelin-hortonworks-data-cloud/
03-08-2017
07:00 PM
1 Kudo
SparkSQL does provide a JDBC/ODBC interface via the Spark Thrift Server (STS), which is part of HDP. You can connect to STS from any BI client and issue SQL queries.
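For example (the host, port, and user below are illustrative; the STS port is whatever hive.server2.thrift.port is set to in your deployment), a connection from beeline looks like:

```
beeline -u "jdbc:hive2://sts-host:10015/default" -n myuser
```

Once connected, ordinary SQL (SHOW TABLES, SELECT ...) executes through SparkSQL; BI tools use the same JDBC URL through the Hive JDBC driver, or the Hive ODBC driver for ODBC clients.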
03-08-2017
06:53 PM
1 Kudo
@Apoorv Pathak Right now Zeppelin does not have functionality to limit creation of new pages. How important is this for you?
03-08-2017
06:44 PM
This could be due to lack of sufficient memory. How did you launch spark-shell: in YARN mode or standalone? Also, how is Zeppelin's Spark interpreter configured: YARN or standalone?
11-12-2016
12:32 AM
2 Kudos
This is a known issue. The Livy session times out by default after 60 minutes. You can change the timeout with the property livy.server.session.timeout. Upon Livy session timeout, Zeppelin's Livy interpreter needs to be restarted. This will be fixed in HDP 2.6.
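As a sketch, the property takes a duration value in livy.conf (the 2h below is just an example; the default is 1h):

```
# livy.conf: recycle idle sessions after 2 hours instead of the default 1 hour
livy.server.session.timeout = 2h
```

Restart the Livy server after changing this setting.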
11-04-2016
06:41 PM
This needs two steps:
1. In the Zeppelin shell interpreter config, enable impersonation.
2. Ensure the end user has an OS account on the node where Zeppelin is running.
Thanks
10-05-2016
08:08 PM
1 Kudo
Zeppelin in HDP 2.5 does not support Spark 2. A technical preview of this will land in Hortonworks Data Cloud in the next month or so.
08-17-2016
04:31 PM
@subhash parise This question seems to be about Hive, but Spark is also tagged on it. Can you please explain the scenario you are trying to enable? Thanks.
08-10-2016
03:53 PM
1 Kudo
@Charles Chen Please see this blog on how to use HBase with Spark on HDP 2.4.x+.
08-10-2016
03:44 PM
@Randy Gelhausen You can switch your notebook to the report view; please see the dropdown at the top right. Screenshot attached. Is this what you are looking for? screen-shot-2016-08-10-at-114601-am.png
08-10-2016
03:39 PM
Spark 2.0 is coming as a technical preview in HDP 2.5. You can also try it now with Hortonworks cloud; see this blog.
08-09-2016
06:55 PM
@Yaron Idan You can also add a Spark client to a slave node and then add Zeppelin to that node.
08-03-2016
04:31 PM
Spark 2.0 is a technical preview in HDP 2.5. You can try out 2.0 in Hortonworks cloud today, or look for the HDP 2.5 announcement to try it in your own environment.
08-03-2016
04:29 PM
Try enabling Spark debug logging and, if possible, get the complete stack trace. Does the error only happen on Windows?
08-02-2016
11:29 PM
1 Kudo
Livy sessions are recycled after an hour of inactivity. This timeout is configured with livy.server.session.timeout.
08-02-2016
11:11 PM
Are you using Zeppelin directly with the Spark interpreter, or with the Livy interpreter? In the Zeppelin > Livy > Spark scenario, the Livy session is recycled after some time of inactivity. I will report back with the default timeout and how to configure it. To recover from this behavior, you will need to restart the Livy interpreter.
07-29-2016
12:44 AM
Spark Streaming with Dynamic Resource Allocation is a new feature in Spark 2.0 (https://issues.apache.org/jira/browse/SPARK-12133), so it is not yet available in either HDP or CDH.