Member since: 10-09-2015
Posts: 76
Kudos Received: 33
Solutions: 11

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5003 | 03-09-2017 09:08 PM |
| | 5344 | 02-23-2017 08:01 AM |
| | 1735 | 02-21-2017 03:04 AM |
| | 2113 | 02-16-2017 08:00 AM |
| | 1113 | 01-26-2017 06:32 PM |
01-17-2017
08:23 PM
If this does not work for you, please open a feature request by creating an issue on the GitHub project for SHC. /cc @wyang
01-17-2017
08:20 PM
Ideally, just before that OWN failure log, there should be an exception or error message about some task for the vertex with id 1484566407737_0004_1. That could give more info. Even if more info is not there, you will be able to find the task attempt that actually failed. That task attempt can show you which machine and YARN container it ran on. Sometimes the logs don't have the error because it was logged to stderr. In that case, the stderr in the container's YARN logs may show the error.
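As a rough sketch of pulling those logs with the YARN CLI (assuming YARN log aggregation is enabled; the application id below is a placeholder derived from the vertex id above, substitute your own):

```
# Fetch the aggregated YARN logs for the finished application
yarn logs -applicationId application_1484566407737_0004 > app_logs.txt

# Look for the failed task attempt and the stderr sections around it
grep -n -i -A 20 "exception\|error" app_logs.txt | less
```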
01-11-2017
09:09 PM
+1. That's what I mentioned in my last comment below. Copying it here so everyone gets the context quickly. Ranger KMS could be the issue, because it causes problems when getting the HDFS delegation token. If the Zeppelin or Livy user needs to get an HDFS delegation token, then they also need to be super users for Ranger. You are better off trying with a non-Ranger cluster, or adding them to the Ranger super users, which is different from the core-site super users.
01-11-2017
08:29 PM
The AM percent property in YARN is relevant when the cluster has idle resources but an AM is still not being started for the application. On the YARN UI you will see available capacity, but the AM is not being started. E.g. the cluster has 100GB capacity and is using only 50GB. If you want to run X apps concurrently and each AM needs M GB of resources (per config), then you need X*M GB of capacity for the AMs, and this can be used to determine the AM percent as a fraction of the total cluster capacity. On the other hand, if the cluster does not have any spare capacity at that time (as seen in the YARN UI), then changing the AM percent may not help: the cluster has no capacity left to obtain a container slot for the AM. E.g. the cluster has 100GB capacity and is already using 100GB. In this case you will have to wait for capacity to free up.
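As a made-up worked example of that calculation (the numbers are illustrative; the property shown is the Capacity Scheduler's cluster-wide AM percent setting in capacity-scheduler.xml):

```xml
<!-- 100 GB total capacity, target 10 concurrent apps, each AM asks for 2 GB:
     10 * 2 = 20 GB for AMs  =>  AM percent = 20 / 100 = 0.2 -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.2</value>
</property>
```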
01-04-2017
03:01 AM
3 Kudos
Ranger KMS could be the issue, because it causes problems when getting the HDFS delegation token. If the Zeppelin or Livy user needs to get an HDFS delegation token, then they also need to be super users for Ranger. You are better off trying with a non-Ranger cluster, or adding them to the Ranger super users, which is different from the core-site super users.
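If by super users this means whitelisting the service user on the KMS side, one hedged illustration is the KMS proxy-user settings (property names follow stock Hadoop KMS, which Ranger KMS builds on; the livy user and the wildcard values are examples only, tighten them for real deployments):

```xml
<!-- kms-site.xml: allow the livy service user to request delegation tokens
     on behalf of end users (example values) -->
<property>
  <name>hadoop.kms.proxyuser.livy.users</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.kms.proxyuser.livy.hosts</name>
  <value>*</value>
</property>
```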
01-03-2017
10:01 PM
Why are we changing it in the Zeppelin env? Can this be changed in the Spark interpreter configs? /cc @prabhjyot singh
01-03-2017
09:56 PM
3 Kudos
If you are trying to authenticate user FOO via LDAP on Zeppelin, and then use Zeppelin to launch a %livy.spark notebook as user FOO, then you are using Livy impersonation (this is different from Zeppelin's own impersonation, which is only recommended for the shell interpreter, not the Livy interpreter). User FOO should also exist in the Hadoop cluster, because the jobs will eventually run as that user. HDP 2.5.3 should already have all the configs set up for you. It's a bug that livy.spark.master in Zeppelin is not yarn-cluster.

Next, Livy should be using the Livy keytab and Zeppelin should be using the Zeppelin keytab. The Zeppelin user needs to be configured as a livy.superuser in the Livy config. The Livy user should be configured as a proxy user in core-site.xml so that YARN/HDFS allow it to impersonate other users (in this case hadoopadmin) when submitting Spark jobs.

If the Zeppelin->Livy connection fails, you will see an exception in Zeppelin and logs in Livy. If that succeeds, Livy will try to submit the job; if that fails, you will see the exception in the Livy logs. From the exception in your last comment, it appears that the Livy user is not configured properly as a proxy user in core-site.xml. You can check that in the Hadoop configs, and you may have to restart affected services if you change it. In HDP 2.5.3 this should already be done for you during Livy installation via Ambari.
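For reference, a sketch of the proxy-user entries for the Livy user in core-site.xml (the wildcard values are just examples; restrict hosts and groups appropriately):

```xml
<!-- core-site.xml: let the livy service user impersonate the end user
     (e.g. hadoopadmin) when submitting jobs to YARN/HDFS -->
<property>
  <name>hadoop.proxyuser.livy.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.livy.groups</name>
  <value>*</value>
</property>
```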
01-03-2017
09:41 PM
Alicia, please see my answer above on Oct 24. If you are running Spark on YARN, you will have to go through the YARN RM UI to get to the Spark UI for a running job. The link for the YARN UI is available from the Ambari YARN service. For a completed job, you will need to go through the Spark History Server. The link for the Spark History Server is available from the Ambari Spark service.
12-28-2016
11:52 PM
In general, Zeppelin is running on the Zeppelin server machine in the cluster, so it cannot access local files on the user's host machine. The typical thing to do is to upload the file into HDFS and use the HDFS path in the %spark notebook code to read the file using Spark.
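A minimal sketch of that workflow, with placeholder paths: first copy the file into HDFS from a cluster node, e.g. hdfs dfs -put /tmp/mydata.txt /user/myuser/data/, then read it from the HDFS path in a %spark paragraph:

```scala
// %spark paragraph: read the file from HDFS rather than a local filesystem path
val lines = sc.textFile("hdfs:///user/myuser/data/mydata.txt")
lines.take(5).foreach(println)
```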