Created 05-30-2017 05:36 PM
On HDP 2.6, when trying to run the following paragraph as user2/user2 from a Zeppelin notebook (this is running in yarn-cluster mode):
%livy2.spark sc.version
It hangs for a while, times out, and gives me the following Java stack trace:
org.apache.zeppelin.livy.LivyException: Session 60 is finished, appId: null, log: [java.lang.Exception: No YARN application is found with tag livy-session-60-zahglq2y in 60 seconds. Please check your cluster status, it is may be very busy.,
  com.cloudera.livy.utils.SparkYarnApp.com$cloudera$livy$utils$SparkYarnApp$$getAppIdFromTag(SparkYarnApp.scala:182)
  com.cloudera.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:248)
  com.cloudera.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:245)
  scala.Option.getOrElse(Option.scala:120)
  com.cloudera.livy.utils.SparkYarnApp$$anonfun$1.apply$mcV$sp(SparkYarnApp.scala:245)
  com.cloudera.livy.Utils$$anon$1.run(Utils.scala:95)]
    at org.apache.zeppelin.livy.BaseLivyInterprereter.createSession(BaseLivyInterprereter.java:209)
    at org.apache.zeppelin.livy.BaseLivyInterprereter.initLivySession(BaseLivyInterprereter.java:98)
    at org.apache.zeppelin.livy.BaseLivyInterprereter.open(BaseLivyInterprereter.java:80)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:482)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
From the YARN logs, the only thing I see logged for these unsuccessful attempts is:
2017-05-30 17:22:44,115 INFO resourcemanager.ClientRMService (ClientRMService.java:getNewApplicationId(291)) - Allocated new applicationId: 32
2017-05-30 17:28:55,804 INFO resourcemanager.ClientRMService (ClientRMService.java:getNewApplicationId(291)) - Allocated new applicationId: 33
The same notebook works perfectly fine as user 'admin'; the issue occurs only when switching users. Any suggestions on what is wrong? And there are plenty of resources available on YARN.
Created 05-30-2017 06:29 PM
@zhoussen, as per the Livy logs, the Spark application did not start correctly. To find the root cause, please check the Spark application logs.
Steps to follow:
1) Check the status of the YARN cluster (list running applications).
2) Run the Livy paragraph as user2.
3) Check whether a new application is launched in YARN. If one is, check its status and application log for further debugging (example commands below).
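For example, with the standard YARN CLI (the application ID below is a placeholder):

# List applications YARN currently knows about, including finished/failed ones
yarn application -list -appStates ALL

# Pull the aggregated logs for a specific application once you have its ID
yarn logs -applicationId <application_id>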
Created 05-30-2017 06:52 PM
That doesn't help. The YARN cluster is healthy and doesn't even show this application in any failed state. The application log doesn't contain any more helpful messages.
Created 05-30-2017 06:59 PM
@zhoussen, if the application with tag "livy-session-60-zahglq2y" is alive and running fine, you need to increase the Livy app lookup timeout beyond 60 seconds. It seems Livy believes the YARN application was not started within 60 seconds.
Set livy.server.yarn.app-lookup-timeout to, say, 300 seconds.
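For example, in the Livy server configuration (livy.conf, or the equivalent livy2-conf section in Ambari on HDP; the exact duration format may vary by Livy version, and the 300-second value is just the suggestion above):

# How long Livy waits to find the YARN application carrying its session tag
livy.server.yarn.app-lookup-timeout = 300s

The Livy server typically needs a restart for the change to take effect.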
Created 05-30-2017 07:17 PM
I found the answer in the actual Livy server log itself (not the Zeppelin Livy interpreter log I had been looking at all this time):
17/05/30 18:53:34 INFO InteractiveSessionManager: Registering new session 67
17/05/30 18:53:35 INFO ContextLauncher: Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
17/05/30 18:53:36 INFO ContextLauncher: 17/05/30 18:53:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/30 18:53:37 INFO ContextLauncher: 17/05/30 18:53:37 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
17/05/30 18:53:37 INFO ContextLauncher: 17/05/30 18:53:37 INFO RMProxy: Connecting to ResourceManager at zhoussen-edw1.field.hortonworks.com/172.26.255.217:8050
17/05/30 18:53:37 INFO ContextLauncher: 17/05/30 18:53:37 INFO Client: Requesting a new application from cluster with 4 NodeManagers
17/05/30 18:53:37 INFO ContextLauncher: 17/05/30 18:53:37 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
17/05/30 18:53:37 INFO ContextLauncher: 17/05/30 18:53:37 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
17/05/30 18:53:37 INFO ContextLauncher: 17/05/30 18:53:37 INFO Client: Setting up container launch context for our AM
17/05/30 18:53:37 INFO ContextLauncher: 17/05/30 18:53:37 INFO Client: Setting up the launch environment for our AM container
17/05/30 18:53:37 INFO ContextLauncher: 17/05/30 18:53:37 INFO Client: Preparing resources for our AM container
17/05/30 18:53:39 INFO ContextLauncher: Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=user1, access=WRITE, inode="/user/user1/.sparkStaging/application_1496151555596_0039":hdfs:hdfs:drwxr-xr-x
So it appears Livy was indeed able to connect to the ResourceManager and obtain an application ID (which correlates with the earlier entries in the YARN log), but it could not proceed to allocate an AM because it cannot write to /user/user1 on HDFS due to a permission problem. After creating the /user/user1 directory, it works fine.
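For anyone hitting the same thing, a minimal sketch of the fix (run as the HDFS superuser; user1 and the hdfs group are just the names from this thread, adjust to your environment):

# Create the user's HDFS home directory and hand ownership to that user
hdfs dfs -mkdir -p /user/user1
hdfs dfs -chown user1:hdfs /user/user1

After that, the Spark client should be able to create /user/user1/.sparkStaging when Livy launches the session.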
Created 05-30-2017 07:23 PM
Thanks for your answer. Made me look back at the entire flow.
Created 06-02-2017 11:52 PM
It looks like the owner of /user/user1 is hdfs, but it should be user1. I'm not sure how you created the /user/user1 folder; if you are an admin, please change the owner, or ask your admin to do that.
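If the directory already exists but is owned by hdfs, something along these lines (again run as the HDFS superuser) should hand it back to user1; the group shown is an assumption:

hdfs dfs -chown -R user1:hdfs /user/user1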
Created 01-08-2019 09:00 PM
@bkv
Check the YARN logs. It could be starving for YARN containers, in which case you may need to adjust some YARN container settings. Also, please post yours as a separate new question rather than as a reply to this one.
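If container starvation does turn out to be the problem, these are the usual yarn-site.xml knobs to review; the values are purely illustrative and need to be sized to your nodes:

yarn.nodemanager.resource.memory-mb     (memory each NodeManager offers to containers, e.g. 8192)
yarn.scheduler.maximum-allocation-mb    (largest single container the scheduler will grant, e.g. 4096)
yarn.nodemanager.resource.cpu-vcores    (vcores each NodeManager offers, e.g. 8)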