Support Questions

Query is not running on Hadoop - Spark error

Explorer

Hi,

 

While running the Spark code, it throws the attached error; however, in YARN the job shows as succeeded and finished. When I checked the Spark log, the error below was thrown.

 

YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed:

 

Please clarify this for me.

 

Thanks

Yasmin

 


spark_error.png

3 REPLIES

Expert Contributor

Hello @yasmin

 

Thanks for posting your query.

 

From the attached error message, I see the NodeManager host is complaining that the user ID is not present on the node. Please check whether the username exists on the node where the AM is running, or, if you are using AD/LDAP, make sure the node is able to resolve that particular username.
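
In case it helps, one quick way to verify this from a shell on the NodeManager host (a sketch only; substitute the actual user who submitted the job):

~~~
# Both commands resolve the user through NSS, so they cover local accounts as well as AD/LDAP/SSSD
id <username_who_triggered_job>
getent passwd <username_who_triggered_job>
~~~

If neither command returns an entry, the node cannot resolve that user and the container launch will fail.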

 

You can also check the container logs in more detail by running:

 

#yarn logs -applicationId <applicationID> -appOwner <username_who_triggered_job> 
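
For example, once you have the application ID from the ResourceManager UI or `yarn application -list`, you can save the output and search it (a sketch only; substitute your own values):

~~~
# Save the aggregated container logs, then look for the failed container message
yarn logs -applicationId <applicationID> -appOwner <username_who_triggered_job> > app_logs.txt
grep -i "marked as failed" app_logs.txt
~~~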

 

Thanks,
Satz

Explorer

Hey Satz,

Hope you are doing well, and thanks very much for your response.

 

As per your reply, the user ID of the user who triggered the job is not present on that particular node? Is my understanding correct?

 

Also, please help me understand this scenario.

 

I have a job that today took much longer on the new cluster than it had taken on the old one; in fact, longer than it took yesterday.

 

There are factors that can make it take longer, of course, as the input varies, but I also saw one stage take 7 hours when it normally takes 2 or 3 (for the total).

 

I saw a lot of errors like:

 

ExecutorLostFailure (executor 1480 exited caused by one of the running tasks) Reason: Stale executor after cluster manager re-registered.

 

I had 438 failures out of 869 tasks, which is a huge rate; another part had 873 failures out of 1236 tasks.

 

Thanks

Yasmeen

Expert Contributor

Hello @yasmin,

 

Thanks for reaching out to us!

 

As per your reply, the user ID of the user who triggered the job is not present on that particular node? Is my understanding correct? ==> Yes, you're right!

 

Regarding your query about the job slowness, we should consider the factors you mentioned along with the messages below as well:

 

~~~

ExecutorLostFailure (executor 1480 exited caused by one of the running tasks) Reason: Stale executor after cluster manager re-registered.

 

I had 438 failures out of 869 tasks, which is a huge rate; another part had 873 failures out of 1236 tasks.

~~~

 

Here it seems the executors are getting lost and, as a result, tasks are dying.

 

Could you please check the YARN logs for the application (using the yarn logs command from my previous reply) and see if there are any errors in the executor logs? This will help us see whether there is a specific reason for the executor failures.
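
For instance, a rough way to scan the aggregated logs for common executor failure causes (a sketch only, using the same placeholders as before):

~~~
# Look for typical executor failure signatures such as lost executors or OOM kills
yarn logs -applicationId <applicationID> -appOwner <username_who_triggered_job> \
  | grep -iE "ExecutorLostFailure|OutOfMemoryError|killed|exit"
~~~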

 

Are you running the job in client mode or cluster mode?
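
For reference, the deploy mode is normally chosen on the spark-submit command line, so you can tell from how the job is launched (the jar name below is just a placeholder):

~~~
# Client mode: the driver runs on the host where spark-submit is executed
spark-submit --master yarn --deploy-mode client your_app.jar

# Cluster mode: the driver runs inside the YARN ApplicationMaster container
spark-submit --master yarn --deploy-mode cluster your_app.jar
~~~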

 

Thanks,
Satz