Support Questions


Query is not running on Hadoop: Spark error




While running Spark code, it throws the attached error; however, in YARN the job shows as succeeded and finished. When I checked the Spark log, it showed the error below.


YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed:


Please help me understand this.







Expert Contributor

Hello @yasmin


Thanks for posting your query.


From the attached error message, I see the NodeManager node is complaining that the user ID is not present on the node. Please check whether the username exists on the node (where the AM is running) or, if you are using AD/LDAP, make sure the node is able to resolve that username.
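A minimal sketch of that check, to be run on the node in question. The helper name `user_resolves` is hypothetical, and the script assumes the standard `id` utility, which consults whatever name services the node is configured with (local files, LDAP, and so on). With no argument it checks the current user; pass the job owner's username to check that account instead.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the OS on this node
# can resolve the given username, and fails otherwise.
user_resolves() {
    id "$1" >/dev/null 2>&1
}

# First argument is the username to check; defaults to the current user.
job_user="${1:-$(id -un)}"

if user_resolves "$job_user"; then
    echo "user '$job_user' resolves on $(uname -n)"
else
    echo "user '$job_user' does NOT resolve on $(uname -n)"
fi
```

If the user does not resolve on a node, that would be consistent with the NodeManager complaint above.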


You can also look at the container logs in more detail by running:


#yarn logs -applicationId <applicationID> -appOwner <username_who_triggered_job> 




Hey Satz,

I hope you are doing well. Thanks very much for your response.


As per your reply, the user ID of the user who triggered the job is not present on that particular node? Is my understanding correct?


Also, please help me understand this scenario:


I have a job that took much longer today on the new cluster than it had taken on the old one; in fact, it took longer than yesterday.


There are factors that can make it take longer, of course, as the input varies, but I also saw one stage take 7 hours when it normally takes 2 or 3 (in total).


I saw a lot of errors like:


ExecutorLostFailure (executor 1480 exited caused by one of the running tasks) Reason: Stale executor after cluster manager re-registered.


I had 438 failures out of 869 tasks, which is a huge rate; another part had 873 out of 1236.







Expert Contributor

Hello @yasmin,


Thanks for reaching out!


As per your reply, the user ID of the user who triggered the job is not present on that particular node? Is my understanding correct? ==> Yes, you're right!


As for your query regarding job slowness, we should consider the factors you mentioned along with the messages below as well:



ExecutorLostFailure (executor 1480 exited caused by one of the running tasks) Reason: Stale executor after cluster manager re-registered.


I had 438 failures out of 869 tasks, which is a huge rate; another part had 873 out of 1236.



Here it seems the executors are getting lost, and as a result the tasks are dying.
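To put the quoted task counts in perspective, the failure rates can be worked out directly. This is only arithmetic on the numbers reported above; the helper name `failure_rate` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints failed/total as a percentage with one decimal.
failure_rate() {
    awk -v failed="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * failed / total }'
}

failure_rate 438 869    # prints 50.4
failure_rate 873 1236   # prints 70.6
```

So roughly half of the tasks failed in one part and about 70% in the other.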


Could you please check the YARN logs for the application (using the #yarn logs command from my previous reply) and see whether there are any errors in the executor logs? This will help us see whether there is a specific reason for the executor failures.
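A rough sketch of how the aggregated logs could be filtered for likely culprits. The helper name `scan_executor_log` and the pattern list are assumptions chosen from the messages quoted in this thread, not an exhaustive set; `APP_ID` and `APP_OWNER` are placeholder environment variables you would set yourself. The `yarn logs` call only runs if the `yarn` CLI is on the PATH and `APP_ID` is set.

```shell
#!/bin/sh
# Hypothetical helper: reads log text on stdin and prints, with line numbers,
# the lines matching patterns that often accompany executor loss.
# The pattern list is an assumption and may need extending.
scan_executor_log() {
    grep -n -E 'ERROR|Exception|ExecutorLostFailure|Container marked as failed'
}

# APP_ID and APP_OWNER are placeholders supplied by the caller, e.g.
#   APP_ID=<applicationID> APP_OWNER=<username_who_triggered_job> sh scan.sh
if command -v yarn >/dev/null 2>&1 && [ -n "${APP_ID:-}" ]; then
    yarn logs -applicationId "$APP_ID" -appOwner "${APP_OWNER:-$(id -un)}" \
        | scan_executor_log
fi
```

The line numbers make it easier to jump to the surrounding context in the full log output.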


Are you running the job in Spark client mode or cluster mode?

