Created on 09-05-2018 10:13 PM - edited 08-18-2019 01:54 AM
An executor is a distributed agent that is responsible for executing tasks.
That part is clear.
But how can I tell whether there are any issues with the executors that run on a datanode machine?
I am asking because when I look on the datanode machine
I do not see any logs that represent the executors, and I do not understand how to trace problems with the executors.
The second important question:
Heartbeats are sent from the executor to the driver.
Which logs record these heartbeats?
How can I tell whether there is an issue with sending heartbeats?
Created 09-07-2018 12:49 PM
When Spark runs with YARN as the master, executors run inside YARN containers.
Spark launches an Application Master that is responsible for negotiating the containers with YARN. That said, only nodes running a NodeManager are eligible to run executors, so a datanode machine runs executors only if it also runs a NodeManager.
First question: The executor logs you are looking for are part of the YARN application logs, under the container that ran on the specific node. (yarn logs -applicationId <appId>)
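If you want to pull those logs from a script rather than by hand, here is a minimal sketch that wraps the command above. The function names build_yarn_logs_command and fetch_application_logs are hypothetical helpers made up for this example; the sketch assumes the yarn CLI is on the PATH of the machine where it runs and that the user is allowed to read the application's logs.

```python
import subprocess
from typing import List


def build_yarn_logs_command(app_id: str) -> List[str]:
    """Build the argument list for: yarn logs -applicationId <appId>."""
    if not app_id:
        raise ValueError("app_id must not be empty")
    return ["yarn", "logs", "-applicationId", app_id]


def fetch_application_logs(app_id: str) -> str:
    """Run the command and return everything it prints as one string.

    Assumption: the yarn CLI is installed and on PATH, and the caller
    has permission to read the logs of this application.
    """
    result = subprocess.run(
        build_yarn_logs_command(app_id),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Calling fetch_application_logs with your application ID returns the same text the command prints, which you can then save to a file or search for the container that ran on the node you care about.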
Second question: The executor logs a message when a heartbeat fails to reach the driver, for example because of a network problem or a timeout. So this should appear in the executor log, which is part of the same application logs.
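To narrow a large log dump down to heartbeat-related lines, something like the sketch below may help. It is only an illustration: find_heartbeat_lines is a hypothetical helper, the keyword "heartbeat" is simply a case-insensitive search term and not an exact Spark message, and the "Container: " header pattern is an assumption about how the aggregated output separates containers, so adjust it if your YARN version prints something different.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: each container's section in the aggregated log output starts
# with a line beginning "Container: <containerId> ...".
CONTAINER_HEADER = re.compile(r"^Container:\s*(\S+)")


def find_heartbeat_lines(
    lines: Iterable[str], keyword: str = "heartbeat"
) -> List[Tuple[str, str]]:
    """Return (container, line) pairs for lines that mention the keyword.

    The match is case-insensitive. Lines seen before any container header
    are reported under the container name "unknown".
    """
    current = "unknown"
    needle = keyword.lower()
    hits: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        header = CONTAINER_HEADER.match(line)
        if header:
            # Remember which container the following lines belong to.
            current = header.group(1)
            continue
        if needle in line.lower():
            hits.append((current, line))
    return hits
```

You could save the output of yarn logs -applicationId <appId> to a file, pass the open file object to find_heartbeat_lines, and then check which containers (and therefore which nodes) report heartbeat problems.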
HTH