Created 10-30-2022 10:17 AM
Hello!
I'm new to Spark and to cloud data workflows in general, but I've run into a problem at my new job, where I need to work with PySpark and Hadoop.
In my Spark History Server, some applications have been listed as "incomplete" for a week now. I've tried killing them, stopping the SparkContext, and killing the main .py process, but nothing helped.
For example,
yarn application -status <id>
shows:
...
State: FINISHED
Final-State: SUCCEEDED
...
Log Aggregation Status: TIME_OUT
...
But in the Spark History Server I still see it in the incomplete section of my applications. If I open the application there, I can see 1 active job with 1 alive executor, but they have been doing nothing all week. This looks like a logging bug, but as far as I know I'm the only one affected; my coworkers don't have this problem.
This thread didn't help me, because I don't have access to start-history-server.sh.
I suppose this is because of
Log Aggregation Status: TIME_OUT
because my "completed" applications have
Log Aggregation Status: SUCCEEDED
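To compare these fields across many applications at once, the report can be parsed with a few lines of Python. This is only a sketch under assumptions: the `yarn` CLI is on the PATH, each report line has the form `Key : Value`, and `parse_status` and `fetch_status` are names made up for this example.

```python
import subprocess


def parse_status(report):
    """Turn the text printed by `yarn application -status` into a dict."""
    fields = {}
    for line in report.splitlines():
        # Split on the first colon only, so values such as URLs stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def fetch_status(app_id):
    """Run the yarn CLI (assumed to be on PATH) and parse its report."""
    result = subprocess.run(
        ["yarn", "application", "-status", app_id],
        capture_output=True, text=True, check=True,
    )
    return parse_status(result.stdout)


# Sample report text taken from the fields quoted in this post.
sample = """State: FINISHED
Final-State: SUCCEEDED
Log Aggregation Status: TIME_OUT"""
print(parse_status(sample)["Log Aggregation Status"])
```

Calling `fetch_status("<id>")` for each application id would then give one dict per application, which makes it easy to see whether every incomplete application really has `TIME_OUT`.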
What can I do to fix this? Right now I have 80+ incomplete applications in the Spark History Server...
Sorry for my bad English 😞
Created on 10-30-2022 03:29 PM - edited 10-30-2022 03:29 PM
UPD: I've found a clear description of my problem in the same setup (YARN, Spark, etc.), but there is no solution: https://stackoverflow.com/questions/52126052/what-is-active-jobs-in-spark-history-server-spark-ui-jo...
Created 10-30-2022 11:58 PM
Hello @r4ndompuff
Are you able to fetch logs for this application from the command line?
yarn logs -applicationId <app_id> -appOwner <user>
This issue can be expected when a huge number of applications is stored. In general, a large /tmp/logs (yarn.nodemanager.remote-app-log-dir) HDFS directory causes YARN log aggregation to time out.
As for killing the application, this looks like a code-level issue: check whether sc.stop() is called in the right place.
Thanks!
Created 10-31-2022 01:14 AM
Hello, @AsimShaikh!
Thank you very much for your answer!
No, this command doesn't work for me; I only get an error saying that my account doesn't have access to the server with the logs...
But I've found the root of my problem:
From Spark Monitoring and Instrumentation:
... 3. Applications which exited without registering themselves as completed will be listed as incomplete --even though they are no longer running. This can happen if an application crashes...
I really do restart the kernel in JH quite often, because our system is unstable right now (we are moving from one office to another).
Can I mark the incomplete applications as complete somehow, or do I need to write to somebody who has access to the Spark logs folder?
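For whoever does have access to that folder: Spark writes the event log of a running application under a name ending in `.inprogress` and drops the suffix when the context is stopped cleanly, so the leftover files are easy to pick out of a directory listing. The sketch below only filters such a listing; the function name `inprogress_logs`, the `/spark-history` path, and the application ids are made up for illustration, and what to do with the files is a decision for the administrator of that directory.

```python
def inprogress_logs(listing_lines):
    """Return the paths in an `hdfs dfs -ls` style listing that end in .inprogress."""
    paths = []
    for line in listing_lines:
        parts = line.split()
        # The path is the last whitespace-separated field of each entry.
        if parts and parts[-1].endswith(".inprogress"):
            paths.append(parts[-1])
    return paths


# Hypothetical listing; the directory and application ids are invented.
listing = [
    "Found 2 items",
    "-rwxrwx--- 3 user spark 1024 2022-10-30 10:17 /spark-history/application_1_0001",
    "-rwxrwx--- 3 user spark 2048 2022-10-30 10:20 /spark-history/application_1_0002.inprogress",
]
print(inprogress_logs(listing))
```

Sending the resulting list of paths along with the request makes it clear exactly which applications are stuck.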
Created 10-31-2022 03:20 AM
You may need to explicitly stop the SparkContext sc by calling sc.stop().
It's a good idea to call sc.stop(), which lets the Spark master know that your application has finished consuming resources. If you don't call sc.stop(), the event log information used by the history server will be incomplete, and your application will not show up among the completed applications in the history server's UI.
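One way to make sure stop() is always reached, even when the job raises an exception, is to wrap the session in a context manager. This is a minimal sketch, not a fix for applications that have already crashed: the helper names `stopping`, `build_session`, and `main`, and the app name `example-job`, are invented for this example, and it cannot help when the process is killed outright (for example by a kernel restart).

```python
from contextlib import contextmanager


@contextmanager
def stopping(build):
    """Yield whatever build() returns and always call its stop() on exit."""
    spark = build()
    try:
        yield spark
    finally:
        # Runs on normal exit and on exceptions, so the application is
        # registered as finished and its event log can be closed.
        spark.stop()


def build_session():
    # Imported lazily so this module loads even where pyspark is missing.
    from pyspark.sql import SparkSession
    return SparkSession.builder.appName("example-job").getOrCreate()


def main():
    with stopping(build_session) as spark:
        # Placeholder workload; replace with the real job.
        print(spark.range(10).count())
```

Calling `main()` from the script's entry point keeps the whole job inside the `with` block, so there is a single place where the session is created and stopped.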
Created 10-31-2022 05:03 AM
My PC in the office has been rebooted many times, so I no longer have an open session with the initial SparkContext.
I've tried creating a new one and calling sc.stop() on it, but that didn't help 😞
Created 11-01-2022 05:03 AM
Do you have sample code that you can share?