Support Questions

r4ndompuff · 10-30-2022

Hello!

I'm a newbie to Spark and the whole cloud-data workflow, but at my new job I need to work with PySpark and Hadoop, and I've run into a problem.

In my Spark History Server, some applications have been listed as "incomplete" for a week now. I've tried killing them, closing the SparkContext, and killing the main .py process, but nothing helped.

For example,

yarn application -status <id>

shows:

...
State: FINISHED
Final-State: SUCCEEDED
...
Log Aggregation Status: TIME_OUT
...

But the Spark History Server still lists it in the incomplete section of my applications. If I open the application there, I can see 1 active job with 1 alive executor, but they haven't done anything all week. This looks like a logging bug, but as far as I know only I have this problem; my coworkers don't.

This thread didn't help me, because I don't have access to start-history-server.sh.

I suspect this is caused by

Log Aggregation Status: TIME_OUT

because my "completed" applications have

Log Aggregation Status: SUCCEEDED
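With 80+ applications, the comparison above can be scripted rather than done by eye. This is a minimal sketch, assuming the report has been captured as text (for example, the output of `yarn application -status <id>`) and that each field is printed as `Key : Value` on its own line; `parse_app_report` and `finished_but_log_timed_out` are hypothetical helper names, not part of any YARN or Spark API. Whether TIME_OUT is really what keeps an application in the incomplete list is only the hypothesis stated above.

```python
def parse_app_report(report: str) -> dict:
    """Parse 'Key : Value' lines from a YARN application report into a dict.

    Lines without a colon (such as '...') are skipped. Only the first colon
    splits a line, so values that contain colons (URLs) are kept whole.
    """
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def finished_but_log_timed_out(fields: dict) -> bool:
    """True when YARN reports the app as FINISHED but log aggregation timed out."""
    return (
        fields.get("State") == "FINISHED"
        and fields.get("Log Aggregation Status") == "TIME_OUT"
    )


report = """
State: FINISHED
Final-State: SUCCEEDED
Log Aggregation Status: TIME_OUT
"""
fields = parse_app_report(report)
print(fields["Final-State"])               # SUCCEEDED
print(finished_but_log_timed_out(fields))  # True
```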

What can I do to fix this? Right now I have 80+ incomplete applications in the Spark History Server...

Sorry for my bad English 😞

r4ndompuff · 10-30-2022

UPD: I've found a clear description of the same problem in the same setup (YARN, Spark, etc.), but there is no solution: https://stackoverflow.com/questions/52126052/what-is-active-jobs-in-spark-history-server-spark-ui-jo...

AsimShaikh · 10-30-2022

Hello @r4ndompuff

Are you able to fetch logs for this application from command line?

yarn logs -applicationId <app_id> -appOwner <user>

This can happen when logs for a very large number of applications are stored. In general, a large /tmp/logs (yarn.nodemanager.remote-app-log-dir) HDFS directory can cause YARN log aggregation to time out.

Regarding killing the application, this is most likely a code-level issue: check whether sc.stop() is called at the correct place.

Thanks!

r4ndompuff · 10-31-2022

Hello, @AsimShaikh!

Thank you very much for your answer!
No, this command doesn't work for me; I only get an error saying that my account doesn't have access to the log server...

But I've found the root cause of my problem:

From Spark Monitoring and Instrumentation:

... 3. Applications which exited without registering themselves as completed will be listed as incomplete --even though they are no longer running. This can happen if an application crashes...

I do restart the kernel in JH quite often, because our system is unstable right now (we are moving from one office to another).
Can I somehow mark the incomplete applications as complete myself, or do I need to ask someone who has access to the Spark logs folder?
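The quoted documentation is easier to act on once you know how the History Server tells the two cases apart: with the default single-file event log layout, Spark writes each application's event log under `spark.eventLog.dir` with an `.inprogress` suffix and renames it when the application stops cleanly, so a process that dies first leaves the suffix behind. The sketch below classifies a plain list of file names that way. It is an illustration only: it assumes you already have a listing of the event log directory (which, per this thread, may require someone with access), it does not cover the rolling event log layout, and `split_event_logs` is a hypothetical helper name.

```python
# Suffix Spark keeps on a single-file event log until the application stops.
IN_PROGRESS_SUFFIX = ".inprogress"


def split_event_logs(file_names):
    """Split event log file names into (incomplete, complete) lists.

    Names ending in '.inprogress' are treated as incomplete and returned
    with that suffix removed; everything else is treated as complete.
    Assumes the single-file layout, not rolling event log directories.
    """
    incomplete, complete = [], []
    for name in file_names:
        if name.endswith(IN_PROGRESS_SUFFIX):
            incomplete.append(name[: -len(IN_PROGRESS_SUFFIX)])
        else:
            complete.append(name)
    return incomplete, complete


names = [
    "application_1_0001",
    "application_1_0002.inprogress",
    "application_1_0003.lz4.inprogress",
]
incomplete, complete = split_event_logs(names)
print(incomplete)  # ['application_1_0002', 'application_1_0003.lz4']
print(complete)    # ['application_1_0001']
```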

AsimShaikh · 10-31-2022

You may need to explicitly stop the SparkContext sc by calling sc.stop().

It's a good idea to call sc.stop(), which lets the Spark master know that your application has finished consuming resources. If you don't call sc.stop(), the event log that the History Server reads will be incomplete, and your application will not show up as completed in the History Server's UI.
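A common way to make sure stop() is called even when the job raises is a try/finally, wrapped here in a small context manager. This is a minimal sketch: `stopped_on_exit` and `FakeSession` are hypothetical names, and the fake object only stands in for a SparkSession or SparkContext so the example runs without a cluster. It cannot help when the Python process itself is killed (for example, a hard kernel restart), because `finally` never runs in that case, which is exactly the crash scenario from the Spark documentation quoted earlier.

```python
from contextlib import contextmanager


@contextmanager
def stopped_on_exit(session):
    """Yield `session` and call its stop() method when the block exits.

    Works with any object that has a stop() method, e.g. a PySpark
    SparkSession or SparkContext. stop() runs even if the block raises.
    """
    try:
        yield session
    finally:
        session.stop()


class FakeSession:
    """Stand-in for a SparkSession so the sketch runs without Spark."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


fake = FakeSession()
try:
    with stopped_on_exit(fake):
        raise RuntimeError("job failed")
except RuntimeError:
    pass
print(fake.stopped)  # True
```

In PySpark itself, `SparkSession` and `SparkContext` already implement the context-manager protocol and call `stop()` on exit, so `with SparkSession.builder.getOrCreate() as spark:` gives the same guarantee without a helper.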

r4ndompuff · 10-31-2022

My office PC has been rebooted many times, so I no longer have the session with the original SparkContext open.
I've tried creating a new one and calling sc.stop(), but it didn't help 😞

AsimShaikh · 11-01-2022

Do you have sample code you can share?

Cloudera Community


Spark application in incomplete section of spark-history even when completed.