
Spark SQL job stuck indefinitely at last task of a stage -- shows "INFO BlockManagerInfo: Removed broadcast in memory"


Hi,

I am working on HDP 2.4.2 (Hadoop 2.7, Hive 1.2.1, JDK 1.8, Scala 2.10.5). My Spark/Scala job reads Hive tables (using Spark SQL) into DataFrames, performs a few left joins, and inserts the final result into a partitioned Hive table. The source tables have approximately 50 million records. Spark creates 74 stages for this job. It executes 72 stages successfully but hangs at the 499th task of the 73rd stage and never reaches the final stage, 74. I can see many messages on the console like "INFO BlockManagerInfo: Removed broadcast in memory".
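For context, the job is shaped roughly like the sketch below. This is a minimal sketch only; the database, table, and column names are made up, not my real ones:

// Rough shape of the job (Spark 1.6 / HiveContext on HDP 2.4.2);
// all table and column names here are placeholders.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object JoinAndLoad {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JoinAndLoad"))
    val hc = new HiveContext(sc)
    // Needed so the insert below can write partitions dynamically
    hc.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

    val fact = hc.table("src_db.fact_table")   // ~50M rows
    val dim1 = hc.table("src_db.dim_one")
    val dim2 = hc.table("src_db.dim_two")

    // A few left joins, then keep only the columns we need;
    // the partition column must come last for the dynamic-partition insert.
    val joined = fact
      .join(dim1, fact("key1") === dim1("key1"), "left_outer")
      .join(dim2, fact("key2") === dim2("key2"), "left_outer")
      .select(fact("key1"), dim1("attr_a"), dim2("attr_b"), fact("part_col"))

    joined.write.mode("append").insertInto("tgt_db.result_table")
  }
}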

It doesn't show any error or exception, and even after an hour it doesn't come out; the only option is to kill the job.

I have 15 nodes in total, each with 40 GB RAM and 6 cores. I am using spark-submit in YARN client mode. Scheduling is configured as FIFO, and my job is consuming 79% of cluster resources.
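The submit command looks roughly like this; the executor count and memory figures here are illustrative, not my exact values:

# Sketch of the submit command (class name, jar, and sizing are placeholders)
spark-submit \
  --master yarn-client \
  --class com.example.JoinAndLoad \
  --num-executors 30 \
  --executor-cores 2 \
  --executor-memory 15g \
  --driver-memory 4g \
  join-and-load.jar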

Can anybody advise on this? What could be the issue?

Regards

Praveen Khare

10 REPLIES


Have you been able to solve this issue? I am getting the exact same issue and wanted to know how you resolved it. Thanks.