Spark executor blocks on last task


Hello and good morning,

We have a problem with submitting Spark jobs.

The last two tasks are never processed and the job is blocked; the only thing that helps is killing the application.

Executor ID         | driver              | 1
Address             | 192.168.1.134:37341 | worker_12:43376
Status              | Active              | Active
RDD Blocks          | 2                   | 2
Storage Memory      | 48.2 KB / 4.4 GB    | 48.2 KB / 9 GB
Disk Used           | 0.0 B               | 0.0 B
Cores               | 0                   | 2
Active Tasks        | 0                   | 2
Failed Tasks        | 0                   | 0
Complete Tasks      | 0                   | 20
Total Tasks         | 0                   | 22
Task Time (GC Time) | 0 ms (0 ms)         | 3 s (58 ms)
Input               | 0.0 B               | 920 KB
Shuffle Read        | 0.0 B               | 0.0 B
Shuffle Write       | 0.0 B               | 152 B

In the thread dump I found the following inconsistency: the thread with ID 63 seems to be waiting for the one with ID 71. Can you see why that thread cannot finish its work?

Thread ID: 71
Thread Name: Executor task launch worker for task 502
Thread State: RUNNABLE
Thread Locks: Lock(java.util.concurrent.ThreadPoolExecutor$Worker@1010216063}), Monitor(java.lang.UNIXProcess$ProcessPipeInputStream@752994465}), Monitor(org.apache.spark.api.python.PythonWorkerFactory@143199750}), Monitor(org.apache.spark.SparkEnv@687340918})
Stack trace:
java.io.FileInputStream.readBytes(Native Method)
java.io.FileInputStream.read(FileInputStream.java:255)
java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
java.io.BufferedInputStream.read(BufferedInputStream.java:265) => holding Monitor(java.lang.UNIXProcess$ProcessPipeInputStream@752994465})
java.io.DataInputStream.readInt(DataInputStream.java:387)
org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:181) => holding Monitor(org.apache.spark.api.python.PythonWorkerFactory@143199750})
org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:103) => holding Monitor(org.apache.spark.api.python.PythonWorkerFactory@143199750})
org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:79)
org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:118) => holding Monitor(org.apache.spark.SparkEnv@687340918})
org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
org.apache.spark.scheduler.Task.run(Task.scala:108)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

Thread ID: 63
Thread Name: Executor task launch worker for task 524
Thread State: BLOCKED (Blocked by Thread Some(71), Lock(org.apache.spark.SparkEnv@687340918}))
Thread Locks: Lock(java.util.concurrent.ThreadPoolExecutor$Worker@2128788466})
Stack trace:
org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:116)
org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
org.apache.spark.scheduler.Task.run(Task.scala:108)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
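
My reading of the stack trace (an assumption on my part, not the actual Spark source): thread 71 holds the SparkEnv monitor inside PythonWorkerFactory.startDaemon and is blocked in readInt, apparently waiting for the pyspark.daemon child process to write its port back over stdout; thread 63 then waits for that same SparkEnv monitor. A simplified sketch of that pattern:

    // Simplified sketch of the pattern in the stack trace of thread 71
    // (illustrative only, not the real Spark code; command and names are assumptions).
    import java.io.DataInputStream
    import scala.collection.JavaConverters._

    object StartDaemonSketch {
      // Holds a monitor (like SparkEnv / PythonWorkerFactory in the dump) while it
      // launches the Python daemon and waits for the daemon to report its port.
      def startDaemon(): Int = this.synchronized {
        val daemon = new ProcessBuilder(Seq("python", "-m", "pyspark.daemon").asJava).start()
        val in = new DataInputStream(daemon.getInputStream)
        // readInt() blocks until the daemon writes 4 bytes; if the daemon never starts
        // or never writes, this call hangs while the monitor is held, and every other
        // task that needs a Python worker (thread 63 here) stays BLOCKED.
        in.readInt()
      }
    }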

The attached spark-003.txt contains the last ~200 lines of the job log.

If any further logs / dumps etc. are needed, I will try to provide and post them.

Many thanks in advance for the support.

Best regards,
René


Attachments: spark-001.png, spark-002.png

1 REPLY

Expert Contributor

This could be a data skew issue. Check whether any partition holds a huge chunk of the data compared to the rest.

https://github.com/adnanalvee/spark-assist/blob/master/spark-assist.scala

From the link above, copy the function "partitionStats" and pass in your data as a DataFrame.

It will show the maximum, minimum, and average amount of data across your partitions, like below:

    +------+-----+------------------+
    |MAX   |MIN  |AVERAGE           |
    +------+-----+------------------+
    |135695|87694|100338.61149653122|
    +------+-----+------------------+
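
For reference, a rough approximation of what such a helper can look like (my own sketch, not the exact code from the linked repo; myDf below is just a placeholder for your DataFrame):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{avg, max, min}

    // Rough sketch: count the rows in every partition, then aggregate to MAX / MIN / AVERAGE.
    def partitionStats(df: DataFrame): DataFrame = {
      val spark = df.sparkSession
      import spark.implicits._
      df.rdd
        .mapPartitions(rows => Iterator(rows.size.toLong)) // rows per partition
        .toDF("count")
        .agg(max($"count").as("MAX"), min($"count").as("MIN"), avg($"count").as("AVERAGE"))
    }

    // Usage: partitionStats(myDf).show(false)

If one partition is far above the average, that long-running straggler would explain why the last tasks appear to hang.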