Spark shuffle is failing with connection exception when dynamic allocation is enabled

Hi,

When dynamic allocation is enabled, we frequently hit errors while executors fetch shuffle blocks:

RetryingBlockFetcher: Retrying fetch (1/3) for 1 outstanding blocks after 5000ms

ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks (after 1 retries)

java.io.IOException: Failed to connect to <host>:<some port>

Caused by: java.net.ConnectException: Connection refused: <host>:<some port>
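For reference, the "(1/3) ... after 5000ms" in the log matches Spark's default shuffle fetch retry settings, spark.shuffle.io.maxRetries = 3 and spark.shuffle.io.retryWait = 5s. Below is a minimal sketch (a hypothetical helper, not part of Spark) of the minimum time one fetch spends sleeping between retries before it is reported as failed, assuming every attempt is refused immediately as in the "Connection refused" error above and ignoring connection setup time.

```python
def fetch_retry_budget(max_retries: int = 3, retry_wait_s: float = 5.0) -> float:
    """Hypothetical helper: seconds slept between retries for one block fetch.

    Assumes each attempt fails immediately and that the fetcher sleeps
    retry_wait_s before each of its max_retries retries. Defaults mirror
    spark.shuffle.io.maxRetries (3) and spark.shuffle.io.retryWait (5s).
    """
    if max_retries < 0 or retry_wait_s < 0:
        raise ValueError("max_retries and retry_wait_s must be non-negative")
    return max_retries * retry_wait_s


if __name__ == "__main__":
    # With the defaults seen in the log: 3 retries x 5 s each.
    print(fetch_retry_budget())
    # With hypothetical raised values: 10 retries x 10 s each.
    print(fetch_retry_budget(max_retries=10, retry_wait_s=10.0))
```

This only bounds the sleep time for a single fetch; it does not model what happens after the retries are exhausted.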

We see these errors continuously in the executors when we run large Spark jobs. While they occur, nothing is processed; after some time the errors stop and processing resumes. This is affecting our job SLAs. Can anyone help with this?