Created 03-27-2018 10:30 AM
Hi,
When dynamic allocation is enabled, we frequently run into issues while fetching shuffle blocks:
RetryingBlockFetcher: Retrying fetch (1/3) for 1 outstanding blocks after 5000ms
ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks (after 1 retries)
java.io.IOException: Failed to connect to <host>:<some port>
Caused by: java.net.ConnectException: Connection refused: <host>:<some port>
We see these errors continuously in the executors when we run a big Spark job. During this time nothing is being processed; after a while the errors disappear and processing resumes. This is impacting our job SLAs. Can anyone help me with this?
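For reference, a sketch of the submit options that seem related to this behaviour; the values shown are placeholders, not our actual settings. spark.shuffle.io.maxRetries=3 and spark.shuffle.io.retryWait=5s are the Spark defaults and match the "(1/3)" and 5000ms in the log above, and spark.shuffle.service.enabled=true is what lets shuffle data survive executor removal when dynamic allocation is on:

  # sketch only; values are placeholders, not our production settings
  # shuffle service must be enabled for dynamic allocation;
  # maxRetries / retryWait are the defaults seen in the log above
  spark-submit \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.shuffle.io.maxRetries=3 \
    --conf spark.shuffle.io.retryWait=5s \
    ... <class / jar / other options> ...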
Created 04-18-2021 02:32 AM
Hi Cloudera,
Can someone please help with this issue?
I'm also facing this issue in production and it is impacting our SLA.
Created 04-18-2021 11:45 PM
Hello,
Can you check and increase the parameters below?
--conf spark.executor.memory=XXX and increase the number of executors
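For example, something like this on submit (the memory size and executor count are placeholders to tune for your cluster; note that if dynamic allocation is enabled, the upper bound on executors comes from spark.dynamicAllocation.maxExecutors rather than spark.executor.instances):

  # placeholder values; tune to your cluster
  spark-submit \
    --conf spark.executor.memory=8g \
    --conf spark.executor.instances=20 \
    ... <class / jar / other options> ...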
Also, see the doc below for tuning your Spark jobs.
https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/
Created 04-23-2021 07:17 AM
Try running the command with "--deploy-mode cluster" added.
It should work; this seems to be a bug:
https://support.oracle.com/knowledge/Oracle%20Database%20Products/2498643_1.html
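A minimal sketch of what that looks like (the master, class, and jar below are placeholders for your own):

  # placeholders for the application class and jar
  spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class com.example.MyJob \
    my-job.jar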