Support Questions

mdh_raghavendra · 03-27-2018

Hi,

When dynamic allocation is enabled, we frequently run into failures while fetching blocks:

RetryingBlockFetcher: Retrying fetch (1/3) for 1 outstanding blocks after 5000ms

ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks (after 1 retries)

java.io.IOException: Failed to connect to <host>:<some port>

Caused by java.net.ConnectException: Connection refused: <host>:<some port>

We see these errors continuously in the executors when we run big Spark jobs. While they occur, nothing is processed; after some time the errors disappear and processing resumes. This is impacting our job SLAs. Can anyone help me with this?

Venkatchandler · 04-18-2021

Hi Cloudera,

Can someone please help with this issue?

I'm also facing this issue in our production environment, and it is impacting our SLA.

Atahar · 04-18-2021

Hello,

Can you check and increase the parameters below?

--conf spark.executor.memory=XXX
the number of executors

Also, see the doc below for tuning your Spark jobs.

https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/
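
For illustration, the two suggestions above might look like the following on the command line. This is only a sketch: the memory size, the executor cap, the `--master yarn` setting, and the `your_job.jar` name are placeholder assumptions, not values taken from this thread. Since dynamic allocation is enabled here, the executor count is expressed through `spark.dynamicAllocation.maxExecutors`.

```shell
#!/bin/sh
# Sketch only: every value below is a placeholder, not a recommendation.
EXECUTOR_MEMORY="8g"   # hypothetical; stands in for the XXX above
MAX_EXECUTORS="50"     # hypothetical upper bound for dynamic allocation
APP="your_job.jar"     # hypothetical application artifact

# Assemble the configuration flags one at a time for readability.
SPARK_CONF="--conf spark.dynamicAllocation.enabled=true"
SPARK_CONF="$SPARK_CONF --conf spark.dynamicAllocation.maxExecutors=$MAX_EXECUTORS"
SPARK_CONF="$SPARK_CONF --conf spark.executor.memory=$EXECUTOR_MEMORY"

# Print the command for review instead of running it (assumes a YARN cluster).
SUBMIT_CMD="spark-submit --master yarn $SPARK_CONF $APP"
echo "$SUBMIT_CMD"
```

The script only prints the assembled command so it can be checked first; run the printed line directly once the values suit your cluster.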

Enri · 04-23-2021

Try running the command with "--deploy-mode cluster" added.

It should work; this seems to be a bug.

https://support.oracle.com/knowledge/Oracle%20Database%20Products/2498643_1.html
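
As a sketch of that suggestion, the flag goes on the spark-submit line as shown below. The `--master yarn` setting and the `your_job.jar` name are assumptions for illustration. In cluster mode the driver runs inside the cluster rather than on the submitting machine; whether that resolves the fetch failures described above is not confirmed in this thread.

```shell
#!/bin/sh
# Sketch only: APP and the YARN master are hypothetical placeholders.
APP="your_job.jar"

# --deploy-mode cluster launches the driver inside the cluster.
CLUSTER_CMD="spark-submit --master yarn --deploy-mode cluster $APP"

# Print the command for review instead of running it.
echo "$CLUSTER_CMD"
```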

Spark shuffle is failing with connection exception when dynamic allocation is enabled