Best practice for handling client timeout settings with HDFS API

Hi,

I'm trying to get a firmer grasp of client connection timeouts when using the HDFS API programmatically. We have an application that writes content to files in HDFS from a set of worker threads, and it intermittently gets a TimeoutException.

The CDH Admin console shows 'good health' for HDFS, so the problem appears to be intermittent.

We're currently not setting anything explicitly in the Configuration object when obtaining a connection to HDFS.

Looking at

https://issues.apache.org/jira/browse/HADOOP-9106
https://issues.apache.org/jira/browse/HADOOP-7397

I'm wondering whether the settings we want to look into are the ones in

https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/core-default.xml

specifically ipc.client.connect.timeout and the like.
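
For concreteness, setting these explicitly might look roughly like the sketch below. The two keys are taken from the core-default.xml page linked above. Everything else is an assumption made for illustration: the class name HdfsClientTimeouts, the overrides helper, and the values 30000 and 45 are placeholders, not tuning recommendations. The sketch only builds the key/value pairs, so it runs without Hadoop on the classpath. The commented lines show how they would presumably be applied to a Configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: collects explicit client timeout overrides.
class HdfsClientTimeouts {
    // Property names as listed in core-default.xml (Hadoop 2.7.2).
    static final String CONNECT_TIMEOUT = "ipc.client.connect.timeout";
    static final String RETRIES_ON_TIMEOUTS = "ipc.client.connect.max.retries.on.timeouts";

    // Builds the overrides as strings, in insertion order.
    static Map<String, String> overrides(int connectTimeoutMs, int maxRetriesOnTimeouts) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put(CONNECT_TIMEOUT, Integer.toString(connectTimeoutMs));
        m.put(RETRIES_ON_TIMEOUTS, Integer.toString(maxRetriesOnTimeouts));
        return m;
    }

    public static void main(String[] args) {
        // Placeholder values (assumption), chosen only to show the mechanics.
        Map<String, String> o = overrides(30000, 45);
        o.forEach((k, v) -> System.out.println(k + "=" + v));

        // With hadoop-common on the classpath, the overrides would be applied
        // before obtaining the FileSystem, along these lines:
        //   Configuration conf = new Configuration();
        //   o.forEach(conf::set);
        //   FileSystem fs = FileSystem.get(conf);
    }
}
```

Keeping the overrides in one place would at least make it easy to try different values while reproducing the TimeoutException.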

 

Any recommendations on what specifically we should set, and on how to raise the default settings so that we avoid these intermittent 'lags' in HDFS?

Thanks.