Best practice for handling client timeout settings with HDFS API

Hi,

I'm trying to get a firmer grasp of client connection timeouts when using the HDFS API programmatically. We have an application that intermittently gets a TimeoutException when writing files to HDFS from a set of worker threads.

The CDH Admin console shows 'good health' for HDFS, so the problem appears to be intermittent.

We're currently not setting any properties explicitly on the Configuration object when connecting to HDFS.

Looking at these two issues:

https://issues.apache.org/jira/browse/HADOOP-9106
https://issues.apache.org/jira/browse/HADOOP-7397

I'm wondering whether the settings we should look into are the ones listed here:

https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/core-default.xml

specifically ipc.client.connect.timeout and similar settings.
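
To make the question concrete, here is a rough sketch of the kind of client-side override I have in mind, written as a core-site.xml fragment on the application's classpath (which, as far as I understand, the Configuration object loads automatically). The property names come from the core-default.xml page linked above; the values are placeholders chosen for illustration, not tested recommendations.

```xml
<!-- Client-side core-site.xml fragment; values are illustrative only -->
<configuration>
  <!-- Milliseconds the client waits for the socket to establish a server connection -->
  <property>
    <name>ipc.client.connect.timeout</name>
    <value>30000</value>
  </property>
  <!-- Number of retries the client makes after a socket timeout while connecting -->
  <property>
    <name>ipc.client.connect.max.retries.on.timeouts</name>
    <value>45</value>
  </property>
</configuration>
```

I assume the same keys could also be set in code with Configuration.set(...) before calling FileSystem.get(...), but I have not confirmed which of them actually affect the write path where we see the exception.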

 

Any recommendations on which properties we should set, and how far to raise them above their defaults, to ride out these intermittent HDFS 'lags'?

Thanks.