Member since 04-26-2016 · 19 Posts · 6 Kudos Received · 0 Solutions
11-03-2017
11:49 AM
1 Kudo
I have a Java application that appends recorded video to an HDFS file. Occasionally, after writing a batch of video frames, closing the FSDataOutputStream fails with the following error:

Unable to close file because the last block does not have enough number of replicas

In that case I sleep for 100 ms and retry, and the close succeeds. However, the next time I try to open the file for append, I get this error:

Failed to APPEND_FILE /PSG/20171102.idx for DFSClient_NONMAPREDUCE_1265824578_479 on 192.168.3.224 because DFSClient_NONMAPREDUCE_1265824578_479 is already the current lease holder.

What is the proper way of handling a failed close attempt? Any ideas on how to handle such situations? Thanks, David
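The sleep-and-retry you describe can be factored into a bounded retry helper. This is a minimal sketch, not a Hadoop API: `closeWithRetries` is a hypothetical helper name, and the `Closeable` here stands in for your FSDataOutputStream. The "not enough replicas" condition usually clears once the DataNodes report the last block, so a short backoff between a few attempts is the common pattern:

```java
import java.io.Closeable;
import java.io.IOException;

public class RetryClose {

    // Hypothetical helper: retry close() a bounded number of times with a
    // fixed backoff, rethrowing the last IOException if all attempts fail.
    static int closeWithRetries(Closeable stream, int maxAttempts, long backoffMs)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                stream.close();
                return attempt;            // number of attempts it took
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                }
            }
        }
        throw last;                        // give up after maxAttempts
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for an FSDataOutputStream whose close fails twice,
        // then succeeds (simulating the replica-count error clearing).
        final int[] calls = {0};
        Closeable flaky = () -> {
            if (++calls[0] < 3) {
                throw new IOException(
                    "last block does not have enough number of replicas");
            }
        };
        int attempts = closeWithRetries(flaky, 5, 10);
        System.out.println(attempts); // prints 3
    }
}
```

On the lease error: if close ultimately fails, the writing client still holds the file's lease. When a later append comes from a *different* DFSClient, calling `DistributedFileSystem.recoverLease(path)` before reopening is one way to force lease recovery; if the append comes from the *same* cached client (as your "already the current lease holder" message suggests), making sure the previous stream is fully closed before reopening for append is usually the first thing to check. Verify both against your Hadoop version's behavior.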
Labels: Apache Hadoop
10-09-2017
04:11 PM
Yes, actually it's 20s. I believe it comes from the ipc.client.connect.timeout default. I am trying to see if I can set it to 2s. The main problem seems to be that every FileSystem object I create always tries server 1 first, which is down. It doesn't remember that server 1 was down the last time it tried, and keeps retrying it for each new FileSystem object. I am also trying to cache my own FileSystem object so that, if I reuse an object that has already failed over to the second server, I won't incur the same 2s delay of first trying to connect to the failed server.
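If shortening the per-attempt timeout works for you, it is a client-side setting in core-site.xml. A sketch, assuming the Hadoop defaults I'm aware of (20000 ms timeout, 45 retries on timeout); verify the names and defaults against your version's core-default.xml:

```xml
<!-- core-site.xml (client side) -->
<!-- shorten the per-attempt connect timeout (default 20000 ms) -->
<property>
  <name>ipc.client.connect.timeout</name>
  <value>2000</value>
</property>
<!-- fewer connect retries on timeout before giving up on a NameNode -->
<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <value>3</value>
</property>
```

Lowering both bounds how long the client spends on the dead NameNode before failing over; the retry count matters as much as the timeout, since the total delay is roughly their product.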
10-06-2017
03:59 PM
dfs.client.retry.policy.enabled is set to false
10-06-2017
03:42 PM
I have a cluster with two NameNodes configured for HA. For failover testing, we purposely turned off NameNode 1. However, when trying to check an HDFS file size from server 2, the HDFS client call still attempts to connect to NameNode 1 first. This causes a 20s delay while the connection times out before the client tries NameNode 2. I've tried setting the dfs.ha.namenodes.xxx property to change the search order, but without success: it always tries NameNode 1 first and only goes to NameNode 2 after 20s. This is causing unacceptable delays in our system, which needs faster response times than waiting 20s to connect to the correct NameNode. Does anyone know how I can rectify this problem? Thanks, David
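For reference, this is the client-side HA configuration in play. A sketch, where "mycluster", "nn1", and "nn2" are placeholders for your nameservice and NameNode IDs; as I understand it, the default ConfiguredFailoverProxyProvider tries the NameNodes in the order they are listed:

```xml
<!-- hdfs-site.xml (client side); "mycluster", "nn1", "nn2" are placeholders -->
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <!-- the failover proxy provider tries these in listed order -->
  <value>nn2,nn1</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

Note that reordering alone cannot eliminate the delay when the first-listed NameNode happens to be the dead one; it usually needs to be combined with a shorter ipc.client.connect.timeout and a lower retry count so the failed connection attempt gives up quickly.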
Labels: Apache Hadoop