Member since: 08-16-2016
Posts: 48
Kudos Received: 9
Solutions: 4

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 4892 | 12-28-2018 10:21 AM |
|  | 5999 | 08-28-2018 10:58 AM |
|  | 3312 | 10-18-2016 11:08 AM |
|  | 3877 | 10-16-2016 10:13 AM |
11-14-2017
02:15 PM
Sorry, I am not able to see your uploaded CM figure. From the stack trace it looks like the NameNode at 10.0.0.157 is down. Would you please check?
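If it helps, here is one way to check the NameNode's state from the command line (the nn1/nn2 service IDs and the 50070 web port are assumptions; adjust them to your cluster):

# Ask each NameNode for its HA state; the IDs come from dfs.ha.namenodes.<nameservice>
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# Or query the NameNode's JMX endpoint directly (this fails if the process is down)
curl 'http://10.0.0.157:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'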
11-14-2017
01:52 PM
I am not sure about the CM warning, but in principle you should only add an odd number of ZooKeeper instances, e.g. 3, 5, or even 7. The RetryInvocationHandler warning should be unrelated to the ZooKeeper issue, though. Instead, it's probably because the first NameNode in your configuration is currently the standby NN. If you manually fail over, I think you wouldn't see the warning again. You might also want to enable command-line debug logs with the following command: export HADOOP_ROOT_LOGGER=DEBUG,console
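As a rough sketch (the nn1/nn2 service IDs below are assumptions; use the IDs from dfs.ha.namenodes.<nameservice> in your hdfs-site.xml):

# Confirm which NameNode is active and which is standby
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# Fail over from the currently active NN (here nn2) to nn1
hdfs haadmin -failover nn2 nn1
# Re-run a client command with debug logging to watch the retry behavior
export HADOOP_ROOT_LOGGER=DEBUG,console
hdfs dfs -ls /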
10-16-2017
10:18 AM
I reproduced the error by intentionally corrupting the _index file. If by "restore" you meant unarchiving the har file with the hdfs dfs -cp command, I found that it returns the same AIOOBE, so you won't be able to unarchive it. Your best bet is to download the _index file, manually repair it, put it back in place, and see how it goes. Meanwhile, I filed an Apache JIRA, HADOOP-14950, to handle the AIOOBE more gracefully, but it won't help fix your corrupt _index file.
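A rough sketch of that repair loop, assuming a placeholder archive path of /user/foo/archive.har:

# Pull the index file out of the archive directory
hdfs dfs -get /user/foo/archive.har/_index ./_index
# ... repair the malformed line(s) in a text editor ...
# Push the fixed copy back, overwriting the original
hdfs dfs -put -f ./_index /user/foo/archive.har/_index
# Then try reading through the har:// scheme again
hdfs dfs -ls har:///user/foo/archive.har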
10-16-2017
02:15 AM
If you still have the source file, try to archive it again and see if it still produces the same error. If so, I'd be interested in knowing what's inside that _index file.
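For reference, a minimal re-archive attempt could look like this (the paths and archive name are placeholders):

# Re-create the archive from the original source directory; compare its _index with the bad one
hadoop archive -archiveName test.har -p /user/foo/source /user/foo/dest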
10-16-2017
01:01 AM
1 Kudo
It looks like your har file is malformed. Inside the har file there is an index file called _index. Each line of the index is expected to be a <filename> <dir> pair, and the latter part of the line appears to have been lost.
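To see what I mean, you can dump the index directly (the path is a placeholder for your archive):

# Print the archive's index file; each entry should be a complete line
hdfs dfs -cat /user/foo/archive.har/_index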
09-19-2017
09:23 AM
Hi @BellRizz, thanks for offering the workaround. Which version of CDH do you have? From the description, it sounds like a race condition between reader and writer, and I suspect it is caused by HDFS-11056 (Concurrent append and read operations lead to checksum error) or HDFS-11160 (VolumeScanner reports write-in-progress replicas as corrupt incorrectly). While the summary of HDFS-11160 seems to suggest otherwise, I have seen customers hit this issue with concurrent reads and writes.
06-30-2017
06:34 AM
Interesting story. The decommission process will not complete until every block has at least one good replica on other DNs (a good replica is one that is not stale and sits on a DataNode that is not being decommissioned or already decommissioned). The DirectoryScanner in a DataNode scans the entire data directory, reconciling inconsistencies between the in-memory block map and the on-disk replicas, so it will eventually pick up the added replica; it's just a matter of time.
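If you want to watch the progress, two hedged pointers (the -decommissioning option and the property name below are from memory; verify them against your Hadoop version):

# List DataNodes that are still in the decommissioning state
hdfs dfsadmin -report -decommissioning
# Check how often the DirectoryScanner runs (seconds; the default is 21600, i.e. 6 hours)
hdfs getconf -confKey dfs.datanode.directoryscan.interval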
06-29-2017
02:51 PM
Have you tried restarting the DN you copied the blocks to? Also, try forcing a full block report: hdfs dfsadmin -triggerBlockReport <datanode_host:ipc_port>
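For example (dn1.example.com is a placeholder, and 50020 is the typical DataNode IPC port on CDH 5-era clusters; check dfs.datanode.ipc.address on yours):

# Ask this DataNode to send a full block report to the NameNode right away
hdfs dfsadmin -triggerBlockReport dn1.example.com:50020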
06-01-2017
09:16 AM
The following HDFS configuration can be updated to alleviate the issue:

<property>
  <name>ha.health-monitor.rpc-timeout.ms</name>
  <value>45000</value>
  <description>
    Timeout for the actual monitorHealth() calls.
  </description>
</property>

I suggest bumping ha.health-monitor.rpc-timeout.ms from 45000 (milliseconds) to 90000 and seeing if it helps.
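For instance, the bumped property would look like the sketch below; whether it goes into core-site.xml directly or through a CM safety valve depends on how you manage the configuration:

<property>
  <name>ha.health-monitor.rpc-timeout.ms</name>
  <value>90000</value>
</property>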