Member since: 08-08-2020
Posts: 68
Kudos Received: 1
Solutions: 1

My Accepted Solutions

Title | Views | Posted |
---|---|---|
| 1872 | 11-14-2022 07:14 AM |
04-20-2023
12:33 PM
Hello @Eren Maybe this KB article can be useful: https://community.cloudera.com/t5/Customer/HDFS-write-occasionally-fails-with-error-message-quot-Unable/ta-p/81420
The HDFS block replication does not seem to be able to complete.
Snapshot export:
2023-04-19 19:44:10,199 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* blk_-9223372036854751104_9943 is COMMITTED but not COMPLETE(numNodes= 5 >= minimum = 3) in file /hbase/archive/data/default/test/09748ed90d0d58c0fe7ac4b3c08f3cd4/cf/e35fadd88d244766800728318ccea508
2023-04-19 19:44:10,200 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 8020, call Call#31 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 172.31.0.146:52210
….
hadoop distcp:
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
Make sure that the destination cluster has enough datanodes available, with enough space, based on the replication factor of the HBase file, in this case /hbase/archive/data/default/test/09748ed90d0d58c0fe7ac4b3c08f3cd4/cf/e35fadd88d244766800728318ccea508
You can reconfirm the replication factor (RF) of the file by running either of the following commands:
hdfs fsck /hbase/archive/data/default/test/09748ed90d0d58c0fe7ac4b3c08f3cd4/cf/e35fadd88d244766800728318ccea508
hdfs dfs -ls /hbase/archive/data/default/test/09748ed90d0d58c0fe7ac4b3c08f3cd4/cf/e35fadd88d244766800728318ccea508
In the output of the last one, the second column shows the RF applied to the file.
You can also check whether the datanodes in the destination cluster are distributed across racks; if so, check that each rack has enough datanodes available and with space. See the sketch below.
Hope this helps.
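A minimal sketch of those checks, assuming shell access to an HDFS client on the destination cluster (the file path is the one from this thread; the dfsadmin commands usually require HDFS admin privileges):

```bash
# Confirm the replication factor: for a file, the second column of 'hdfs dfs -ls' is its RF.
hdfs dfs -ls /hbase/archive/data/default/test/09748ed90d0d58c0fe7ac4b3c08f3cd4/cf/e35fadd88d244766800728318ccea508

# Check how many datanodes are live and how much space each of them has left.
hdfs dfsadmin -report

# Check how the datanodes are distributed across racks.
hdfs dfsadmin -printTopology
```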
03-14-2023
10:32 AM
Hello @Vako You may need further investigation from the AWS support team concerning this:
[ERROR] Deferred error: s3:c96 close("/data/test/datasync//test.txt"): 5 (Input/output error)
That error comes from the S3 protocol layer. There is also this message from the NameNode logs:
java.io.IOException: File /data/test/datasync/test.txt could only be written to 0 of the 1 minReplication nodes. There are 6 datanode(s) running and 6 node(s) are excluded in this operation.
This means that the client reached the NameNode and the file was created at the metadata level in HDFS, but when the application tried to write the data to the datanodes, that write failed for some reason. You can check whether the datanodes' logs tell you something at the time of the issue, but I've usually seen this caused by a problem or misconfiguration on the application side. You can also validate that no firewall rules are blocking the datanode port for the server where the AWS DataSync agent is running (see the sketch below).
Hope this helps.
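A quick sketch of that connectivity check, run from the host where the DataSync agent lives. The hostnames are placeholders, and the port is whatever dfs.datanode.address is set to on your cluster (9866 by default on Hadoop 3.x / CDP, 50010 on older Hadoop 2.x releases):

```bash
# Verify the DataNode data-transfer port is reachable from the DataSync agent host.
for dn in datanode01.example.com datanode02.example.com; do
  nc -vz "$dn" 9866
done

# From an HDFS client, confirm all datanodes are live and have free space.
hdfs dfsadmin -report | grep -E 'Live datanodes|^Name:|DFS Remaining'
```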
03-02-2023
07:37 AM
Hello, org.apache.hadoop.hbase.YouAreDeadException normally occurs when a RegionServer loses communication with ZooKeeper or takes too long to report its availability through its znode, which can happen for different reasons [1]. You may want to check what is in the RegionServer logs, verify that the ZooKeeper service is not crashing, and confirm that the ZK timeouts are properly set [2]; a sketch of these checks is shown below. Hope this helps.
[1] https://issues.apache.org/jira/browse/HBASE-25274
[2] https://community.cloudera.com/t5/Customer/What-is-the-formula-to-calculate-ZooKeeper-timeouts-for/ta-p/271310
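A minimal sketch of those checks, assuming typical CDH/CDP file locations and an example ZooKeeper hostname (adjust paths, hosts, and ports to your deployment; the four-letter-word command may need to be whitelisted via 4lw.commands.whitelist on ZooKeeper 3.5+):

```bash
# 1. Look at the RegionServer log around the time of the exception.
grep -B5 -A20 'YouAreDeadException' /var/log/hbase/*REGIONSERVER*.log*

# 2. Check the ZooKeeper session timeout the RegionServer is configured with.
grep -A1 'zookeeper.session.timeout' /etc/hbase/conf/hbase-site.xml

# 3. Confirm the ZooKeeper ensemble answers health checks.
echo ruok | nc zk01.example.com 2181
```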
12-05-2022
11:11 AM
Hello @hanumanth , If the ZooKeeper services look up and running, you may need to compare the Spark job failure timestamp against the ZooKeeper logs from the leader server. If there is no visible issue on the ZooKeeper side, check whether the HBase client configurations were applied properly in the Spark job configuration (see the sketch below). Also, confirm that the HBase service itself is up and functional. If the above does not help, you may want to raise a support ticket for the Spark component.
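A sketch of both checks, assuming example ZooKeeper hostnames and a hypothetical job script (the srvr four-letter-word command may need to be whitelisted on ZooKeeper 3.5+):

```bash
# Find which ZooKeeper server is the leader, so its logs can be compared
# against the Spark job failure timestamp.
for zk in zk01.example.com zk02.example.com zk03.example.com; do
  echo "$zk: $(echo srvr | nc "$zk" 2181 | grep Mode)"
done

# Ship the HBase client configuration with the Spark job so the driver and
# executors actually pick it up (my_hbase_job.py is a placeholder).
spark-submit \
  --files /etc/hbase/conf/hbase-site.xml \
  --conf spark.driver.extraClassPath=/etc/hbase/conf \
  --conf spark.executor.extraClassPath=/etc/hbase/conf \
  my_hbase_job.py
```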
11-14-2022
07:14 AM
@hanumanth you may check whether the files you deleted from HDFS still exist somehow in HDFS and, if so, check the replication factor applied to them; it is shown in the second column of the hdfs dfs -ls output for each file. For this you can collect a recursive listing:
hdfs dfs -ls -R / > hdfs_recursive
Then you can filter that output to see which replication factors (RF) are applied to your files:
hdfs dfs -ls -R / | awk '{print $2}' | sort | uniq -c
Also, ensure that no other content from other processes (something different from HDFS) is filling up the mount points used to store the HDFS blocks.
You can also collect the following outputs to check whether existing data is still protected by snapshots:
#1 hdfs dfs -du -s -x <path>
#2 hdfs dfs -du -s <path> ---> du command usage [1]
If the results of the two commands differ, there are probably snapshots still present that are preventing blocks from being deleted from the datanodes. A sketch of this comparison is shown below.
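A minimal sketch of that comparison, assuming a recent Hadoop release (the -x option excludes snapshot data) and using /path/you/cleaned as a placeholder for the directory whose space is not being released:

```bash
# Space usage without snapshot data vs. with snapshot data.
hdfs dfs -du -s -x /path/you/cleaned
hdfs dfs -du -s    /path/you/cleaned

# If the two numbers differ, list the snapshottable directories and their snapshots.
hdfs lsSnapshottableDir
hdfs dfs -ls /path/you/cleaned/.snapshot
```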
11-07-2022
08:04 AM
What is the current HDP, CDH, or CDP version of this cluster? Is this issue present on every browser? Did it work as expected before?