Member since
03-30-2023
16
Posts
4
Kudos Received
0
Solutions
04-26-2024
08:11 AM
1 Kudo
Hello, I'm seeing these logs in the NameNode. It seems it can't connect to the DataNodes that we have:

2024-04-26 14:45:21,470 DEBUG net.NetworkTopology: No node to choose.
2024-04-26 14:45:21,470 DEBUG blockmanagement.BlockPlacementPolicy: Failed to choose from the next rack (location = /rack-10.128.44.130), retry choosing randomly
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:829)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:717)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseFromNextRack(BlockPlacementPolicyDefault.java:660)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:636)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:511)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:414)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:463)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:290)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:143)
at org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:46)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1858)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1810)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4643)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4510)
at java.lang.Thread.run(Thread.java:748)
2024-04-26 14:45:21,470 DEBUG net.NetworkTopology: Choosing random from 0 available nodes on node /, scope=, excludedScope=null, excludeNodes=[10.128.43.221:9866, 10.128.43.204:9866, 10.128.44.130:9866]. numOfDatanodes=3.

But when I executed this command to check the DataNode status, it shows:

Live datanodes (3):
Name: 10.128.43.204:9866 (10-128-43-204.hdfs-datanode-web.metering.svc.cluster.local)
Hostname: hdfs-datanode-0.hdfs-datanode.metering.svc.cluster.local
Rack: /rack-10.128.43.204
Decommission Status : Normal
Configured Capacity: 1056759873536 (984.18 GB)
DFS Used: 60179288064 (56.05 GB)
Non DFS Used: 74838016 (71.37 MB)
DFS Remaining: 996488970240 (928.05 GB)
DFS Used%: 5.69%
DFS Remaining%: 94.30%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Apr 26 15:07:59 UTC 2024
Last Block Report: Fri Apr 26 12:21:53 UTC 2024
Num of Blocks: 1777840
Name: 10.128.43.221:9866 (10-128-43-221.hdfs-datanode-web.metering.svc.cluster.local)
Hostname: hdfs-datanode-2.hdfs-datanode.metering.svc.cluster.local
Rack: /rack-10.128.43.221
Decommission Status : Normal
Configured Capacity: 1056759873536 (984.18 GB)
DFS Used: 60178792448 (56.05 GB)
Non DFS Used: 74838016 (71.37 MB)
DFS Remaining: 996489465856 (928.05 GB)
DFS Used%: 5.69%
DFS Remaining%: 94.30%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Apr 26 15:08:00 UTC 2024
Last Block Report: Fri Apr 26 14:22:23 UTC 2024
Num of Blocks: 1777840
Name: 10.128.44.130:9866 (10-128-44-130.hdfs-datanode-web.metering.svc.cluster.local)
Hostname: hdfs-datanode-1.hdfs-datanode.metering.svc.cluster.local
Rack: /rack-10.128.44.130
Decommission Status : Normal
Configured Capacity: 1056759873536 (984.18 GB)
DFS Used: 60182228992 (56.05 GB)
Non DFS Used: 74838016 (71.37 MB)
DFS Remaining: 996486029312 (928.05 GB)
DFS Used%: 5.69%
DFS Remaining%: 94.30%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Apr 26 15:08:00 UTC 2024
Last Block Report: Fri Apr 26 13:56:23 UTC 2024
Num of Blocks: 1777840

Hope you can help us to fix this kind of blocker. Thank you in advance!
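As a side note, one thing visible in the report above is that each of the three DataNodes is registered on its own rack (/rack-10.128.43.204, /rack-10.128.43.221, /rack-10.128.44.130), which is relevant when the placement policy complains it "Failed to choose from the next rack". A rough sketch of counting nodes versus distinct racks; the text below is sample output mirroring the report above, not a live cluster:

```shell
# Count DataNodes vs distinct racks from dfsadmin-style output (sample text).
report='Name: 10.128.43.204:9866
Rack: /rack-10.128.43.204
Name: 10.128.43.221:9866
Rack: /rack-10.128.43.221
Name: 10.128.44.130:9866
Rack: /rack-10.128.44.130'
nodes=$(printf '%s\n' "$report" | grep -c '^Name:')
racks=$(printf '%s\n' "$report" | awk '/^Rack:/ {print $2}' | sort -u | wc -l | tr -d ' ')
echo "datanodes=$nodes distinct_racks=$racks"
```

When every node sits alone on its own rack, there is never a second node available on "the next rack", so the policy falls back to choosing randomly.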
Labels:
- HDFS
02-25-2024
05:38 PM
1 Kudo
Hi, it seems the table and partition can't be created, and the files on each DataNode can't be located by the NameNode.

1. Is there a way to re-point those files (the non-DFS-used data) to the actual directory?

Configured Capacity: 1056759873536 (984.18 GB)
DFS Used: 475136 (464 KB)
Non DFS Used: 433030918144 (403.29 GB)
DFS Remaining: 623711703040 (580.88 GB)
DFS Used%: 0.00%
DFS Remaining%: 59.02%

Datanode directory:

bash-4.2$ cd /hadoop/dfs/data
bash-4.2$ ls -l
total 10485776
drwxrwsr-x. 4 hadoop root 4096 Feb 23 11:15 current
-rw-r--r--. 1 hadoop root 58 Feb 26 09:34 in_use.lock
-rw-rw-r--. 1 hadoop root 10737418240 Aug 28 05:26 tempfile
drwxrwsr-x. 2 hadoop root 4096 Feb 23 13:05 test

2. Next, how can we proceed with creating the tables and partitions?

Logs of the NameNode:

2024-02-26 06:52:26,604 DEBUG security.UserGroupInformation: PrivilegedAction as:presto (auth:SIMPLE) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
2024-02-26 06:52:26,604 DEBUG hdfs.StateChange: *DIR* NameNode.rename: /tmp/presto-reporting-operator/576b4b93-ae3b-41ff-b401-be50023f776f/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 to /operator_metering/storage/metering_health_check/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237
2024-02-26 06:52:26,604 DEBUG security.UserGroupInformation: Failed to get groups for user presto by java.io.IOException: No groups found for user presto
2024-02-26 06:52:26,604 DEBUG hdfs.StateChange: DIR* NameSystem.renameTo: /tmp/presto-reporting-operator/576b4b93-ae3b-41ff-b401-be50023f776f/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 to /operator_metering/storage/metering_health_check/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237
2024-02-26 06:52:26,604 DEBUG hdfs.StateChange: DIR* FSDirectory.renameTo: /tmp/presto-reporting-operator/576b4b93-ae3b-41ff-b401-be50023f776f/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 to /operator_metering/storage/metering_health_check/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237
2024-02-26 06:52:26,604 WARN hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /tmp/presto-reporting-operator/576b4b93-ae3b-41ff-b401-be50023f776f/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 to /operator_metering/storage/metering_health_check/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 because destination's parent does not exist
2024-02-26 06:52:26,604 DEBUG ipc.Server: Served: rename, queueTime= 0 procesingTime= 0
2024-02-26 06:52:26,604 DEBUG ipc.Server: IPC Server handler 5 on 9820: responding to Call#55244 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.rename from 10.128.38.29:59164
2024-02-26 06:52:26,604 DEBUG ipc.Server: IPC Server handler 5 on 9820: responding to Call#55244 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.rename from 10.128.38.29:59164 Wrote 36 bytes.
2024-02-26 06:52:26,607 DEBUG ipc.Server: got #55245
2024-02-26 06:52:26,607 DEBUG ipc.Server: IPC Server handler 6 on 9820: Call#55245 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.128.38.29:59164 for RpcKind RPC_PROTOCOL_BUFFER
2024-02-26 06:52:26,607 DEBUG security.UserGroupInformation: PrivilegedAction as:presto (auth:SIMPLE) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
2024-02-26 06:52:26,607 DEBUG security.UserGroupInformation: Failed to get groups for user presto by java.io.IOException: No groups found for user presto
2024-02-26 06:52:26,607 DEBUG metrics.TopMetrics: a metric is reported: cmd: getfileinfo user: presto (auth:SIMPLE)
2024-02-26 06:52:26,607 DEBUG top.TopAuditLogger: ------------------- logged event for top service: allowed=true ugi=presto (auth:SIMPLE) ip=/10.128.38.29 cmd=getfileinfo src=/operator_metering/storage/metering_health_check dst=null perm=null
2024-02-26 06:52:26,607 DEBUG ipc.Server: Served: getFileInfo, queueTime= 0 procesingTime= 0
2024-02-26 06:52:26,607 DEBUG ipc.Server: IPC Server handler 6 on 9820: responding to Call#55245 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.128.38.29:59164
2024-02-26 06:52:26,607 DEBUG ipc.Server: IPC Server handler 6 on 9820: responding to Call#55245 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.128.38.29:59164 Wrote 34 bytes.
2024-02-26 06:52:26,608 DEBUG ipc.Server: got #55246
2024-02-26 06:52:26,608 DEBUG ipc.Server: IPC Server handler 4 on 9820: Call#55246 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.128.38.29:59164 for RpcKind RPC_PROTOCOL_BUFFER

Logs of the reporting-operator:

time="2024-02-23T14:14:21Z" level=error msg="cannot insert into Presto table operator_health_check" app=metering component=testWriteToPresto error="presto: query failed (200 OK): \"com.facebook.presto.spi.PrestoException: Failed to create directory: hdfs://hdfs-namenode-proxy:9820/tmp/presto-reporting-operator/1d20c5c5-11e0-47b4-9bce-eaa724db21eb\""

Whenever we're trying to query, this is the error:

Error running query: Partition location does not exist: hdfs://hdfs-namenode-0.hdfs-namenode:9820/user/hive/warehouse/datasource_mlp_gpu_request_slots/dt=2024-02-08

Thank you!
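On question 1, a possibly helpful aside: "Non DFS Used" counts bytes that sit on the DataNode volume but outside the HDFS block storage tree (current/), so the 10 GB tempfile in the directory listing above is exactly the kind of file that inflates it. A minimal sketch of that accounting, using throwaway fixture files rather than the real /hadoop/dfs/data layout:

```shell
# Fixture data dir: top-level files outside current/ behave like "Non DFS Used".
datadir=$(mktemp -d)
mkdir -p "$datadir/current"
head -c 1024 /dev/zero > "$datadir/tempfile"         # stand-in for the 10 GB tempfile
head -c 512  /dev/zero > "$datadir/current/blk_0001" # stand-in for DFS-managed block data
nondfs=0
for f in "$datadir"/*; do
  # only top-level regular files count here; the current/ tree is DFS-managed
  [ -f "$f" ] && nondfs=$((nondfs + $(wc -c < "$f")))
done
echo "non-dfs bytes: $nondfs"
```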
02-23-2024
05:12 AM
1 Kudo
I've copied the clusterID (CID) of the DataNode into the VERSION file of the NameNode and overwritten it. The DataNode and NameNode are now running, but it seems the cluster cannot create or process any data/blocks. Also, is there a way to restore the data/blocks from the DataNode to the NameNode? As per checking the size of the directories inside the DataNode, it seems the data is still there:

/dev/rbd1 985G 404G 581G 41% /hadoop/dfs/data
02-23-2024
02:05 AM
1 Kudo
##UPDATE

We've already formatted the NameNode using the command hadoop namenode -format, and the NameNode is now running properly. However, our DataNodes are giving some errors in the logs:

2024-02-23 09:47:33,138 DEBUG datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to hdfs-namenode-0.hdfs-namenode/10.128.66.8:9820 received versionRequest response: lv=-64;cid=CID-a22214c4-37b5-48e5-a2e0-1ee094bdddd9;nsid=1630192623;c=1708681299098;bpid=BP-367922716-10.128.66.7-1708681299098
2024-02-23 09:47:33,141 INFO datanode.DataNode: Acknowledging ACTIVE Namenode during handshakeBlock pool <registering> (Datanode Uuid unassigned) service to hdfs-namenode-0.hdfs-namenode/10.128.66.8:9820
2024-02-23 09:47:33,142 INFO common.Storage: Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2024-02-23 09:47:33,156 INFO common.Storage: Lock on /hadoop/dfs/data/in_use.lock acquired by nodename 1@hdfs-datanode-0.hdfs-datanode.metering.svc.cluster.local
2024-02-23 09:47:33,158 WARN common.Storage: Failed to add storage directory [DISK]file:/hadoop/dfs/data
java.io.IOException: Incompatible clusterIDs in /hadoop/dfs/data: namenode clusterID = CID-a22214c4-37b5-48e5-a2e0-1ee094bdddd9; datanode clusterID = CID-2ccc1334-69cb-4e3a-954b-5b644c34acbf
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:736)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:294)
...
2024-02-23 09:47:33,159 ERROR datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid 61ba0590-5f8d-4bb7-a121-d70586b83bdb) service to hdfs-namenode-0.hdfs-namenode/10.128.66.8:9820. Exiting.
java.io.IOException: All specified directories have failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:552)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1705)
...
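The "Incompatible clusterIDs" message means the clusterID stored in the DataNode's VERSION file no longer matches the one the re-formatted NameNode advertises. A quick way to compare them is to read the clusterID line from each VERSION file; the sketch below uses temp fixture files carrying the two IDs from the log above rather than the real /hadoop/dfs paths (on a real cluster the files live under the name/data dirs, e.g. current/VERSION):

```shell
# Fixture VERSION files with the two clusterIDs from the log above.
nn_version=$(mktemp); dn_version=$(mktemp)
echo 'clusterID=CID-a22214c4-37b5-48e5-a2e0-1ee094bdddd9' > "$nn_version"
echo 'clusterID=CID-2ccc1334-69cb-4e3a-954b-5b644c34acbf' > "$dn_version"
nn_cid=$(sed -n 's/^clusterID=//p' "$nn_version")
dn_cid=$(sed -n 's/^clusterID=//p' "$dn_version")
if [ "$nn_cid" = "$dn_cid" ]; then
  status=match
else
  status=mismatch
fi
echo "namenode=$nn_cid datanode=$dn_cid -> $status"
```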
02-22-2024
05:48 PM
++UPDATE

We copied the config from /hadoop-config/hdfs-site.xml to /etc/hadoop/hdfs-site.xml to overwrite the directory used by the -recover command. However, it appears that it is not recovering as expected; please see the logs below for reference:

2024-02-22 14:54:54,663 ERROR namenode.MetaRecoveryContext: Failed to apply edit log operation TimesOp [length=0, path=/user/hive/warehouse/datasource_storage_data_transfer_transferred_mbytes/dt=2023-08-04/20230919_173327_13960_n4dzb_af83fb23-0235-4c0a-a58a-edff0e4c866e, mtime=-1, atime=1703052661305, opCode=OP_TIMES, txid=31360513036]: error null
Enter 'c' to continue, applying edits
Enter 's' to stop reading the edit log here, abandoning any later edits
Enter 'q' to quit without saving
Enter 'a' to always select the first choice in the future without prompting. (c/s/q/a)
automatically choosing c
2024-02-22 14:54:54,663 INFO namenode.MetaRecoveryContext: Continuing
2024-02-22 14:54:54,663 ERROR namenode.FSEditLogLoader: Encountered exception on operation TimesOp [length=0, path=/user/hive/warehouse/datasource_storage_data_transfer_transferred_mbytes/dt=2023-08-04/20230919_164531_13295_n4dzb_d955413f-79c3-442f-a33b-82f90f8440fb, mtime=-1, atime=1703052661305, opCode=OP_TIMES, txid=31360513037] java.lang.NullPointerException
2024-02-22 14:54:54,663 ERROR namenode.MetaRecoveryContext: Failed to apply edit log operation TimesOp [length=0, path=/user/hive/warehouse/datasource_storage_data_transfer_transferred_mbytes/dt=2023-08-04/20230919_164531_13295_n4dzb_d955413f-79c3-442f-a33b-82f90f8440fb, mtime=-1, atime=1703052661305, opCode=OP_TIMES, txid=31360513037]: error null
Enter 'c' to continue, applying edits
Enter 's' to stop reading the edit log here, abandoning any later edits
Enter 'q' to quit without saving
Enter 'a' to always select the first choice in the future without prompting. (c/s/q/a)
automatically choosing c
2024-02-22 14:54:54,663 INFO namenode.MetaRecoveryContext: Continuing
2024-02-22 14:54:54,663 ERROR namenode.FSEditLogLoader: Encountered exception on operation TimesOp [length=0, path=/user/hive/warehouse/datasource_storage_data_transfer_transferred_mbytes/dt=2023-08-04/20230919_163841_13187_n4dzb_86ab366f-b9fa-4799-a13c-cc88959c9618, mtime=-1, atime=1703052661305, opCode=OP_TIMES, txid=31360513038] java.lang.NullPointerException
2024-02-22 14:54:54,663 ERROR namenode.MetaRecoveryContext: Failed to apply edit log operation TimesOp [length=0, path=/user/hive/warehouse/datasource_storage_data_transfer_transferred_mbytes/dt=2023-08-04/20230919_163841_13187_n4dzb_86ab366f-b9fa-4799-a13c-cc88959c9618, mtime=-1, atime=1703052661305, opCode=OP_TIMES, txid=31360513038]: error null
2024-02-22 08:10:55,082 DEBUG namenode.FSNamesystem: OP_CLOSE: /tmp/presto-reporting-operator/76614aa4-be63-4b87-9dbb-550157e3d6a2/dt=2023-12-19/20231219_074340_06571_n4dzb_3cf5e97e-46e0-459d-9d0c-e91b5495819b numblocks : 1 clientHolder
2024-02-22 08:10:55,084 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: /tmp/presto-reporting-operator/76614aa4-be63-4b87-9dbb-550157e3d6a2/dt=2023-12-19/20231219_074340_06571_n4dzb_3cf5e97e-46e0-459d-9d0c-e91b5495819b is renamed to /user/hive/warehouse/datasource_storage_radosgw_usage_bucket_mbytes/dt=2023-12-19/20231219_074340_06571_n4dzb_3cf5e97e-46e0-459d-9d0c-e91b5495819b
2024-02-22 08:10:55,086 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: /tmp/presto-reporting-operator/76614aa4-be63-4b87-9dbb-550157e3d6a2/dt=2023-12-19 is removed
2024-02-22 08:10:55,087 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: /tmp/presto-reporting-operator/76614aa4-be63-4b87-9dbb-550157e3d6a2 is removed
2024-02-22 08:10:55,089 DEBUG namenode.FSDirectory: child: 583efb5a-fa80-4df0-b4ae-f35e687da8d0, posixAclInheritanceEnabled: true, modes: rwxr-xr-x
2024-02-22 08:10:55,089 DEBUG namenode.FSDirectory: child: dt=2023-12-19, posixAclInheritanceEnabled: true, modes: rwxrwxrwx
2024-02-22 08:10:55,089 DEBUG namenode.FSNamesystem: OP_ADD: /tmp/presto-reporting-operator/583efb5a-fa80-4df0-b4ae-f35e687da8d0/dt=2023-12-19/20231219_074347_06574_n4dzb_86410619-b540-4b00-b2f7-8a0a6f8ba202 numblocks : 0 clientHolder DFSClient_NONMAPREDUCE_-637714557_210 clientMachine 10.128.33.124
2024-02-22 08:10:55,089 DEBUG namenode.FSDirectory: child: 20231219_074347_06574_n4dzb_86410619-b540-4b00-b2f7-8a0a6f8ba202, posixAclInheritanceEnabled: true, modes: rw-r--r--
2024-02-22 08:10:55,090 DEBUG namenode.FSNamesystem: OP_ADD_BLOCK: /tmp/presto-reporting-operator/583efb5a-fa80-4df0-b4ae-f35e687da8d0/dt=2023-12-19/20231219_074347_06574_n4dzb_86410619-b540-4b00-b2f7-8a0a6f8ba202 new block id: 1127996
2024-02-22 08:10:55,090 DEBUG namenode.FSNamesystem: OP_CLOSE: /tmp/presto-reporting-operator/583efb5a-fa80-4df0-b4ae-f35e687da8d0/dt=2023-12-19/20231219_074347_06574_n4dzb_86410619-b540-4b00-b2f7-8a0a6f8ba202 numblocks : 1 clientHolder
2024-02-22 08:10:55,092 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: /tmp/presto-reporting-operator/583efb5a-fa80-4df0-b4ae-f35e687da8d0/dt=2023-12-19/20231219_074347_06574_n4dzb_86410619-b540-4b00-b2f7-8a0a6f8ba202 is renamed to /user/hive/warehouse/datasource_jnaas_gpu_capacity/dt=2023-12-19/20231219_074347_06574_n4dzb_86410619-b540-4b00-b2f7-8a0a6f8ba202
2024-02-22 08:10:55,096 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: /tmp/presto-reporting-operator/583efb5a-fa80-4df0-b4ae-f35e687da8d0/dt=2023-12-19 is removed
2024-02-22 08:10:55,096 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: /tmp/presto-reporting-operator/583efb5a-fa80-4df0-b4ae-f35e687da8d0 is removed
2024-02-22 08:10:55,112 DEBUG namenode.FSDirectory: child: e4d1adf3-f00f-40bc-b737-824ee1d382eb, posixAclInheritanceEnabled: true, modes: rwxr-xr-x
2024-02-22 08:10:55,112 DEBUG namenode.FSDirectory: child: dt=2023-12-19, posixAclInheritanceEnabled: true, modes: rwxrwxrwx
2024-02-22 02:28:14,428 DEBUG namenode.FSNamesystem: OP_ADD: /tmp/presto-reporting-operator/84ecbe02-34cf-4496-9c4e-185c0a594859/dt=2023-12-18/20231218_112548_10016_n4dzb_c11605c5-bf11-4e44-88a4-0ea870e810aa numblocks : 0 clientHolder DFSClient_NONMAPREDUCE_669985933_51 clientMachine 10.128.60.244
2024-02-22 02:28:14,428 DEBUG namenode.FSDirectory: child: 20231218_112548_10016_n4dzb_c11605c5-bf11-4e44-88a4-0ea870e810aa, posixAclInheritanceEnabled: true, modes: rw-r--r--
2024-02-22 02:28:14,428 DEBUG namenode.FSNamesystem: OP_ADD_BLOCK: /tmp/presto-reporting-operator/84ecbe02-34cf-4496-9c4e-185c0a594859/dt=2023-12-18/20231218_112548_10016_n4dzb_c11605c5-bf11-4e44-88a4-0ea870e810aa new block id: 1127963
2024-02-22 02:28:14,428 DEBUG namenode.FSNamesystem: OP_CLOSE: /tmp/presto-reporting-operator/84ecbe02-34cf-4496-9c4e-185c0a594859/dt=2023-12-18/20231218_112548_10016_n4dzb_c11605c5-bf11-4e44-88a4-0ea870e810aa numblocks : 1 clientHolder
2024-02-22 02:28:14,429 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: /tmp/presto-reporting-operator/84ecbe02-34cf-4496-9c4e-185c0a594859/dt=2023-12-18/20231218_112548_10016_n4dzb_c11605c5-bf11-4e44-88a4-0ea870e810aa is renamed to /user/hive/warehouse/datasource_storage_shared_filesystem_usage_volume_files/dt=2023-12-18/20231218_112548_10016_n4dzb_c11605c5-bf11-4e44-88a4-0ea870e810aa
2024-02-22 02:28:14,431 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: /tmp/presto-reporting-operator/84ecbe02-34cf-4496-9c4e-185c0a594859/dt=2023-12-18 is removed
2024-02-22 02:28:14,431 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: /tmp/presto-reporting-operator/84ecbe02-34cf-4496-9c4e-185c0a594859 is removed
2024-02-22 02:28:14,449 DEBUG namenode.FSDirectory: child: e8edf3c5-11c8-4d11-93ff-dbaab7709549, posixAclInheritanceEnabled: true, modes: rwxr-xr-x
02-22-2024
05:44 PM
Hi, thank you for your suggestion, but unfortunately, we don't have a standby namenode 😞
02-19-2024
05:34 PM
2024-02-16 01:50:21,843 ERROR namenode.NameNode: Failed to start namenode.
java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 35923060031 but unable to find any edit logs containing txid 20264370028
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1751)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1709)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1684)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:701)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
2024-02-16 01:50:21,845 DEBUG util.ExitUtil: Exiting with status 1: java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 35923060031 but unable to find any edit logs containing txid 20264370028
1: java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 35923060031 but unable to find any edit logs containing txid 20264370028
at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:265)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1716)
Caused by: java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 35923060031 but unable to find any edit logs containing txid 20264370028

It seems txid 20264370028 is already missing. Is there a way to skip this file or recover it?

Note: We've already executed hadoop namenode -recover, but it is pointing to a different location:

2024-02-16 07:12:34,287 INFO hdfs.StateChange: STATE* Safe mode is ON.
It was turned on manually. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
2024-02-16 07:12:34,287 WARN common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name does not exist
2024-02-16 07:12:34,289 WARN namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:376)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:227)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
at org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1548)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1630)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
2024-02-16 07:12:34,291 DEBUG namenode.FSEditLog: Closing log when already closed
2024-02-16 07:12:34,292 INFO namenode.MetaRecoveryContext: RECOVERY FAILED: caught exception
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:376)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:227)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
at org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1548)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1630)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
2024-02-16 07:12:34,292 ERROR namenode.NameNode: Failed to start namenode.

Thank you!
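Since -recover fell back to /tmp/hadoop-hadoop/dfs/name (the compiled-in default), it likely never read dfs.namenode.name.dir from the intended hdfs-site.xml. One sanity check is to confirm which value the config file actually carries. The sketch below parses a throwaway fixture file rather than the real /etc/hadoop/hdfs-site.xml; the property key is the standard HDFS one:

```shell
# Fixture hdfs-site.xml; extract dfs.namenode.name.dir with awk.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/dfs/name</value>
  </property>
</configuration>
EOF
namedir=$(awk '/dfs\.namenode\.name\.dir/ {found=1}
               found && /<value>/ {gsub(/.*<value>|<\/value>.*/, ""); print; exit}' "$conf")
echo "configured name dir: $namedir"
```

If the extracted value is not the directory that actually holds the fsimage and edits, -recover will keep loading the wrong storage dir regardless of what else is fixed.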
Labels:
- Apache Hive
10-04-2023
09:45 PM
Is it safe to delete the files in /hadoop/dfs/name/current? Since we're getting 100% usage for this directory in the hdfs-namenode:

bash-4.2$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/rbd1 935G 935G 100M 100% /hadoop/dfs/name
bash-4.2$ pwd
/hadoop/dfs/name/current
bash-4.2$ cd ..
bash-4.2$ du -sh *
935G current
4.0K in_use.lock

Also, the earliest data available in this folder was on July 20.
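For reference, a NameNode's current/ directory holds fsimage checkpoints plus edits_* segments, and one that only grows (with segments going back to July 20) usually means old edit segments are accumulating without being checkpointed and purged. A small sketch on fixture files, just to show how to see how much of the space is edits versus fsimage (the names and sizes below are made up):

```shell
# Fixture current/ dir: one fsimage plus two edits segments (fake names/sizes).
namedir=$(mktemp -d)
head -c 4096 /dev/zero > "$namedir/fsimage_0000000000000000000"
head -c 8192 /dev/zero > "$namedir/edits_0000000000000000001-0000000000000000100"
head -c 8192 /dev/zero > "$namedir/edits_0000000000000000101-0000000000000000200"
edits_bytes=0
for f in "$namedir"/edits_*; do
  edits_bytes=$((edits_bytes + $(wc -c < "$f")))
done
image_bytes=$(wc -c < "$namedir"/fsimage_*)
echo "edits=$edits_bytes fsimage=$image_bytes"
```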
Labels:
- Apache Hive
09-21-2023
12:26 AM
Hi @Lorenzo, I've added up the total bytes from the result of this command, and it didn't reach 705 GB. May I know if there are other ways to calculate to get the 705 GB? Thank you!
09-17-2023
11:15 PM
Hello,
We want to break down the files/data for these DataNodes.
May we know if this is the correct command (or path) to check how the data came up to 705 GB?
hdfs dfs -du -v /operator_metering/storage
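If it helps: in recent Hadoop versions `hdfs dfs -du` prints two size columns per entry, the logical file size and the disk space consumed including replication, and with a replication factor above 1 only the second column lines up with DataNode-level totals like the 705 GB. A sketch of summing both columns with awk, over made-up sample output rather than a live cluster:

```shell
# Made-up `hdfs dfs -du` style output: <size> <space_consumed> <path>
du_out='1024 3072 /operator_metering/storage/a
2048 6144 /operator_metering/storage/b'
logical=$(printf '%s\n' "$du_out" | awk '{s+=$1} END {print s}')
consumed=$(printf '%s\n' "$du_out" | awk '{c+=$2} END {print c}')
echo "logical=$logical consumed=$consumed"
```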
Labels:
- Apache Hive
- HDFS