Member since
03-30-2023
16
Posts
4
Kudos Received
0
Solutions
04-26-2024
08:11 AM
1 Kudo
Hello, I'm seeing these logs in the NameNode. It seems it can't connect to the DataNodes that we have:

2024-04-26 14:45:21,470 DEBUG net.NetworkTopology: No node to choose.
2024-04-26 14:45:21,470 DEBUG blockmanagement.BlockPlacementPolicy: Failed to choose from the next rack (location = /rack-10.128.44.130), retry choosing randomly
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:829)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:717)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseFromNextRack(BlockPlacementPolicyDefault.java:660)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:636)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:511)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:414)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:463)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:290)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:143)
at org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:46)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1858)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1810)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4643)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4510)
at java.lang.Thread.run(Thread.java:748)
2024-04-26 14:45:21,470 DEBUG net.NetworkTopology: Choosing random from 0 available nodes on node /, scope=, excludedScope=null, excludeNodes=[10.128.43.221:9866, 10.128.43.204:9866, 10.128.44.130:9866]. numOfDatanodes=3.

But when I executed this command to check the DataNode status, it shows:

Live datanodes (3):
Name: 10.128.43.204:9866 (10-128-43-204.hdfs-datanode-web.metering.svc.cluster.local)
Hostname: hdfs-datanode-0.hdfs-datanode.metering.svc.cluster.local
Rack: /rack-10.128.43.204
Decommission Status : Normal
Configured Capacity: 1056759873536 (984.18 GB)
DFS Used: 60179288064 (56.05 GB)
Non DFS Used: 74838016 (71.37 MB)
DFS Remaining: 996488970240 (928.05 GB)
DFS Used%: 5.69%
DFS Remaining%: 94.30%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Apr 26 15:07:59 UTC 2024
Last Block Report: Fri Apr 26 12:21:53 UTC 2024
Num of Blocks: 1777840
Name: 10.128.43.221:9866 (10-128-43-221.hdfs-datanode-web.metering.svc.cluster.local)
Hostname: hdfs-datanode-2.hdfs-datanode.metering.svc.cluster.local
Rack: /rack-10.128.43.221
Decommission Status : Normal
Configured Capacity: 1056759873536 (984.18 GB)
DFS Used: 60178792448 (56.05 GB)
Non DFS Used: 74838016 (71.37 MB)
DFS Remaining: 996489465856 (928.05 GB)
DFS Used%: 5.69%
DFS Remaining%: 94.30%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Apr 26 15:08:00 UTC 2024
Last Block Report: Fri Apr 26 14:22:23 UTC 2024
Num of Blocks: 1777840
Name: 10.128.44.130:9866 (10-128-44-130.hdfs-datanode-web.metering.svc.cluster.local)
Hostname: hdfs-datanode-1.hdfs-datanode.metering.svc.cluster.local
Rack: /rack-10.128.44.130
Decommission Status : Normal
Configured Capacity: 1056759873536 (984.18 GB)
DFS Used: 60182228992 (56.05 GB)
Non DFS Used: 74838016 (71.37 MB)
DFS Remaining: 996486029312 (928.05 GB)
DFS Used%: 5.69%
DFS Remaining%: 94.30%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Apr 26 15:08:00 UTC 2024
Last Block Report: Fri Apr 26 13:56:23 UTC 2024
Num of Blocks: 1777840

Hope you can help us to fix this kind of blocker. Thank you in advance!
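As a side note, one thing visible in the report above is that each of the three DataNodes is registered on its own rack (/rack-10.128.43.204, /rack-10.128.43.221, /rack-10.128.44.130), which is relevant when the placement policy complains it "Failed to choose from the next rack". A rough sketch of counting nodes versus distinct racks; the text below is sample output mirroring the report above, not a live cluster:

```shell
# Count DataNodes vs distinct racks from dfsadmin-style output (sample text).
report='Name: 10.128.43.204:9866
Rack: /rack-10.128.43.204
Name: 10.128.43.221:9866
Rack: /rack-10.128.43.221
Name: 10.128.44.130:9866
Rack: /rack-10.128.44.130'
nodes=$(printf '%s\n' "$report" | grep -c '^Name:')
racks=$(printf '%s\n' "$report" | awk '/^Rack:/ {print $2}' | sort -u | wc -l | tr -d ' ')
echo "datanodes=$nodes distinct_racks=$racks"
```

When every node sits alone on its own rack, there is never a second node available on "the next rack", so the policy falls back to choosing randomly.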
Labels:
- HDFS
02-25-2024
05:38 PM
1 Kudo
Hi, it seems the table and partition can't be created, and the files on each DataNode can't be located by the NameNode.

1. Is there a way to re-point those files (the non-DFS-used data) to the actual directory?

Configured Capacity: 1056759873536 (984.18 GB)
DFS Used: 475136 (464 KB)
Non DFS Used: 433030918144 (403.29 GB)
DFS Remaining: 623711703040 (580.88 GB)
DFS Used%: 0.00%
DFS Remaining%: 59.02%

Datanode directory:

bash-4.2$ cd /hadoop/dfs/data
bash-4.2$ ls -l
total 10485776
drwxrwsr-x. 4 hadoop root 4096 Feb 23 11:15 current
-rw-r--r--. 1 hadoop root 58 Feb 26 09:34 in_use.lock
-rw-rw-r--. 1 hadoop root 10737418240 Aug 28 05:26 tempfile
drwxrwsr-x. 2 hadoop root 4096 Feb 23 13:05 test

2. Next, how can we proceed with creating the tables and partitions?

Logs of the NameNode:

2024-02-26 06:52:26,604 DEBUG security.UserGroupInformation: PrivilegedAction as:presto (auth:SIMPLE) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
2024-02-26 06:52:26,604 DEBUG hdfs.StateChange: *DIR* NameNode.rename: /tmp/presto-reporting-operator/576b4b93-ae3b-41ff-b401-be50023f776f/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 to /operator_metering/storage/metering_health_check/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237
2024-02-26 06:52:26,604 DEBUG security.UserGroupInformation: Failed to get groups for user presto by java.io.IOException: No groups found for user presto
2024-02-26 06:52:26,604 DEBUG hdfs.StateChange: DIR* NameSystem.renameTo: /tmp/presto-reporting-operator/576b4b93-ae3b-41ff-b401-be50023f776f/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 to /operator_metering/storage/metering_health_check/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237
2024-02-26 06:52:26,604 DEBUG hdfs.StateChange: DIR* FSDirectory.renameTo: /tmp/presto-reporting-operator/576b4b93-ae3b-41ff-b401-be50023f776f/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 to /operator_metering/storage/metering_health_check/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237
2024-02-26 06:52:26,604 WARN hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /tmp/presto-reporting-operator/576b4b93-ae3b-41ff-b401-be50023f776f/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 to /operator_metering/storage/metering_health_check/20240226_065226_04624_gjgwp_25ca095b-e61e-45a9-b4e3-d12a880a2237 because destination's parent does not exist
2024-02-26 06:52:26,604 DEBUG ipc.Server: Served: rename, queueTime= 0 procesingTime= 0
2024-02-26 06:52:26,604 DEBUG ipc.Server: IPC Server handler 5 on 9820: responding to Call#55244 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.rename from 10.128.38.29:59164
2024-02-26 06:52:26,604 DEBUG ipc.Server: IPC Server handler 5 on 9820: responding to Call#55244 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.rename from 10.128.38.29:59164 Wrote 36 bytes.
2024-02-26 06:52:26,607 DEBUG ipc.Server: got #55245
2024-02-26 06:52:26,607 DEBUG ipc.Server: IPC Server handler 6 on 9820: Call#55245 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.128.38.29:59164 for RpcKind RPC_PROTOCOL_BUFFER
2024-02-26 06:52:26,607 DEBUG security.UserGroupInformation: PrivilegedAction as:presto (auth:SIMPLE) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
2024-02-26 06:52:26,607 DEBUG security.UserGroupInformation: Failed to get groups for user presto by java.io.IOException: No groups found for user presto
2024-02-26 06:52:26,607 DEBUG metrics.TopMetrics: a metric is reported: cmd: getfileinfo user: presto (auth:SIMPLE)
2024-02-26 06:52:26,607 DEBUG top.TopAuditLogger: ------------------- logged event for top service: allowed=true ugi=presto (auth:SIMPLE) ip=/10.128.38.29 cmd=getfileinfo src=/operator_metering/storage/metering_health_check dst=null perm=null
2024-02-26 06:52:26,607 DEBUG ipc.Server: Served: getFileInfo, queueTime= 0 procesingTime= 0
2024-02-26 06:52:26,607 DEBUG ipc.Server: IPC Server handler 6 on 9820: responding to Call#55245 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.128.38.29:59164
2024-02-26 06:52:26,607 DEBUG ipc.Server: IPC Server handler 6 on 9820: responding to Call#55245 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.128.38.29:59164 Wrote 34 bytes.
2024-02-26 06:52:26,608 DEBUG ipc.Server: got #55246
2024-02-26 06:52:26,608 DEBUG ipc.Server: IPC Server handler 4 on 9820: Call#55246 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.128.38.29:59164 for RpcKind RPC_PROTOCOL_BUFFER

Logs of the reporting-operator:

time="2024-02-23T14:14:21Z" level=error msg="cannot insert into Presto table operator_health_check" app=metering component=testWriteToPresto error="presto: query failed (200 OK): \"com.facebook.presto.spi.PrestoException: Failed to create directory: hdfs://hdfs-namenode-proxy:9820/tmp/presto-reporting-operator/1d20c5c5-11e0-47b4-9bce-eaa724db21eb\""

Whenever we're trying to query, this is the error:

Error running query: Partition location does not exist: hdfs://hdfs-namenode-0.hdfs-namenode:9820/user/hive/warehouse/datasource_mlp_gpu_request_slots/dt=2024-02-08

Thank you!
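On question 1, a possibly helpful aside: "Non DFS Used" counts bytes that sit on the DataNode volume but outside the HDFS block storage tree (current/), so the 10 GB tempfile in the directory listing above is exactly the kind of file that inflates it. A minimal sketch of that accounting, using throwaway fixture files rather than the real /hadoop/dfs/data layout:

```shell
# Fixture data dir: top-level files outside current/ behave like "Non DFS Used".
datadir=$(mktemp -d)
mkdir -p "$datadir/current"
head -c 1024 /dev/zero > "$datadir/tempfile"         # stand-in for the 10 GB tempfile
head -c 512  /dev/zero > "$datadir/current/blk_0001" # stand-in for DFS-managed block data
nondfs=0
for f in "$datadir"/*; do
  # only top-level regular files count here; the current/ tree is DFS-managed
  [ -f "$f" ] && nondfs=$((nondfs + $(wc -c < "$f")))
done
echo "non-dfs bytes: $nondfs"
```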
02-23-2024
05:12 AM
1 Kudo
I've copied the clusterID (CID) of the DataNode into the VERSION file of the NameNode and overwritten it. The DataNode and NameNode are now running, but it seems the cluster cannot create or process any data/blocks. Also, is there a way to restore the data/blocks from the DataNode to the NameNode? As per checking the size of the directories inside the DataNode, it seems the data is still there:

/dev/rbd1 985G 404G 581G 41% /hadoop/dfs/data
02-23-2024
02:05 AM
1 Kudo
##UPDATE

We've already formatted the NameNode using the command hadoop namenode -format, and the NameNode is now running properly. However, our DataNodes are giving some errors in the logs:

2024-02-23 09:47:33,138 DEBUG datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to hdfs-namenode-0.hdfs-namenode/10.128.66.8:9820 received versionRequest response: lv=-64;cid=CID-a22214c4-37b5-48e5-a2e0-1ee094bdddd9;nsid=1630192623;c=1708681299098;bpid=BP-367922716-10.128.66.7-1708681299098
2024-02-23 09:47:33,141 INFO datanode.DataNode: Acknowledging ACTIVE Namenode during handshakeBlock pool <registering> (Datanode Uuid unassigned) service to hdfs-namenode-0.hdfs-namenode/10.128.66.8:9820
2024-02-23 09:47:33,142 INFO common.Storage: Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2024-02-23 09:47:33,156 INFO common.Storage: Lock on /hadoop/dfs/data/in_use.lock acquired by nodename 1@hdfs-datanode-0.hdfs-datanode.metering.svc.cluster.local
2024-02-23 09:47:33,158 WARN common.Storage: Failed to add storage directory [DISK]file:/hadoop/dfs/data
java.io.IOException: Incompatible clusterIDs in /hadoop/dfs/data: namenode clusterID = CID-a22214c4-37b5-48e5-a2e0-1ee094bdddd9; datanode clusterID = CID-2ccc1334-69cb-4e3a-954b-5b644c34acbf
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:736)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:294)
...
2024-02-23 09:47:33,159 ERROR datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid 61ba0590-5f8d-4bb7-a121-d70586b83bdb) service to hdfs-namenode-0.hdfs-namenode/10.128.66.8:9820. Exiting.
java.io.IOException: All specified directories have failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:552)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1705)
...
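The "Incompatible clusterIDs" message means the clusterID stored in the DataNode's VERSION file no longer matches the one the re-formatted NameNode advertises. A quick way to compare them is to read the clusterID line from each VERSION file; the sketch below uses temp fixture files carrying the two IDs from the log above rather than the real /hadoop/dfs paths (on a real cluster the files live under the name/data dirs, e.g. current/VERSION):

```shell
# Fixture VERSION files with the two clusterIDs from the log above.
nn_version=$(mktemp); dn_version=$(mktemp)
echo 'clusterID=CID-a22214c4-37b5-48e5-a2e0-1ee094bdddd9' > "$nn_version"
echo 'clusterID=CID-2ccc1334-69cb-4e3a-954b-5b644c34acbf' > "$dn_version"
nn_cid=$(sed -n 's/^clusterID=//p' "$nn_version")
dn_cid=$(sed -n 's/^clusterID=//p' "$dn_version")
if [ "$nn_cid" = "$dn_cid" ]; then
  status=match
else
  status=mismatch
fi
echo "namenode=$nn_cid datanode=$dn_cid -> $status"
```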
02-22-2024
05:48 PM
++UPDATE

We copied the config from /hadoop-config/hdfs-site.xml to /etc/hadoop/hdfs-site.xml to overwrite the directory used by the -recover command. However, it appears that it is not recovering as expected; please see the logs below for reference:

2024-02-22 14:54:54,663 ERROR namenode.MetaRecoveryContext: Failed to apply edit log operation TimesOp [length=0, path=/user/hive/warehouse/datasource_storage_data_transfer_transferred_mbytes/dt=2023-08-04/20230919_173327_13960_n4dzb_af83fb23-0235-4c0a-a58a-edff0e4c866e, mtime=-1, atime=1703052661305, opCode=OP_TIMES, txid=31360513036]: error null
Enter 'c' to continue, applying edits
Enter 's' to stop reading the edit log here, abandoning any later edits
Enter 'q' to quit without saving
Enter 'a' to always select the first choice in the future without prompting. (c/s/q/a)
automatically choosing c
2024-02-22 14:54:54,663 INFO namenode.MetaRecoveryContext: Continuing
2024-02-22 14:54:54,663 ERROR namenode.FSEditLogLoader: Encountered exception on operation TimesOp [length=0, path=/user/hive/warehouse/datasource_storage_data_transfer_transferred_mbytes/dt=2023-08-04/20230919_164531_13295_n4dzb_d955413f-79c3-442f-a33b-82f90f8440fb, mtime=-1, atime=1703052661305, opCode=OP_TIMES, txid=31360513037] java.lang.NullPointerException
2024-02-22 14:54:54,663 ERROR namenode.MetaRecoveryContext: Failed to apply edit log operation TimesOp [length=0, path=/user/hive/warehouse/datasource_storage_data_transfer_transferred_mbytes/dt=2023-08-04/20230919_164531_13295_n4dzb_d955413f-79c3-442f-a33b-82f90f8440fb, mtime=-1, atime=1703052661305, opCode=OP_TIMES, txid=31360513037]: error null
Enter 'c' to continue, applying edits
Enter 's' to stop reading the edit log here, abandoning any later edits
Enter 'q' to quit without saving
Enter 'a' to always select the first choice in the future without prompting. (c/s/q/a)
automatically choosing c
2024-02-22 14:54:54,663 INFO namenode.MetaRecoveryContext: Continuing
2024-02-22 14:54:54,663 ERROR namenode.FSEditLogLoader: Encountered exception on operation TimesOp [length=0, path=/user/hive/warehouse/datasource_storage_data_transfer_transferred_mbytes/dt=2023-08-04/20230919_163841_13187_n4dzb_86ab366f-b9fa-4799-a13c-cc88959c9618, mtime=-1, atime=1703052661305, opCode=OP_TIMES, txid=31360513038] java.lang.NullPointerException
2024-02-22 14:54:54,663 ERROR namenode.MetaRecoveryContext: Failed to apply edit log operation TimesOp [length=0, path=/user/hive/warehouse/datasource_storage_data_transfer_transferred_mbytes/dt=2023-08-04/20230919_163841_13187_n4dzb_86ab366f-b9fa-4799-a13c-cc88959c9618, mtime=-1, atime=1703052661305, opCode=OP_TIMES, txid=31360513038]: error null
2024-02-22 08:10:55,082 DEBUG namenode.FSNamesystem: OP_CLOSE: /tmp/presto-reporting-operator/76614aa4-be63-4b87-9dbb-550157e3d6a2/dt=2023-12-19/20231219_074340_06571_n4dzb_3cf5e97e-46e0-459d-9d0c-e91b5495819b numblocks : 1 clientHolder
2024-02-22 08:10:55,084 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: /tmp/presto-reporting-operator/76614aa4-be63-4b87-9dbb-550157e3d6a2/dt=2023-12-19/20231219_074340_06571_n4dzb_3cf5e97e-46e0-459d-9d0c-e91b5495819b is renamed to /user/hive/warehouse/datasource_storage_radosgw_usage_bucket_mbytes/dt=2023-12-19/20231219_074340_06571_n4dzb_3cf5e97e-46e0-459d-9d0c-e91b5495819b
2024-02-22 08:10:55,086 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: /tmp/presto-reporting-operator/76614aa4-be63-4b87-9dbb-550157e3d6a2/dt=2023-12-19 is removed
2024-02-22 08:10:55,087 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: /tmp/presto-reporting-operator/76614aa4-be63-4b87-9dbb-550157e3d6a2 is removed
2024-02-22 08:10:55,089 DEBUG namenode.FSDirectory: child: 583efb5a-fa80-4df0-b4ae-f35e687da8d0, posixAclInheritanceEnabled: true, modes: rwxr-xr-x
2024-02-22 08:10:55,089 DEBUG namenode.FSDirectory: child: dt=2023-12-19, posixAclInheritanceEnabled: true, modes: rwxrwxrwx
2024-02-22 08:10:55,089 DEBUG namenode.FSNamesystem: OP_ADD: /tmp/presto-reporting-operator/583efb5a-fa80-4df0-b4ae-f35e687da8d0/dt=2023-12-19/20231219_074347_06574_n4dzb_86410619-b540-4b00-b2f7-8a0a6f8ba202 numblocks : 0 clientHolder DFSClient_NONMAPREDUCE_-637714557_210 clientMachine 10.128.33.124
2024-02-22 08:10:55,089 DEBUG namenode.FSDirectory: child: 20231219_074347_06574_n4dzb_86410619-b540-4b00-b2f7-8a0a6f8ba202, posixAclInheritanceEnabled: true, modes: rw-r--r--
2024-02-22 08:10:55,090 DEBUG namenode.FSNamesystem: OP_ADD_BLOCK: /tmp/presto-reporting-operator/583efb5a-fa80-4df0-b4ae-f35e687da8d0/dt=2023-12-19/20231219_074347_06574_n4dzb_86410619-b540-4b00-b2f7-8a0a6f8ba202 new block id: 1127996
2024-02-22 08:10:55,090 DEBUG namenode.FSNamesystem: OP_CLOSE: /tmp/presto-reporting-operator/583efb5a-fa80-4df0-b4ae-f35e687da8d0/dt=2023-12-19/20231219_074347_06574_n4dzb_86410619-b540-4b00-b2f7-8a0a6f8ba202 numblocks : 1 clientHolder
2024-02-22 08:10:55,092 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: /tmp/presto-reporting-operator/583efb5a-fa80-4df0-b4ae-f35e687da8d0/dt=2023-12-19/20231219_074347_06574_n4dzb_86410619-b540-4b00-b2f7-8a0a6f8ba202 is renamed to /user/hive/warehouse/datasource_jnaas_gpu_capacity/dt=2023-12-19/20231219_074347_06574_n4dzb_86410619-b540-4b00-b2f7-8a0a6f8ba202
2024-02-22 08:10:55,096 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: /tmp/presto-reporting-operator/583efb5a-fa80-4df0-b4ae-f35e687da8d0/dt=2023-12-19 is removed
2024-02-22 08:10:55,096 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: /tmp/presto-reporting-operator/583efb5a-fa80-4df0-b4ae-f35e687da8d0 is removed
2024-02-22 08:10:55,112 DEBUG namenode.FSDirectory: child: e4d1adf3-f00f-40bc-b737-824ee1d382eb, posixAclInheritanceEnabled: true, modes: rwxr-xr-x
2024-02-22 08:10:55,112 DEBUG namenode.FSDirectory: child: dt=2023-12-19, posixAclInheritanceEnabled: true, modes: rwxrwxrwx
2024-02-22 02:28:14,428 DEBUG namenode.FSNamesystem: OP_ADD: /tmp/presto-reporting-operator/84ecbe02-34cf-4496-9c4e-185c0a594859/dt=2023-12-18/20231218_112548_10016_n4dzb_c11605c5-bf11-4e44-88a4-0ea870e810aa numblocks : 0 clientHolder DFSClient_NONMAPREDUCE_669985933_51 clientMachine 10.128.60.244
2024-02-22 02:28:14,428 DEBUG namenode.FSDirectory: child: 20231218_112548_10016_n4dzb_c11605c5-bf11-4e44-88a4-0ea870e810aa, posixAclInheritanceEnabled: true, modes: rw-r--r--
2024-02-22 02:28:14,428 DEBUG namenode.FSNamesystem: OP_ADD_BLOCK: /tmp/presto-reporting-operator/84ecbe02-34cf-4496-9c4e-185c0a594859/dt=2023-12-18/20231218_112548_10016_n4dzb_c11605c5-bf11-4e44-88a4-0ea870e810aa new block id: 1127963
2024-02-22 02:28:14,428 DEBUG namenode.FSNamesystem: OP_CLOSE: /tmp/presto-reporting-operator/84ecbe02-34cf-4496-9c4e-185c0a594859/dt=2023-12-18/20231218_112548_10016_n4dzb_c11605c5-bf11-4e44-88a4-0ea870e810aa numblocks : 1 clientHolder
2024-02-22 02:28:14,429 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: /tmp/presto-reporting-operator/84ecbe02-34cf-4496-9c4e-185c0a594859/dt=2023-12-18/20231218_112548_10016_n4dzb_c11605c5-bf11-4e44-88a4-0ea870e810aa is renamed to /user/hive/warehouse/datasource_storage_shared_filesystem_usage_volume_files/dt=2023-12-18/20231218_112548_10016_n4dzb_c11605c5-bf11-4e44-88a4-0ea870e810aa
2024-02-22 02:28:14,431 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: /tmp/presto-reporting-operator/84ecbe02-34cf-4496-9c4e-185c0a594859/dt=2023-12-18 is removed
2024-02-22 02:28:14,431 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: /tmp/presto-reporting-operator/84ecbe02-34cf-4496-9c4e-185c0a594859 is removed
2024-02-22 02:28:14,449 DEBUG namenode.FSDirectory: child: e8edf3c5-11c8-4d11-93ff-dbaab7709549, posixAclInheritanceEnabled: true, modes: rwxr-xr-x
02-22-2024
05:44 PM
Hi, thank you for your suggestion, but unfortunately, we don't have a standby namenode 😞
02-19-2024
05:34 PM
2024-02-16 01:50:21,843 ERROR namenode.NameNode: Failed to start namenode.
java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 35923060031 but unable to find any edit logs containing txid 20264370028
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1751)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1709)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1684)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:701)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
2024-02-16 01:50:21,845 DEBUG util.ExitUtil: Exiting with status 1: java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 35923060031 but unable to find any edit logs containing txid 20264370028
1: java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 35923060031 but unable to find any edit logs containing txid 20264370028
at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:265)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1716)
Caused by: java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 35923060031 but unable to find any edit logs containing txid 20264370028

It seems txid 20264370028 is already missing. Is there a way to skip this file or recover it?

Note: We've already executed hadoop namenode -recover, but it is pointing to a different location:

2024-02-16 07:12:34,287 INFO hdfs.StateChange: STATE* Safe mode is ON.
It was turned on manually. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
2024-02-16 07:12:34,287 WARN common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name does not exist
2024-02-16 07:12:34,289 WARN namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:376)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:227)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
at org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1548)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1630)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
2024-02-16 07:12:34,291 DEBUG namenode.FSEditLog: Closing log when already closed
2024-02-16 07:12:34,292 INFO namenode.MetaRecoveryContext: RECOVERY FAILED: caught exception
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:376)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:227)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
at org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1548)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1630)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
2024-02-16 07:12:34,292 ERROR namenode.NameNode: Failed to start namenode.

Thank you!
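Since -recover fell back to /tmp/hadoop-hadoop/dfs/name (the compiled-in default), it likely never read dfs.namenode.name.dir from the intended hdfs-site.xml. One sanity check is to confirm which value the config file actually carries. The sketch below parses a throwaway fixture file rather than the real /etc/hadoop/hdfs-site.xml; the property key is the standard HDFS one:

```shell
# Fixture hdfs-site.xml; extract dfs.namenode.name.dir with awk.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/dfs/name</value>
  </property>
</configuration>
EOF
namedir=$(awk '/dfs\.namenode\.name\.dir/ {found=1}
               found && /<value>/ {gsub(/.*<value>|<\/value>.*/, ""); print; exit}' "$conf")
echo "configured name dir: $namedir"
```

If the extracted value is not the directory that actually holds the fsimage and edits, -recover will keep loading the wrong storage dir regardless of what else is fixed.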
Labels:
- Apache Hive
10-04-2023
09:45 PM
Is it safe to delete the files in /hadoop/dfs/name/current? Since we're getting 100% usage for this directory in the hdfs-namenode:

bash-4.2$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/rbd1 935G 935G 100M 100% /hadoop/dfs/name
bash-4.2$ pwd
/hadoop/dfs/name/current
bash-4.2$ cd ..
bash-4.2$ du -sh *
935G current
4.0K in_use.lock

Also, the earliest data available in this folder was on July 20.
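For reference, a NameNode's current/ directory holds fsimage checkpoints plus edits_* segments, and one that only grows (with segments going back to July 20) usually means old edit segments are accumulating without being checkpointed and purged. A small sketch on fixture files, just to show how to see how much of the space is edits versus fsimage (the names and sizes below are made up):

```shell
# Fixture current/ dir: one fsimage plus two edits segments (fake names/sizes).
namedir=$(mktemp -d)
head -c 4096 /dev/zero > "$namedir/fsimage_0000000000000000000"
head -c 8192 /dev/zero > "$namedir/edits_0000000000000000001-0000000000000000100"
head -c 8192 /dev/zero > "$namedir/edits_0000000000000000101-0000000000000000200"
edits_bytes=0
for f in "$namedir"/edits_*; do
  edits_bytes=$((edits_bytes + $(wc -c < "$f")))
done
image_bytes=$(wc -c < "$namedir"/fsimage_*)
echo "edits=$edits_bytes fsimage=$image_bytes"
```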
Labels:
- Apache Hive
09-21-2023
12:26 AM
Hi @Lorenzo, I've added up the total bytes from the result of this command, and it didn't reach 705 GB. May I know if there are other ways to calculate to get the 705 GB? Thank you!
09-17-2023
11:15 PM
Hello,
We want to break down the files/data for these DataNodes.
May we know if this is the correct command (or path) to check how the data came up to 705 GB?
hdfs dfs -du -v /operator_metering/storage
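If it helps: in recent Hadoop versions `hdfs dfs -du` prints two size columns per entry, the logical file size and the disk space consumed including replication, and with a replication factor above 1 only the second column lines up with DataNode-level totals like the 705 GB. A sketch of summing both columns with awk, over made-up sample output rather than a live cluster:

```shell
# Made-up `hdfs dfs -du` style output: <size> <space_consumed> <path>
du_out='1024 3072 /operator_metering/storage/a
2048 6144 /operator_metering/storage/b'
logical=$(printf '%s\n' "$du_out" | awk '{s+=$1} END {print s}')
consumed=$(printf '%s\n' "$du_out" | awk '{c+=$2} END {print c}')
echo "logical=$logical consumed=$consumed"
```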
Labels:
- Apache Hive
- HDFS