Member since
08-11-2015
8
Posts
0
Kudos Received
1
Solution
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 49 | 05-22-2026 07:38 AM |
05-22-2026
07:38 AM
Self-resolved. The switch between a direct single-threaded copy and DistCp depends on file size: files larger than hive.exec.copyfile.maxsize are copied with DistCp. The default value is 32 MB.
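The rule described in this post can be written as a small sketch. It only models the size comparison stated above. The function name `choose_copy_method` and the return labels are hypothetical, and Hive's actual code path may check further conditions that are not covered here.

```python
# Illustrative only: models the size threshold described in the post,
# not Hive's real implementation.
DEFAULT_COPYFILE_MAXSIZE = 32 * 1024 * 1024  # 32 MB, the default quoted in the post


def choose_copy_method(file_size_bytes, max_size_bytes=DEFAULT_COPYFILE_MAXSIZE):
    """Return "distcp" for files larger than the threshold, else "direct"."""
    return "distcp" if file_size_bytes > max_size_bytes else "direct"


if __name__ == "__main__":
    # Hypothetical file sizes: one below, one exactly at, one above the threshold.
    for size in (10 * 1024 * 1024, 32 * 1024 * 1024, 200 * 1024 * 1024):
        print(size, choose_copy_method(size))
```

Under this reading, raising hive.exec.copyfile.maxsize would route more files through the direct copy, and a table whose files sit on both sides of the threshold would see a mix of the two methods.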
05-22-2026
03:26 AM
I'm exporting several tables and I observe that some files are copied to the target path using DistCp (slow) while other files are copied by some other (fast) means. There is no evidence of the rationale behind the choice, but the other odd thing is that even if a table is made up of multiple files, Hive starts one DistCp per file instead of passing the entire directory. Is there any option to control this behaviour?
Labels:
Apache Hive
02-23-2026
02:40 AM
On the data node the typical stack traces were these:

2026-02-20 12:01:41,486 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Lock held time above threshold: lock identifier: FsDatasetRWLock lockHeldTimeMs=8582 ms. Suppressed 0 lock warnings. Longest suppressed LockHeldTimeMs=0. The stack trace is:
    java.lang.Thread.getStackTrace(Thread.java:1559)
    org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
    org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:160)
    org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:220)
    org.apache.hadoop.util.InstrumentedReadLock.unlock(InstrumentedReadLock.java:78)
    org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
    org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
    org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1920)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:376)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:719)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:872)
    java.lang.Thread.run(Thread.java:748)

2026-02-20 12:01:41,486 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Waited above threshold to acquire lock: lock identifier: FsDatasetRWLock waitTimeMs=7442 ms. Suppressed 3 lock wait warnings. Longest suppressed WaitTimeMs=414. The stack trace is:
    java.lang.Thread.getStackTrace(Thread.java:1559)
    org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
    org.apache.hadoop.util.InstrumentedLock.logWaitWarning(InstrumentedLock.java:171)
    org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:222)
    org.apache.hadoop.util.InstrumentedLock.lock(InstrumentedLock.java:105)
    org.apache.hadoop.util.AutoCloseableLock.acquire(AutoCloseableLock.java:67)
    org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1646)
    org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:212)
    org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1303)
    org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:762)
    org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:178)
    org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:112)
    org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:291)
    java.lang.Thread.run(Thread.java:748)

2026-02-20 11:06:02,845 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Waited above threshold to acquire lock: lock identifier: FsDatasetRWLock waitTimeMs=688 ms. Suppressed 5 lock wait warnings. Longest suppressed WaitTimeMs=397. The stack trace is:
    java.lang.Thread.getStackTrace(Thread.java:1559)
    org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
    org.apache.hadoop.util.InstrumentedLock.logWaitWarning(InstrumentedLock.java:171)
    org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:222)
    org.apache.hadoop.util.InstrumentedLock.lock(InstrumentedLock.java:105)
    org.apache.hadoop.util.AutoCloseableLock.acquire(AutoCloseableLock.java:67)
    org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1750)
    org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:997)
    org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:899)
    org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:178)
    org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:112)
    org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:291)
    java.lang.Thread.run(Thread.java:748)

and this:

2026-02-20 11:11:44,500 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Waited above threshold to acquire lock: lock identifier: FsDatasetRWLock waitTimeMs=443 ms. Suppressed 1 lock wait warnings. Longest suppressed WaitTimeMs=412. The stack trace is:
    java.lang.Thread.getStackTrace(Thread.java:1559)
    org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
    org.apache.hadoop.util.InstrumentedLock.logWaitWarning(InstrumentedLock.java:171)
    org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:222)
    org.apache.hadoop.util.InstrumentedLock.lock(InstrumentedLock.java:105)
    org.apache.hadoop.util.AutoCloseableLock.acquire(AutoCloseableLock.java:67)
    org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:115)
    org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.validateBlockFile(FsDatasetImpl.java:2036)
    org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReplica(FsDatasetImpl.java:808)
    org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReplica(FsDatasetImpl.java:801)
    org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getLength(FsDatasetImpl.java:794)
    org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkBlock(FsDatasetImpl.java:1988)
    org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlock(DataNode.java:2315)
    org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlocks(DataNode.java:2372)
    org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:726)
    org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:684)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1334)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1380)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1307)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1290)
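To compare many warnings like the ones above, a short script can pull the lock name and the held or wait time out of a DataNode log. This is a minimal sketch that assumes the message layout shown in this post; the names `LOCK_WARNING`, `parse_lock_warnings` and `slowest` are hypothetical helpers, not part of Hadoop.

```python
import re

# Matches the WARN lines quoted in this post, e.g.
# "... lock identifier: FsDatasetRWLock waitTimeMs=7442 ms. ..."
LOCK_WARNING = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN .*?"
    r"lock identifier: (?P<lock>\S+) "
    r"(?P<kind>lockHeldTimeMs|waitTimeMs)=(?P<ms>\d+) ms"
)


def parse_lock_warnings(lines):
    """Return (timestamp, lock, kind, milliseconds) for every warning found."""
    events = []
    for line in lines:
        # finditer also copes with several warnings flattened onto one line.
        for m in LOCK_WARNING.finditer(line):
            events.append((m["ts"], m["lock"], m["kind"], int(m["ms"])))
    return events


def slowest(events, n=10):
    """Return the n events with the largest duration, longest first."""
    return sorted(events, key=lambda e: e[3], reverse=True)[:n]


if __name__ == "__main__":
    import sys

    # Usage: python lock_warnings.py < datanode.log
    for event in slowest(parse_lock_warnings(sys.stdin)):
        print(*event)
```

Sorting by duration makes it easier to see whether the long waits cluster around a particular time window, such as the decommissioning run.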
02-23-2026
02:31 AM
Hi @Asfahan, yes, heap should be around 300 GB, but this is what the NN says in the web UI:

Heap Memory used 111.53 GB of 169.41 GB Heap Memory. Max Heap Memory is 169.41 GB.

As for handlers, dfs_namenode_handler_count is 70 (it should be 80 with 17 DataNodes), while dfs_datanode_handler_count is at its default value of 3. On a different cluster I had this set to 24.

This is the stack trace for a write lock held on the active NN:

2026-02-20 11:01:44,596 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of suppressed write-lock reports: 0
Longest write-lock held at 1972-02-11 21:18:16,333+0100 for 6157ms via
    java.lang.Thread.getStackTrace(Thread.java:1559)
    org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
    org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:262)
    org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:226)
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1696)
    org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.processBlocksInternal(DatanodeAdminManager.java:703)
    org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.pruneReliableBlocks(DatanodeAdminManager.java:644)
    org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.check(DatanodeAdminManager.java:572)
    org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:506)
    java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    java.lang.Thread.run(Thread.java:748)
Total suppressed write-lock held time: 0.0
02-23-2026
12:49 AM
Hello @Asfahan, thank you for the answer. Yes, I understand that the cluster is a little oversized.

About the topic: I don't find any "Block report queue full" message, but I do find several write locks held for a long time, though, strangely enough, not during HDFS service startup. What I find is a number of requests coming via the NFS Gateway (around 3000/minute) and several GCs (Allocation Failure) in the GC log in the first 20 minutes of startup, plus several more near the end, when all the DataNodes had reported their blocks. The NN has 160 GB of heap and the DNs 30 GB.

What I found strange is dfs_datanode_handler_count set to 3; that might be the cause of the original issue that forced me to restart the service. In fact, I was decommissioning one node and, as soon as I started, I experienced a huge performance degradation, even though network, HDFS and disk I/O were not so critical (cluster net I/O peak was 280 MB/s, HDFS I/O 190 MB/s, disk I/O write peak 300 MB/s).
02-21-2026
03:01 AM
I'm experiencing a very slow HDFS start in CDP 7.1.7SP1 on a cluster with a huge number of blocks (over 300 million, with each server holding up to 40 million). I've checked https://community.cloudera.com/t5/Community-Articles/Scaling-the-HDFS-NameNode-part-5/ta-p/327450 and I wonder whether setting dfs.blockreport.split.threshold to 0 might speed up the process. I've seen that the setting should go in the NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml. Is this setting service-wide, so that a full restart is necessary?
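For context, here is a rough model of what dfs.blockreport.split.threshold controls, based on its description in hdfs-default.xml as far as I understand it: below the threshold a DataNode sends one block report covering all storage directories, otherwise it sends one report per storage directory, and 0 means always split. The default of 1,000,000 and the function name `block_report_messages` are assumptions for this sketch, which is not Hadoop code.

```python
# Assumed default of dfs.blockreport.split.threshold (check hdfs-default.xml
# for the version in use).
DEFAULT_SPLIT_THRESHOLD = 1_000_000


def block_report_messages(blocks_per_storage, split_threshold=DEFAULT_SPLIT_THRESHOLD):
    """Return how many block report messages one DataNode would send.

    blocks_per_storage: block counts, one entry per storage directory.
    """
    total_blocks = sum(blocks_per_storage)
    if total_blocks < split_threshold:
        return 1  # everything fits in a single combined report
    return len(blocks_per_storage)  # one report per storage directory


if __name__ == "__main__":
    # Hypothetical DataNode: 12 disks holding 40 million blocks in total.
    disks = [40_000_000 // 12] * 12
    print(block_report_messages(disks))
    print(block_report_messages(disks, split_threshold=0))
```

If that reading is right, a DataNode holding around 40 million blocks is already far above the assumed default, so its reports would be split per storage directory whether or not the threshold is set to 0.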
Labels:
Cloudera
01-09-2026
07:51 AM
Thanks for the suggestion. I will go for distcp, because we have hundreds of thousands of files and "only" several thousand of them must be restored.
01-09-2026
03:59 AM
I have to temporarily change the permissions of several files, so I'm planning to take a snapshot before issuing the chmod command. I know that restoring a file from a snapshot is done with the cp command. What happens in this case: is only the inode restored? And what happens to unmodified files when running cp <snaproot>/.snapshot/<name>/* <target dir>/? Are those files skipped?
Labels:
HDFS