Created 09-25-2025 12:57 AM
Error: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for user/myname/.cm/distcp-staging/2025-09-21-05-14-47-c4f9/intermediate.1
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:447)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133)
    at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:3566)
    at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:3360)
    at org.apache.hadoop.io.SequenceFile$Sorter.mergePass(SequenceFile.java:3336)
    at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:2899)
    at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:2938)
    at com.cloudera.enterprise.distcp.util.DistCpUtils.sortListing(DistCpUtils.java:427)
    at com.cloudera.enterprise.distcp.mapred.StatusReducer.lambda$deleteMissing$1(StatusReducer.java:152)
    at com.cloudera.enterprise.distcp.mapred.StatusReducerProgress.track(StatusReducerProgress.java:211)
    at com.cloudera.enterprise.distcp.mapred.StatusReducerProgress.trackSortSourceListing(StatusReducerProgress.java:223)
    at com.cloudera.enterprise.distcp.mapred.StatusReducer.deleteMissing(StatusReducer.java:151)
    at com.cloudera.enterprise.distcp.mapred.StatusReducer.cleanup(StatusReducer.java:89)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
This is the error I am getting. At first I suspected that I didn't have enough space in my local dirs, but all of them have enough space. It shouldn't be a permission issue either, because it only fails sometimes; most of the time the replication is successful, yet this issue persists at least once a week. Can someone help with where I should look next?
Created 09-25-2025 01:25 PM
Hi @cravani @james_jones @ggangadharan, do you have any insights here? Thanks!
Regards,
Diana Torres
Created 09-26-2025 07:31 AM
Hi @ishashrestha ,
Since the issue only happens intermittently, it is most likely that one of the worker nodes has a local disk issue.
Because the replication runs as a MapReduce job, and YARN creates its containers with local scratch dirs, one of the nodes probably has this problem.
Please check yarn.nodemanager.local-dirs and mapreduce.cluster.local.dir to find the location of the scratch dirs, then confirm that each worker node has enough disk space and the correct permissions on those directories.
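For example, a quick manual check on each worker node could look like the sketch below (illustrative only: the config file location and the /yarn/nm directory are assumptions for a typical CM-managed cluster, so substitute the paths from your own configuration):

# Find the configured scratch directories (config location may differ on your cluster)
grep -A1 "yarn.nodemanager.local-dirs" /etc/hadoop/conf/yarn-site.xml
grep -A1 "mapreduce.cluster.local.dir" /etc/hadoop/conf/mapred-site.xml

# Check free space on the filesystems backing those directories
df -h /yarn/nm          # replace /yarn/nm with each directory found above

# Check ownership and permissions (containers normally run under the yarn user)
ls -ld /yarn/nm /yarn/nm/usercache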
Let me know if this is the case.
Best Regards
Created on 09-26-2025 07:38 AM - edited 09-26-2025 07:44 AM
@Shmoo I initially thought the issue might be related to space in the local directory, but after checking, I noticed it sometimes fails and sometimes passes on the same node, even when the file sizes are similar and there is enough space in the directory. Is there anything else I might be missing that I should check?
Created 09-26-2025 07:45 AM
Hi @ishashrestha ,
Well, another thing is that the path user/myname/.cm/distcp-staging/... suggests this DistCp job was initiated or managed by Cloudera Manager (CM).
This process uses a separate staging directory, but it still relies on the NodeManager's local directories for intermediate sorting, which is where the error is occurring (SequenceFile$Sorter.sort).
Confirm that the user myname is correctly mapped and has the necessary permissions across the cluster. While you dismissed permissions, an intermittent Kerberos ticket issue or a transient user mapping problem on one specific node could cause this weekly failure.
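As a quick sanity check (illustrative only; "myname" stands in for the actual job user, and Kerberos details depend on your setup), you could verify the account and its group resolution on each worker node:

# Confirm the OS account exists and has consistent groups on this node
id myname

# Confirm how Hadoop itself resolves the user's groups
hdfs groups myname

# If the cluster is kerberized, confirm there is a valid, non-expired ticket for the job user
klist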
The next steps should focus on reviewing the NodeManager health and logs for the specific nodes that failed, checking the status of the local scratch directories on those nodes, and correlating the failure time with any scheduled system maintenance.
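A sketch of what that review could look like (the log path is an assumption; on CM-managed nodes the NodeManager log usually lives under /var/log/hadoop-yarn):

# List all nodes with their health reports; an unhealthy node often reports "local-dirs are bad"
yarn node -list -all

# Drill into one suspect node, using the Node-Id from the listing above
yarn node -status <node-id>

# On the suspect worker, search the NodeManager log for disk checker complaints
grep -iE "DiskChecker|local-dirs" /var/log/hadoop-yarn/*NODEMANAGER*.log*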
Best Regards
Created 09-26-2025 07:56 AM
@Shmoo Thank you for the details. I’ll review the points you mentioned and check accordingly.