Created 11-18-2016 02:41 PM
We are trying to run a Cascading job using the Tez flow connector. The job fails with the following error, but the same job runs fine with the MapReduce flow connector. After looking through the Tez source code, I found that LocalDirAllocator is initialized with the key tez.runtime.framework.local.dirs. I am not sure where and when this property is set, or whether it has any impact on resolving the issue. Any ideas on how to resolve this are appreciated.
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/attempt_1479221177536_0001_1_01_000000_2_10015_0/file.out
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:402)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
at org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillFileForWrite(TezTaskOutputFiles.java:207)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:447)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:231)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:327)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:283)
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:164)
at cascading.flow.tez.stream.element.OldOutputCollector.collect(OldOutputCollector.java:57)
at cascading.tap.hadoop.util.MeasuredOutputCollector.collect(MeasuredOutputCollector.java:70)
at cascading.flow.tez.stream.element.TezGroupGate.wrapGroupingAndCollect(TezGroupGate.java:125)
at cascading.flow.hadoop.stream.HadoopGroupGate.receive(HadoopGroupGate.java:109)
Created 12-02-2016 08:51 AM
Typically tez.runtime.framework.local.dirs isn't set by users; it is only used internally by Tez. Its value comes from the YARN local dirs configuration, passed to containers through an environment variable. There are several ways LocalDirAllocator can fail to find a valid local dir: 1. a disk/dir is full; 2. a disk/dir is not writable; 3. the local dirs setting is incorrect. Since MapReduce-based jobs work fine for you, I assume there is no issue with disks being full or unwritable. You should check whether the LOCAL_DIRS environment variable is correct by looking at the container launch script.
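A quick way to rule out cases 2 and 3 is to walk the comma-separated dir list the way YARN hands it to containers and test each entry. This is a minimal sketch, not an official tool; on a real node you would run it against the LOCAL_DIRS value exported in the container's launch_container.sh (here it defaults to /tmp just so the snippet is runnable anywhere):

```shell
# Check each YARN local dir for existence and writability.
# LOCAL_DIRS is comma-separated, as YARN exports it to containers.
LOCAL_DIRS="${LOCAL_DIRS:-/tmp}"

check_local_dirs() {
  for dir in $(echo "$1" | tr ',' ' '); do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      echo "OK $dir"
    else
      echo "BAD $dir"
    fi
  done
}

check_local_dirs "$LOCAL_DIRS"
```

Any dir reported BAD (or a list that points at paths that don't exist on that node) would explain the DiskErrorException even when disks are not full.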
Created 05-07-2017 04:01 PM
I faced the same problem when trying to run a Hive query with both hive.execution.engine=mr and hive.execution.engine=tez.
The error looks like:
Vertex failed, vertexName=Map 1, vertexId=vertex_1494168504267_0002_2_00, diagnostics=[Task failed, taskId=task_1494168504267_0002_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1494168504267_0002_2_00_000000_0:org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/attempt_1494168504267_0002_2_00_000000_0_10002_0/file.out
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:402)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
at org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillFileForWrite(TezTaskOutputFiles.java:207)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:544)
The problem was solved by setting the following parameters:
In file hadoop/conf/core-site.xml parameter hadoop.tmp.dir
In file hadoop/conf/tez-site.xml parameter tez.runtime.framework.local.dirs
In file hadoop/conf/yarn-site.xml parameter yarn.nodemanager.local-dirs
In file hadoop/conf/mapred-site.xml parameter mapreduce.cluster.local.dir
Set each of these to a valid directory with sufficient free space and the query will execute.
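For reference, each entry follows the standard Hadoop property syntax. A sketch of the tez-site.xml and yarn-site.xml entries is below; the path /data/tmp is a placeholder, so substitute a directory that exists and is writable on every node (the other two files follow the same pattern):

```xml
<!-- tez-site.xml: placeholder path, substitute a real local directory -->
<property>
  <name>tez.runtime.framework.local.dirs</name>
  <value>/data/tmp/tez-local</value>
</property>

<!-- yarn-site.xml: same placeholder convention -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/tmp/nm-local</value>
</property>
```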
Created 05-03-2019 03:51 PM
I am also getting this error. We are processing 500 GB of data and the NodeManager local-dir size is 100 GB; at the time of job execution it was 91% full. There are 1100 mappers and 1000 reducers. The mapper stage completed, but some reducer tasks failed and were killed. Please help me solve this.
The error is:
Status: Failed
Vertex re-running, vertexName=Map 1, vertexId=vertex_1556753562511_0049_1_00
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1556753562511_0049_1_01, diagnostics=[Task failed, taskId=task_1556753562511_0049_1_01_000470, diagnostics=[TaskAttempt 0 failed, info=[Error: exceptionThrown=org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in DiskToDiskMerger [Map_1]
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:357)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:334)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1556753562511_0049_1_01_000470_0_10014_src_1187_spill_-1
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:441)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager$OnDiskMerger.merge(MergeManager.java:841)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeThread.run(MergeThread.java:89)
, errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in DiskToDiskMerger [Map_1]
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:357)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:334)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
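The numbers in the post above suggest this failure is simply disk exhaustion rather than a configuration problem: the shuffle's on-disk merge has to spill intermediate data into the local dirs, and a 100 GB local dir at 91% full leaves only single-digit gigabytes free. A back-of-the-envelope check (using the figures quoted in the post; the actual intermediate size depends on compression and the query):

```python
# Rough sanity check of local-dir headroom vs. shuffle data,
# using the figures from the post above.
local_dir_gb = 100        # NodeManager local-dir capacity
used_fraction = 0.91      # 91% full at job execution time
data_gb = 500             # input data being processed

free_gb = local_dir_gb * (1 - used_fraction)
print(f"free local-dir space: {free_gb:.0f} GB")
print(f"data to shuffle (upper bound): {data_gb} GB")
```

With only about 9 GB free per node, the DiskToDiskMerger cannot find a local dir with room for its spill files, which is exactly what the DiskErrorException reports. The fixes are to free up or enlarge the local dirs, spread yarn.nodemanager.local-dirs across more disks, or reduce intermediate data (e.g. enable intermediate compression).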