Created 12-16-2016 11:56 AM
Hi folks,
I am saving the output of a Hive query to a file - usually the output file is named 000000_0. I left a coordinator running that performed this task several times. At some point it crashed because the output was written to 000000_1000 instead. Why is that? Of course, the next action, which looks for the expected file, is killed because it doesn't exist.
I cannot explain this behaviour. Could it be related to a MapReduce problem?
How can I prevent it from happening?
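For context, the action is essentially of this shape (the paths, database and column names here are just placeholders, not my real ones):

-- rough sketch of the Hive action, placeholder names only
INSERT OVERWRITE DIRECTORY '/user/ken/query_output'
SELECT col_a, col_b
FROM my_db.my_table
WHERE load_date = '2016-12-16';

The next workflow action then expects to find /user/ken/query_output/000000_0.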
Thanks for your help!
Ken
Created 12-16-2016 12:12 PM
Each of these files is one part of the parallel processing output on Hadoop. E.g. if a file name starts with part-m it was produced by a mapper, and if it starts with part-r it was produced by a reducer. Each file part gets a serial-number suffix. You can process the results by referencing the path of the parent directory, which will union all file parts into one data set.
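As a rough sketch (the table and path names below are made up, and this assumes the default Ctrl-A field delimiter that INSERT OVERWRITE DIRECTORY writes), an external table over the output directory lets the next step read whatever part files exist:

-- hypothetical reader table over the whole output directory
CREATE EXTERNAL TABLE query_output_reader (
  col_a STRING,
  col_b STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
LOCATION '/user/ken/query_output';

-- this reads every part file in the directory,
-- regardless of whether it is named 000000_0 or 000000_1000
SELECT * FROM query_output_reader;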
I am curious whether the job happened to crash exactly when it hit _1000, or whether there is a configuration limit on the number of file parts that can be created. @all Does anyone want to weigh in on that?
Created 12-16-2016 02:02 PM
How are you running Hive? ORC? Tez? What data? What does the table DDL look like?
Created 12-16-2016 02:03 PM
http://stackoverflow.com/questions/8536066/hive-create-table-filename-000000-0
Are there any error messages?
Are you getting the results you want?
Created 12-17-2016 09:21 PM
Hi and thanks for your responses. The action runs smoothly and is NOT killed.
The query is just a simple Hive SELECT, which should not need a huge number of mappers and reducers.
What's your opinion on this? Any guidance on what I should look out for in the log files?
Thanks!
Created 12-18-2016 04:43 PM
Post your log files (YARN, Hive) for the job and some other details (size of data, file format, size of cluster), and we will all figure this out together.
Created 12-19-2016 07:11 AM
Thanks Greg. I will try to reproduce the error and post additional information here!
Created 12-19-2016 01:43 PM
Could it be associated with the parameter hive.exec.max.dynamic.partitions?
Created 12-19-2016 08:18 PM
Could be ... the default value is 1000. Taking a look at the logs would be a step closer to proving this.
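If dynamic partitioning does turn out to be involved, the limits can be checked and raised per session, roughly like this (the numbers below are just examples):

-- show the current limits (defaults are 1000 overall and 100 per node)
SET hive.exec.max.dynamic.partitions;
SET hive.exec.max.dynamic.partitions.pernode;

-- raise them for this session if the job really needs more partitions
SET hive.exec.max.dynamic.partitions=5000;
SET hive.exec.max.dynamic.partitions.pernode=1000;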
Created 12-22-2016 11:38 AM
Hi Greg, I looked into the logs of the three jobs related to this problem, namely the Oozie launcher, the Hive job itself, and the 'insert overwrite directory' job. I searched for errors as well as the value 1000 and came up with the following lines. Does this help narrow down the problem? Or do you have any suggestion of what else I could search for?
Thanks and kind regards,
Ken
Oozie Launcher for Hive Job
Oozie Launcher ends
60715 [main] -5p org.apache.hadoop.mapred.Task - Task:attempt_1481914266056_2012_m_000000_1000 is done. And is in the process of committing
60758 [main] -5p org.apache.hadoop.mapred.Task - Task attempt_1481914266056_2012_m_000000_1000 is allowed to commit now
60764 [main] -5p org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_1481914266056_2012_m_000000_1000' to hdfs://xxxxxx
60809 [main] -5p org.apache.hadoop.mapred.Task - Task 'attempt_1481914266056_2012_m_000000_1000' done.

Hive configuration Properties
-----------------------------
hive.txn.max.open.batch=1000
dfs.namenode.inotify.max.events.per.rpc=1000
mapreduce.reduce.merge.inmem.threshold=1000
mapreduce.client.progressmonitor.pollinterval=1000
mapreduce.jobtracker.retiredjobs.cache.size=1000
hive.compactor.abortedtxn.threshold=1000
Hive Job
- no notable occurrences of 1000 and no errors -
Insert Overwrite Directory Job
Logtype:syslog
2016-12-19 13:18:00,109 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1481914266056_2014_m_000002_1000 is done. And is in the process of committing
2016-12-19 13:18:00,198 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1481914266056_2014_m_000002_1000' done.
2016-12-19 13:17:50,619 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1481914266056_2014_m_000000_1000 TaskAttempt Transitioned from NEW to UNASSIGNED
2016-12-19 13:17:50,620 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1481914266056_2014_m_000001_1000 TaskAttempt Transitioned from NEW to UNASSIGNED
2016-12-19 13:17:50,620 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1481914266056_2014_m_000002_1000 TaskAttempt Transitioned from NEW to UNASSIGNED
2016-12-19 13:17:50,620 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1481914266056_2014_r_000000_1000 TaskAttempt Transitioned from NEW to UNASSIGNED
2016-12-19 13:18:11,184 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://XXX/.hive-staging_hive_2016-12-19_13-17-41_137_2262959442612945763-1/_tmp.-ext-10002/000000_1000
2016-12-19 13:18:11,184 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://XXX/.hive-staging_hive_2016-12-19_13-17-41_137_2262959442612945763-1/_task_tmp.-ext-10002/_tmp.000000_1000
2016-12-19 13:18:11,184 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://XXX/.hive-staging_hive_2016-12-19_13-17-41_137_2262959442612945763-1/_tmp.-ext-10002/000000_1000
2016-12-19 13:18:11,194 WARN [main] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:375)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:595)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:397)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:762)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:758)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:757)
    at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1618)
    at org.apache.hadoop.ipc.Client.call(Client.java:1449)
    at org.apache.hadoop.ipc.Client.call(Client.java:1396)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:816)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
    at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2158)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1423)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1419)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1443)
    at org.apache.hadoop.fs.FileSystem.deleteOnExit(FileSystem.java:1398)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:619)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:570)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:679)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
2016-12-19 13:18:11,296 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 1
2016-12-19 13:18:11,297 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 4
2016-12-19 13:18:11,446 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: RECORDS_OUT_1:4,
2016-12-19 13:18:11,447 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1481914266056_2014_r_000000_1000 is done. And is in the process of committing
2016-12-19 13:18:11,520 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1481914266056_2014_r_000000_1000' done.
End of LogType:syslog

LogType:syslog.shuffle
Log Upload Time: Mon Dec 19 13:18:18 +0100 2016
LogLength: 6454
Log Contents:
2016-12-19 13:18:10,616 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: MergerManager: memoryLimit=12180048896, maxSingleShuffleLimit=3045012224, mergeThreshold=8038832640, ioSortFactor=100, memToMemMergeOutputsThreshold=100
2016-12-19 13:18:10,618 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1481914266056_2014_r_000000_1000 Thread started: EventFetcher for fetching Map Completion Events
2016-12-19 13:18:10,624 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Ignoring obsolete output of FAILED map-task: 'attempt_1481914266056_2014_m_000001_1000'
2016-12-19 13:18:10,624 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Ignoring obsolete output of FAILED map-task: 'attempt_1481914266056_2014_m_000000_1000'
2016-12-19 13:18:10,625 INFO [fetcher#7] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning XXX with 1 to fetcher#7
2016-12-19 13:18:10,626 INFO [fetcher#7] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to CCC to fetcher#7
2016-12-19 13:18:10,626 INFO [fetcher#11] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning XXX with 1 to fetcher#11
2016-12-19 13:18:10,626 INFO [fetcher#11] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to XXX to fetcher#11
2016-12-19 13:18:10,626 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1481914266056_2014_r_000000_1000: Got 3 new map-outputs
2016-12-19 13:18:10,626 INFO [fetcher#13] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning XXX with 1 to fetcher#13
2016-12-19 13:18:10,627 INFO [fetcher#13] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to XXX to fetcher#13
2016-12-19 13:18:10,714 INFO [fetcher#13] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=13562/mapOutput?job=job_1481914266056_2014&reduce=0&map=attempt_1481914266056_2014_m_000001_1001 sent hash and received reply
2016-12-19 13:18:10,714 INFO [fetcher#11] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=13562/mapOutput?job=job_1481914266056_2014&reduce=0&map=attempt_1481914266056_2014_m_000000_1001 sent hash and received reply
2016-12-19 13:18:10,714 INFO [fetcher#7] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=13562/mapOutput?job=job_1481914266056_2014&reduce=0&map=attempt_1481914266056_2014_m_000002_1000 sent hash and received reply
2016-12-19 13:18:10,718 INFO [fetcher#7] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#7 about to shuffle output of map attempt_1481914266056_2014_m_000002_1000 decomp: 150 len: 154 to MEMORY
2016-12-19 13:18:10,718 INFO [fetcher#13] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#13 about to shuffle output of map attempt_1481914266056_2014_m_000001_1001 decomp: 597 len: 601 to MEMORY
2016-12-19 13:18:10,718 INFO [fetcher#11] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#11 about to shuffle output of map attempt_1481914266056_2014_m_000000_1001 decomp: 1185 len: 1189 to MEMORY
2016-12-19 13:18:10,720 INFO [fetcher#7] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 150 bytes from map-output for attempt_1481914266056_2014_m_000002_1000
2016-12-19 13:18:10,720 INFO [fetcher#13] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 597 bytes from map-output for attempt_1481914266056_2014_m_000001_1001
2016-12-19 13:18:10,720 INFO [fetcher#11] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 1185 bytes from map-output for attempt_1481914266056_2014_m_000000_1001
2016-12-19 13:18:10,721 INFO [fetcher#7] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 150, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->1932
2016-12-19 13:18:10,722 INFO [fetcher#11] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 1185, inMemoryMapOutputs.size() -> 2, commitMemory -> 150, usedMemory ->1932
2016-12-19 13:18:10,722 INFO [fetcher#7] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: XXX freed by fetcher#7 in 97ms
2016-12-19 13:18:10,722 INFO [fetcher#13] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 597, inMemoryMapOutputs.size() -> 3, commitMemory -> 1335, usedMemory ->1932
2016-12-19 13:18:10,723 INFO [fetcher#13] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: XXX freed by fetcher#13 in 96ms
2016-12-19 13:18:10,723 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: EventFetcher is interrupted.. Returning
2016-12-19 13:18:10,723 INFO [fetcher#11] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: XXX freed by fetcher#11 in 98ms
2016-12-19 13:18:10,729 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: finalMerge called with 3 in-memory map-outputs and 0 on-disk map-outputs
2016-12-19 13:18:10,734 INFO [main] org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
2016-12-19 13:18:10,734 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 594 bytes
2016-12-19 13:18:10,739 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merged 3 segments, 1932 bytes to disk to satisfy reduce memory limit
2016-12-19 13:18:10,739 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 1 files, 1932 bytes from disk
2016-12-19 13:18:10,740 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2016-12-19 13:18:10,740 INFO [main] org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
2016-12-19 13:18:10,742 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1337 bytes
End of LogType:syslog.shuffle