Member since
05-16-2016
270
Posts
18
Kudos Received
4
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 393 | 07-23-2016 11:36 AM
 | 710 | 07-23-2016 11:35 AM
 | 425 | 06-05-2016 10:41 AM
 | 288 | 06-05-2016 10:37 AM
05-04-2018
04:55 AM
Is anyone here able to help out with this issue?
05-02-2018
05:32 AM
Hello, I have been seeing these errors in the health history for the past 2 days. I restart the cluster and it turns out fine after that, but it's problematic since our jobs get killed until it's restarted. I am using CDH. Can I get some pointers on this, please? Memory and CPU configurations are fine. Here is the health history:
May 1 5:16:29 AM - Agent Status Good
May 1 5:16:04 AM - Frame Errors Good, 1 Still Bad
May 1 5:15:39 AM - 6 Became Good, 1 Became Disabled, 1 Still Bad
May 1 4:51 AM - Frame Errors Unknown, 1 Still Bad
May 1 4:49 AM - Swapping Unknown, 1 Still Bad
May 1 4:37:51 AM - Network Interface Speed Unknown, 1 Still Bad
May 1 4:37:46 AM - 1 Became Bad, 1 Became Unknown
- Tags:
- Ambari
Labels:
04-30-2018
11:51 AM
All Hadoop nodes in my cluster have Java 1.6. Zeppelin seems to require Java 1.7 at minimum. What should be done? Is there a way to get it working with Java 1.6?
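For reference, a minimal sketch of the kind of workaround I have in mind, assuming a newer JDK can be installed alongside the cluster's Java 1.6 and that only Zeppelin is pointed at it; the package name, JDK path, and Zeppelin install path below are assumptions for my setup, not verified:
sudo yum install -y java-1.7.0-openjdk-devel                                                    # install a JDK 7 alongside the existing Java 1.6
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk' >> /opt/zeppelin/conf/zeppelin-env.sh   # point only Zeppelin at the newer JDK
/opt/zeppelin/bin/zeppelin-daemon.sh restart                                                    # restart Zeppelin so it picks up the new JAVA_HOME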
Labels:
04-30-2018
08:59 AM
That's right, and it makes sense to do that, but your answer does not address the issue I have. I would like to move the SNN to a different node.
04-30-2018
07:29 AM
Currently, in our production cluster, the NameNode and Secondary NameNode are on the same host (the master node). I believe it is advisable to have the SNN on a different node. What steps can I follow to safely move the SNN from one host to another? There is documentation online for moving the NameNode, but I could not find anything that explains a way to reliably move the SNN to a new node.
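For anyone reading, here is a rough sketch of the manual (non-Cloudera-Manager) procedure I have pieced together so far; I have not run it yet, and newnode.example.com plus the /data/dfs/snn checkpoint path are placeholders, not my real values:
hadoop-daemon.sh stop secondarynamenode                                  # on the current master
# on all nodes, edit hdfs-site.xml so dfs.namenode.secondary.http-address points at newnode.example.com:50090
scp -r /data/dfs/snn newnode.example.com:/data/dfs/snn                   # optionally carry over the dfs.namenode.checkpoint.dir contents
hadoop-daemon.sh start secondarynamenode                                 # on the new node
ls -lt /data/dfs/snn/current | head                                      # after the next checkpoint interval, a fresh fsimage should appear here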
- Tags:
- Hadoop Core
- namenode
Labels:
04-27-2018
12:23 PM
How do I fix this error? Exception in doCheckpoint: java.io.IOException: java.lang.IllegalStateException: Cannot skip to less than the current value (=67617893), where newValue=67617892
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.resetLastInodeId(FSNamesystem.java:657)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:280)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:140)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:848)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
at org.apache.hadoop.hdfs.server.namenode.Checkpointer.rollForwardByApplyingLogs(Checkpointer.java:311)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:1093)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:553)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:360)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:325)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:444)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:321)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Cannot skip to less than the current value (=67617893), where newValue=67617892
at org.apache.hadoop.util.SequentialNumber.skipTo(SequentialNumber.java:58)
at
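In case it helps, the workaround I am experimenting with (not confirmed as the proper fix) is to clear the Secondary NameNode's local checkpoint directory so it pulls a fresh fsimage from the NameNode on its next cycle. This sketch assumes a plain Apache-style install; on CDH the role would be stopped and started from Cloudera Manager instead, and /data/dfs/snn is a placeholder for whatever dfs.namenode.checkpoint.dir points to:
hadoop-daemon.sh stop secondarynamenode
mv /data/dfs/snn /data/dfs/snn.bak.$(date +%F)     # keep a backup rather than deleting outright
hadoop-daemon.sh start secondarynamenode
ls -lt /data/dfs/snn/current | head                # after the next checkpoint, a fresh fsimage should appear here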
04-25-2018
09:11 PM
This has happened for the second time now, today after yesterday. I restarted the Hive service and it started working, but what's the issue here? My Hive query fails with this error: java.io.InterruptedIOException: Interrupted while waiting for data to be acknowledged by pipeline
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
ERROR : Execution failed with exit status: 1
ERROR : Obtaining error information
ERROR :
Task failed!
Task ID:
Stage-20
Logs:
ERROR : /var/log/hive/hadoop-cmf-CD-HIVE-XCVXskZf-HIVESERVER2-ip-172-31-4-192.ap-south-1.compute.internal.log.out
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
WARN : Shutting down task : Stage-1:MAPRED
WARN : Shutting down task : Stage-7:MAPRED
WARN : Shutting down task : Stage-11:MAPRED
INFO : Completed executing command(queryId=hive_20180425102525_88c62c1c-a506-4756-9ee4-87f218852e45); Time taken: 0.188 seconds
INFO : Cleaning up the staging area /user/hue/.staging/job_1524082403477_35313
INFO : Cleaning up the staging area /user/hue/.staging/job_1524082403477_35312
INFO : Cleaning up the staging area /user/hue/.staging/job_1524082403477_35314
ERROR : Job Submission failed with exception 'java.io.InterruptedIOException(Interrupted while waiting for data to be acknowledged by pipeline)'
java.io.InterruptedIOException: Interrupted while waiting for data to be acknowledged by pipeline
at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2520)
at org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:2498)
at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2662)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2621)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:369)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:203)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:128)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:578)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:573)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:564)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:418)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:142)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:80)
ERROR : Job Submission failed with exception 'java.io.InterruptedIOException(Interrupted while waiting for data to be acknowledged by pipeline)'
java.io.InterruptedIOException: Interrupted while waiting for data to be acknowledged by pipeline
at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2520)
at org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:2498)
at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2662)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2621)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:369)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:203)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:128)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:578)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:573)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:564)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:418)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:142)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:80)
ERROR : Job Submission failed with exception 'java.io.InterruptedIOException(Interrupted while waiting for data to be acknowledged by pipeline)'
java.io.InterruptedIOException: Interrupted while waiting for data to be acknowledged by pipeline
at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2520)
at org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:2498)
at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2662)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2621)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:369)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:203)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:128)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:578)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:573)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:564)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:418)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:142)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:80)
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask (state=08S01,code=1)
Closing: 0: jdbc:hive2://ip-172-31-4-192.ap-south-1.compute.internal:10000/default
Intercepting System.exit(2)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.Hive2Main], exit code [2]
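Since the exception comes from the HDFS write pipeline rather than Hive itself, the only thing I know to try next is basic HDFS-side diagnostics while this is happening; a sketch of what I plan to run (nothing here is a fix):
hdfs dfsadmin -report | head -40                           # confirm all DataNodes are live and not short on space
hdfs fsck /user/hue/.staging -files -blocks | tail -20     # look for under-replicated or corrupt blocks under the staging dir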
- Tags:
- Data Processing
- Hive
Labels:
04-25-2018
05:16 AM
This has happened for the second time now, today after yesterday. I restarted the Hive service and it started working, but what's the issue here? My Hive query fails with this error: java.io.InterruptedIOException: Interrupted while waiting for data to be acknowledged by pipeline
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
ERROR : Execution failed with exit status: 1
ERROR : Obtaining error information
ERROR :
Task failed!
Task ID:
Stage-20
Logs:
ERROR : /var/log/hive/hadoop-cmf-CD-HIVE-XCVXskZf-HIVESERVER2-ip-172-31-4-192.ap-south-1.compute.internal.log.out
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
WARN : Shutting down task : Stage-1:MAPRED
WARN : Shutting down task : Stage-7:MAPRED
WARN : Shutting down task : Stage-11:MAPRED
INFO : Completed executing command(queryId=hive_20180425102525_88c62c1c-a506-4756-9ee4-87f218852e45); Time taken: 0.188 seconds
INFO : Cleaning up the staging area /user/hue/.staging/job_1524082403477_35313
INFO : Cleaning up the staging area /user/hue/.staging/job_1524082403477_35312
INFO : Cleaning up the staging area /user/hue/.staging/job_1524082403477_35314
ERROR : Job Submission failed with exception 'java.io.InterruptedIOException(Interrupted while waiting for data to be acknowledged by pipeline)'
java.io.InterruptedIOException: Interrupted while waiting for data to be acknowledged by pipeline
at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2520)
at org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:2498)
at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2662)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2621)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:369)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:203)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:128)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:578)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:573)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:564)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:418)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:142)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:80)
ERROR : Job Submission failed with exception 'java.io.InterruptedIOException(Interrupted while waiting for data to be acknowledged by pipeline)'
java.io.InterruptedIOException: Interrupted while waiting for data to be acknowledged by pipeline
at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2520)
at org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:2498)
at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2662)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2621)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:369)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:203)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:128)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:578)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:573)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:564)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:418)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:142)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:80)
ERROR : Job Submission failed with exception 'java.io.InterruptedIOException(Interrupted while waiting for data to be acknowledged by pipeline)'
java.io.InterruptedIOException: Interrupted while waiting for data to be acknowledged by pipeline
at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2520)
at org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:2498)
at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2662)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2621)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:369)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:203)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:128)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:578)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:573)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:564)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:418)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:142)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:80)
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask (state=08S01,code=1)
Closing: 0: jdbc:hive2://ip-172-31-4-192.ap-south-1.compute.internal:10000/default
Intercepting System.exit(2)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.Hive2Main], exit code [2]
- Tags:
- Data Processing
- Hive
Labels:
04-21-2018
12:34 PM
The table, however, still renders like this:
"SW-1951","21043","nikhil","Medium","Ready For QA","Feed - Update promotions for Google Merchant Center",3600,NA,"2018-04-21T15:34:12.038+0530"
All of this comes in as a single column.
If I set field.delim to ',', the values spread across columns, but then they come wrapped in " (quotes) and some of the integer values are missing and appear as NULL.
What's the right way to do this?
Storage information from DESC formatted for my table:
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.OpenCSVSerde
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
escapeChar "\""
separatorChar ,
serialization.format 1
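For context, this is roughly how I expect the table would need to be declared for quoted CSV to parse cleanly; as far as I understand, OpenCSVSerde exposes every column as STRING regardless of the declared type, which would also explain the odd integer values. The table and column names below are placeholders guessed from the sample row, not my real schema:
beeline -u jdbc:hive2://ip-172-31-4-192.ap-south-1.compute.internal:10000/default <<'SQL'
CREATE EXTERNAL TABLE issues_csv (
  issue_key STRING, issue_id STRING, assignee STRING, priority STRING, status STRING,
  summary STRING, time_spent STRING, sprint STRING, updated_at STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = ',', 'quoteChar' = '"')
STORED AS TEXTFILE
LOCATION '/user/hue/issues_csv/';   -- placeholder location
SQL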
Labels:
04-19-2018
12:50 PM
hadoop distcp -Dfs.s3a.access.key=ACCESSKEY -Dfs.s3a.secret.key=secretKEY -update /user/username/* s3a://bucketname/username
I am not able to use the -update option while passing the access key and secret key as arguments on the command line. Please suggest. I am getting "Duplicate files in input path":
org.apache.hadoop.tools.CopyListing$DuplicateFileException: File hdfs://IP.ap-south-1.compute.internal:port/PATH/000000_0 and hdfs://IP.ap-south-1.compute.internal:8020/path/000000_0 would cause duplicates. Aborting
at org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:164)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:93)
at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:377)
at org.apache.hadoop.tools.DistCp.prepareFileListing(DistCp.java:90)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:179)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:141)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:441)
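What I am planning to try instead (untested): drop the /* glob so that each source directory keeps its own name under the target, since my understanding is that globbing several directories whose files share names like 000000_0 is what makes the copy listing see duplicates:
hadoop distcp -Dfs.s3a.access.key=ACCESSKEY -Dfs.s3a.secret.key=secretKEY \
  -update /user/username s3a://bucketname/username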
Labels:
04-19-2018
09:39 AM
I am trying to distcp HDFS data to S3 and get this error. How do I fix it, and why does it say "Failed to close"?
Exception in thread "pool-9-thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
at java.lang.StringBuffer.append(StringBuffer.java:237)
at java.net.URI.appendSchemeSpecificPart(URI.java:1892)
at java.net.URI.toString(URI.java:1922)
at java.net.URI.<init>(URI.java:749)
at org.apache.hadoop.fs.Path.<init>(Path.java:109)
at org.apache.hadoop.fs.Path.<init>(Path.java:94)
at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:230)
at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:263)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:772)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:110)
at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:796)
at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:792)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:792)
at org.apache.hadoop.tools.SimpleCopyListing$FileStatusProcessor.getFileStatus(SimpleCopyListing.java:444)
at org.apache.hadoop.tools.SimpleCopyListing$FileStatusProcessor.processItem(SimpleCopyListing.java:485)
at org.apache.hadoop.tools.util.ProducerConsumer$Worker.run(ProducerConsumer.java:189)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
^C18/04/19 14:57:27 ERROR hdfs.DFSClient: Failed to close inode 66296599
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/centos/.staging/_distcp-940500037/fileList.seq (inode 66296599): File does not exist. Holder DFSClient_NONMAPREDUCE_186544109_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3663)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3750)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3720)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:745)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.complete(AuthorizationProviderProxyClientProtocol.java:245)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:540)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1409)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy23.complete(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:457)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy24.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2690)
at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2667)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2621)
at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:987)
at org.apache.hadoop.hdfs.DFSClient.closeOutputStreams(DFSClient.java:1019)
at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1022)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2897)
at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2914)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
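Since the OutOfMemoryError is thrown in the distcp client while it builds the copy listing (SimpleCopyListing), my plan is to retry with a larger client-side heap; if I read the hadoop launcher script correctly, HADOOP_CLIENT_OPTS is picked up for distcp. The 4g figure is just a number I picked, not a recommendation:
export HADOOP_CLIENT_OPTS="-Xmx4g"
hadoop distcp -Dfs.s3a.access.key=ACCESSKEY -Dfs.s3a.secret.key=secretKEY \
  -update /user/username s3a://bucketname/username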
04-18-2018
10:45 PM
For some reason, either the sqoop job container gets killed with error 137, or at times it attempts to execute the sqoop job again. It does not seem to happen if I stop the Node Manager role on the master server. Can someone please explain what happens if I keep this role stopped? My master server is AWS m4*4, so I believe my cluster will have 16 fewer cores available to it now. I have tried looking through the Node Manager logs and did not find any errors or problems. My sqoop job usually restarted with the message below. This was a problem, since the warehouse dir used in the command was not deleted before re-attempting, and it stopped with the message "output dir already exists":
Found [1] Map-Reduce jobs from this launcher
Killing existing jobs and starting over:
Labels:
04-18-2018
11:27 AM
The NameNode logs follow. I don't see any alerts or configuration warnings anywhere else, apart from the NameNode pause duration frequently. Also, could it be causing my sqoop jobs in Oozie to restart? (That's one of the issues I am facing right now, since my sqoop job ends up failing because the warehouse dir exists if it restarts.)
Apr 18, 4:38:37.635 PM INFO BlockStateChange BLOCK* addStoredBlock: blockMap updated: 172.31.4.192:50010 is added to blk_1115277458_41539479{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-25895410-dd27-4d55-89df-4f58d7ade412:NORMAL:172.31.12.179:50010|RBW], ReplicaUnderConstruction[[DISK]DS-78eb8307-6579-4bfa-ae71-7e307e9fbd3a:NORMAL:172.31.4.192:50010|RBW], ReplicaUnderConstruction[[DISK]DS-f6f666b1-581a-47ce-a8a8-83084fd8f6b4:NORMAL:172.31.13.118:50010|RBW]]} size 0
Apr 18, 4:38:37.638 PM INFO org.apache.hadoop.hdfs.StateChange DIR* completeFile: /user/hue/.staging/job_1523964434795_4828/job_1523964434795_4828.summary is closed by DFSClient_NONMAPREDUCE_-1275964131_1
Apr 18, 4:38:40.557 PM INFO org.apache.hadoop.util.JvmPauseMonitor Detected pause in JVM or host machine (eg GC): pause of approximately 2356ms
GC pool 'ParNew' had collection(s): count=1 time=0ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=2645ms
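To confirm whether these pauses line up with CMS full collections, I am thinking of turning on GC logging for the NameNode JVM. On a plain install that would mean something like the export below in hadoop-env.sh; on CDH I believe the equivalent goes into the NameNode's Java options in Cloudera Manager instead. The heap size and log path are example values, not tuned numbers:
export HADOOP_NAMENODE_OPTS="-Xmx4g -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hadoop-hdfs/nn-gc.log $HADOOP_NAMENODE_OPTS"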
- Tags:
- Hadoop Core
- namenode
Labels:
04-18-2018
10:13 AM
Sqoop command arguments :
job
--meta-connect
jdbc:hsqldb:hsql://IP:16000/sqoop
--exec
price_range
--
--warehouse-dir
folder/transit/2018-04-16--11-48
Fetching child yarn jobs
tag id : oozie-e678030f4db3e129377fc1efdcc34e9a
2018-04-16 11:49:36,693 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip-172-31-4-192.ap-south-1.compute.internal/172.31.4.192:8032
Child yarn jobs are found - application_1519975798846_265571
Found [1] Map-Reduce jobs from this launcher
Killing existing jobs and starting over:
2018-04-16 11:49:37,314 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip-172-31-4-192.ap-south-1.compute.internal/172.31.4.192:8032
Killing job [application_1519975798846_265571] ...
2018-04-16 11:49:37,334 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1519975798846_265571
Done
This is what my typical sqoop job looks like:
sqoop job -Dmapred.reduce.tasks=3 --meta-connect jdbc:hsqldb:hsql://IP:16000/sqoop --create job_name -- import --driver com.mysql.jdbc.Driver --connect 'jdbc:mysql://ip2/erp?zeroDateTimeBehavior=convertToNull&serverTimezone=IST' --username username --password 'PASS' --table orders --merge-key order_num --split-by order_num --hive-import --hive-overwrite --hive-database Erp --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N' --fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' --input-null-non-string '\\N' --input-fields-terminated-by '\001' --m 12
This is how I execute my jobs in Oozie:
job --meta-connect jdbc:hsqldb:hsql://ip:16000/sqoop --exec JOB_NAME -- --warehouse-dir folder/transit/${DATE}
Now, I recently started getting the error "output directory already exists" no matter what timestamp I pass in the $DATE variable. It randomly shows up on any sqoop job in Oozie. I add --warehouse-dir folder/Snapshots/${DATE} while executing the job precisely so that I never get "output directory already exists", but I started getting it yesterday out of nowhere. Currently, I do not see any flags about services acting up; however, the NameNode pause duration is concerning at regular intervals. The error message makes it fairly intuitive that this happens because the warehouse dir gets created before the job is re-attempted, yet the whole purpose of using --warehouse-dir was to create a transitional directory so that I would not get this error. How do I fix this?
Found [1] Map-Reduce jobs from this launcher
Killing existing jobs and starting over:
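As a stopgap, so that a relaunched attempt does not trip over the directory created by the first attempt, I am considering deleting the transit dir for the run right before executing the job (untested; the proper fix is presumably stopping the relaunches themselves):
hdfs dfs -rm -r -f folder/transit/${DATE}
sqoop job --meta-connect jdbc:hsqldb:hsql://ip:16000/sqoop --exec JOB_NAME -- --warehouse-dir folder/transit/${DATE}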
- Tags:
- Sqoop
Labels:
04-18-2018
09:33 AM
I see this message in the stdout tab in Oozie:
Found [1] Map-Reduce jobs from this launcher
Killing existing jobs and starting over:
2018-04-18 14:45:13,569 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip-172-31-4-192.ap-south-1.compute.internal/172.31.4.192:8032
Killing job [application_1523964434795_4483] ...
2018-04-18 14:45:13,776 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1523964434795_4483
In stderr:
Log Upload Time: Wed Apr 18 14:45:23 +0530 2018
Log Length: 288
Note: /tmp/sqoop-yarn/compile/d443da40c930f2217e70e38a82730dc0/fabric_po.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
04-18-2018
12:32 AM
In the cluster metrics, memory used is always equal to the number of vCores in use. For example: if 18 vCores are in use, memory used is 18 GB. This looks weird to me. Could someone please explain what's happening here, whether I should allow more memory to be used, and how to do that? I understand that by default 1 GB is allocated to each container and the number of containers running at a time is equal to the vCores used. However, I see the memory available in my cluster is 150 GB and it is only being used up to about 50 GB, since that is the number of vCores. I am getting these numbers from the cluster metrics. Metrics for reference:
Cluster Metrics:
Apps Submitted: 2140 | Apps Pending: 0 | Apps Running: 9 | Apps Completed: 2131
Containers Running: 24 | Memory Used: 24 GB | Memory Total: 150 GB | Memory Reserved: 0 B
VCores Used: 24 | VCores Total: 56 | VCores Reserved: 0
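If the 1 GB-per-container default is what keeps memory used locked to the vCore count, my understanding is that individual jobs can simply ask for bigger containers; a sketch with example values (my-job.jar, MyDriver and the input/output paths are placeholders, and passing -D this way assumes the driver uses ToolRunner):
hadoop jar my-job.jar MyDriver \
  -Dmapreduce.map.memory.mb=2048 -Dmapreduce.map.java.opts=-Xmx1638m \
  -Dmapreduce.reduce.memory.mb=4096 -Dmapreduce.reduce.java.opts=-Xmx3276m \
  input/ output/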
04-17-2018
10:36 PM
This is what my typical sqoop job looks like: sqoop job -Dmapred.reduce.tasks=3 --meta-connect jdbc:hsqldb:hsql://IP:16000/sqoop --create job_name -- import --driver com.mysql.jdbc.Driver --connect 'jdbc:mysql://ip2/erp?zeroDateTimeBehavior=convertToNull&serverTimezone=IST' --username username --password 'PASS' --table orders --merge-key order_num --split-by order_num --hive-import --hive-overwrite --hive-database Erp --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N' --fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' --input-null-non-string '\\N' --input-fields-terminated-by '\001' --m 12
This is how I execute my jobs in Oozie:
job --meta-connect jdbc:hsqldb:hsql://ip:16000/sqoop --exec JOB_NAME -- --warehouse-dir folder/Snapshots/${DATE}
Now, I recently started getting the error "output directory already exists" no matter what timestamp I pass in the $DATE variable. This is probably because of a server process restarting. Yesterday I could see the Node Manager restart over and over, but that's not the case today either. It randomly shows up on any sqoop job in Oozie. I add --warehouse-dir folder/Snapshots/${DATE} while executing the job precisely so that I never get "output directory already exists", but I started getting it yesterday out of nowhere. Currently, I do not see any flags about services acting up; however, the NameNode pause duration is concerning at regular intervals. How do I fix this?
Labels:
04-17-2018
09:40 AM
@Tarun Parimi Here's the warning I get:
Apr 17, 3:03:05.946 PM INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor Deleting path : /yarn/container-logs/application_1523905807460_4113/container_1523905807460_4113_01_000001/stderr
Apr 17, 3:03:05.963 PM INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor Deleting path : /yarn/container-logs/application_1523905807460_4113
Apr 17, 3:03:07.091 PM WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor Exit code from container container_1523905807460_4129_01_000001 is : 137
Apr 17, 3:03:07.091 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container Container container_1523905807460_4129_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
Apr 17, 3:03:07.091 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch Cleaning up container container_1523905807460_4129_01_000001
Apr 17, 3:03:07.111 PM INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor Deleting absolute path : /yarn/nm/usercache/hue/appcache/application_1523905807460_4129/container_1523905807460_4129_01_000001
Apr 17, 3:03:07.112 PM WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger USER=hue OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1523905807460_4129 CONTAINERID=container_1523905807460_4129_01_000001
Apr 17, 3:03:07.112 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container Container container_1523905807460_4129_01_000001 transitioned from EXITED_WITH_FAILURE to DONE
Apr 17, 3:03:07.112 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application Removing container_1523905807460_4129_01_000001 from application application_1523905807460_4129
Apr 17, 3:03:07.112 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl Considering container container_1523905807460_4129_01_000001 for log-aggregation
Apr 17, 3:03:07.112 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices Got event CONTAINER_STOP for appId application_1523905807460_4129
04-17-2018
02:40 AM
I have a 4-node cluster. On my master node, the Node Manager is continuously exiting. It gets restarted on its own, but it's disturbing some of my processes. For this reason, I have stopped the Node Manager on the master node. Of course that solves my problem with processes getting interrupted by Node Manager restarts, but under the cluster node metrics it shows me 1 lost node (which makes sense, as there is no Node Manager running on this node). I have also tried increasing the heap memory for the NameNode and Secondary NameNode, but that did not help. Please suggest what can be done to fix this. From the Hadoop YARN logs folder:
Apr 17, 3:03:05.946 PM INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor Deleting path : /yarn/container-logs/application_1523905807460_4113/container_1523905807460_4113_01_000001/stderr
Apr 17, 3:03:05.963 PM INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor Deleting path : /yarn/container-logs/application_1523905807460_4113
Apr 17, 3:03:07.091 PM WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor Exit code from container container_1523905807460_4129_01_000001 is : 137
Apr 17, 3:03:07.091 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container Container container_1523905807460_4129_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
Apr 17, 3:03:07.091 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch Cleaning up container container_1523905807460_4129_01_000001
Apr 17, 3:03:07.111 PM INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor Deleting absolute path : /yarn/nm/usercache/hue/appcache/application_1523905807460_4129/container_1523905807460_4129_01_000001
Apr 17, 3:03:07.112 PM WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger USER=hue OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1523905807460_4129 CONTAINERID=container_1523905807460_4129_01_000001
Apr 17, 3:03:07.112 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container Container container_1523905807460_4129_01_000001 transitioned from EXITED_WITH_FAILURE to DONE
Apr 17, 3:03:07.112 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application Removing container_1523905807460_4129_01_000001 from application application_1523905807460_4129
Apr 17, 3:03:07.112 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl Considering container container_1523905807460_4129_01_000001 for log-aggregation
Apr 17, 3:03:07.112 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices Got event CONTAINER_STOP for appId application_1523905807460_4129
04-16-2018
07:44 PM
Apr 17 01:05:05 ip-172-31-4-192 systemd: Stopping user-986.slice.
Apr 17 01:06:01 ip-172-31-4-192 systemd: Created slice user-986.slice.
Apr 17 01:06:01 ip-172-31-4-192 systemd: Starting user-986.slice.
Apr 17 01:06:01 ip-172-31-4-192 systemd: Started Session 133520 of user yarn.
Apr 17 01:06:01 ip-172-31-4-192 systemd: Starting Session 133520 of user yarn.
Apr 17 01:06:04 ip-172-31-4-192 systemd: Removed slice user-986.slice.
Apr 17 01:06:04 ip-172-31-4-192 systemd: Stopping user-986.slice.
Apr 17 01:06:18 ip-172-31-4-192 kernel: net_ratelimit: 14 callbacks suppressed
Apr 17 01:07:01 ip-172-31-4-192 systemd: Created slice user-986.slice.
Apr 17 01:07:01 ip-172-31-4-192 systemd: Starting user-986.slice.
Apr 17 01:07:01 ip-172-31-4-192 systemd: Started Session 133521 of user yarn.
Apr 17 01:07:01 ip-172-31-4-192 systemd: Starting Session 133521 of user yarn.
Apr 17 01:07:05 ip-172-31-4-192 systemd: Removed slice user-986.slice.
Apr 17 01:07:05 ip-172-31-4-192 systemd: Stopping user-986.slice.
Apr 17 01:08:01 ip-172-31-4-192 systemd: Created slice user-986.slice.
Apr 17 01:08:01 ip-172-31-4-192 systemd: Starting user-986.slice.
Apr 17 01:08:01 ip-172-31-4-192 systemd: Started Session 133522 of user yarn.
Apr 17 01:08:01 ip-172-31-4-192 systemd: Starting Session 133522 of user yarn.
Apr 17 01:08:04 ip-172-31-4-192 systemd: Removed slice user-986.slice.
Apr 17 01:08:04 ip-172-31-4-192 systemd: Stopping user-986.slice.
Apr 17 01:09:01 ip-172-31-4-192 systemd: Created slice user-986.slice.
Apr 17 01:09:01 ip-172-31-4-192 systemd: Starting user-986.slice.
Apr 17 01:09:01 ip-172-31-4-192 systemd: Started Session 133523 of user yarn.
Apr 17 01:09:01 ip-172-31-4-192 systemd: Starting Session 133523 of user yarn.
Apr 17 01:09:05 ip-172-31-4-192 systemd: Removed slice user-986.slice.
Apr 17 01:09:05 ip-172-31-4-192 systemd: Stopping user-986.slice.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Started Session 133524 of user rstudio.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Starting Session 133524 of user rstudio.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Started Session 133525 of user root.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Starting Session 133525 of user root.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Created slice user-986.slice.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Starting user-986.slice.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Started Session 133526 of user yarn.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Starting Session 133526 of user yarn.
Apr 17 01:10:05 ip-172-31-4-192 systemd: Removed slice user-986.slice.
Apr 17 01:10:05 ip-172-31-4-192 systemd: Stopping user-986.slice.
Apr 17 01:11:01 ip-172-31-4-192 systemd: Created slice user-986.slice.
Apr 17 01:11:01 ip-172-31-4-192 systemd: Starting user-986.slice.
Apr 17 01:11:01 ip-172-31-4-192 systemd: Started Session 133527 of user yarn.
Apr 17 01:11:01 ip-172-31-4-192 systemd: Starting Session 133527 of user yarn.
Apr 17 01:11:04 ip-172-31-4-192 systemd: Removed slice user-986.slice.
Apr 17 01:11:04 ip-172-31-4-192 systemd: Stopping user-986.slice.
[centos@ip-172-31-4-192 ~]$
From the hadoop-yarn logs folder:
2018-04-17 01:02:23,162 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: FINISH_APPLICATION sent to absent application application_1519975798846_110997
2018-04-17 01:02:23,162 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: FINISH_APPLICATION sent to absent application application_1519975798846_111000
2018-04-17 01:02:23,162 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: FINISH_APPLICATION sent to absent application application_1519975798846_111010
2018-04-17 01:02:23,162 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: FINISH_APPLICATION sent to absent application application_1519975798846_111011
2018-04-17 01:02:23,162 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: FINISH_APPLICATION sent to absent application application_1519975798846_111013
2018-04-17 01:02:23,162 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: FINISH_APPLICATION sent to absent application application_1519975798846_111025
04-16-2018
07:18 PM
Edit: I reverted the heap memory for the NameNode and Secondary NameNode to 784 MiB, and yet it is still exiting unexpectedly. Please suggest.
04-16-2018
06:41 PM
@Tarun Parimi Thanks. I get: 37706 yarn 20 0 323600 17248 32 S 802.7 0.0 37577:11 suppoie for the yarn user; 17248 is the value in the RES column. I actually wanted to check what value I can assign to the Java heap memory for the NameNode and Secondary NameNode so that it won't exit. The server has 64 GB memory (AWS m4*4) and is a CentOS 7 machine. On it I have HDFS, Hive, Hue, Oozie, Sentry, Sqoop and YARN.
04-16-2018
06:18 PM
How do I check how much memory is available for namenode and secondary namenode?
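A sketch of how I am checking this from the shell (it assumes the process command line contains the NameNode class name; the Secondary NameNode works the same way with its own class name):
NN_PID=$(pgrep -f 'org.apache.hadoop.hdfs.server.namenode.NameNode' | head -1)
ps -o rss= -p "$NN_PID"                              # resident memory of the NameNode process, in KB
ps -o args= -p "$NN_PID" | grep -o -- '-Xmx[^ ]*'    # the maximum heap the process was started with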
04-16-2018
06:03 PM
I just increased the Java heap memory for the NameNode and Secondary NameNode to 5 GiB. It was set to something like 718 MB before. I started getting containers killed with error code 137, so I increased the heap memory. After increasing the heap memory and restarting the services, I am getting unexpected exits for the Node Manager. How do I debug and fix this? How do I check how much memory is available for the NameNode and Secondary NameNode? Edit: I reverted the heap memory for the NameNode and Secondary NameNode to 784 MiB, and yet it is still exiting unexpectedly. Please suggest. Because of the Node Manager exiting, my sqoop job starts reporting the error "output-dir already exists"; I believe it is because it tries to create it again when the Node Manager comes back.
Labels:
04-16-2018
04:54 PM
Sqoop jobs running in Oozie get killed randomly. No one is actually killing the job, so why do I see "Killed by external signal"? I get this in the task diagnostics log:
Container exited with a non-zero exit code 137
Killed by external signal
Under diagnostics, I keep getting: "Application killed by user." The heap memory allocated to Sqoop is 2 GB. We started getting this error out of nowhere, and now my Oozie workflow with all the sqoop jobs in it seldom completes without it; on every 2nd or 4th job, I get this error. I don't see any other errors anywhere. Can I please get some pointers on this? How do I fix it? I also see some of them failing with:
2018-04-16 22:25:21,494 [main] ERROR org.apache.sqoop.tool.ImportTool - Import failed: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://ip:8020/user/hue/folder/hots/2017-04-16--08--30--00/table_hsn_mapping already exists
All my jobs in Oozie are executed with:
job --meta-connect jdbc:hsqldb:hsql://ip:16000/sqoop --exec table_hsn_mapping -- --warehouse-dir folder/Snapshots/${DATE}
So I never used to get the FileAlreadyExists error, because of using the warehouse-dir folder. Somehow I started getting it 2 days back and none of it makes sense to me right now. Any help would be appreciated.
Labels:
04-13-2018
11:59 AM
1 Kudo
I have a file of size 70 GB that I copy from the local FS to HDFS. Once copied, I created an external table over it. This size will continue to grow, and I feel querying this table is exhausting cluster resources completely. A SELECT statement for one day of data creates 273 mappers and takes forever to run. What's the best way to handle this? I have this data in CSV format. Am I doing something wrong here? I have tried using both copyFromLocal and put. The table is not partitioned because I have a datetime column and no date column. Please suggest.
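The direction I am leaning toward (only a sketch, not something I have run): derive a date partition from the existing datetime column using dynamic partitioning, so one-day queries stop scanning the whole file. events_raw, events_part, event_ts and payload are placeholder names, not my real table or columns, and ORC is just one storage choice:
beeline -u jdbc:hive2://ip-172-31-4-192.ap-south-1.compute.internal:10000/default <<'SQL'
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
CREATE TABLE events_part (event_ts STRING, payload STRING)
PARTITIONED BY (event_date STRING)
STORED AS ORC;
INSERT OVERWRITE TABLE events_part PARTITION (event_date)
SELECT event_ts, payload, to_date(event_ts) AS event_date
FROM events_raw;
SQL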
- Tags:
- Data Processing
- Hive
Labels: