Created on 10-29-2014 08:50 AM - edited 09-16-2022 02:11 AM
Hi everybody, and Cloudera support:
I have run into an issue that is very strange to me, and I am not sure whether there is a parameter that can resolve it. Please be patient, since this will be a long story.
System users such as hdfs, yarn, mapred, and hue cannot launch containers, but other users that I created manually can.
My environment: CDH 5.2 (the latest version) + Kerberos + Sentry + OpenLDAP.
Yesterday I was creating an Oozie workflow to import data from MySQL into Hive. The Sqoop job is:
sqoop import --connect jdbc:mysql://10.32.87.4:3306/xxxx --username admin --password xxxxxxxx --table t_phone --hive-table t_phone --hive-database xxxx --hive-import --hive-overwrite --hive-drop-import-delims -m 1
But this job failed. The errors are below:
INFO mapreduce.Job: Job job_1414579088733_0016 failed with state FAILED due to: Application application_1414579088733_0016 failed 2 times due to AM Container for appattempt_1414579088733_0016_000002 exited with exitCode: -1000 due to: Application application_1414579088733_0016 initialization failed (exitCode=139) with output:
.Failing this attempt.. Failing the application.
I have no doubt about the Sqoop job itself, since it runs fine in our PRD environment (which is on CDH 5.1). I then went to the OS level and ran the same Sqoop script as the hdfs user, and it failed with the same error as above. Searching Google turned up a few issues similar to mine, but their exit codes were 1 or something else, and the suggested fix was to set HADOOP_YARN_HOME or HADOOP_MAPRED_HOME. So I set HADOOP_YARN_HOME and HADOOP_MAPRED_HOME and tried again; that failed too.
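For reference, this is roughly what that attempted fix looks like as shell exports; the parcel paths below are the usual CDH locations and are an assumption about this cluster:

# assumed CDH parcel layout; adjust if your installation uses packages instead of parcels
export HADOOP_YARN_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-yarn
export HADOOP_MAPRED_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce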
At that moment I assumed it might be a file or directory permission issue (I have run into that kind of problem before).
So I deleted /tmp, /user/history, /var/log, etc., restarted the whole cluster, and tried again. It failed yet again.
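For reference, a rough sketch of that cleanup, assuming /tmp and /user/history refer to the HDFS staging and JobHistory locations used by CDH; these paths are assumptions and the commands are destructive, so verify before copying:

# run as the HDFS superuser; deletes job staging and history data
sudo -u hdfs hdfs dfs -rm -r -skipTrash /tmp/hadoop-yarn /user/history/done /user/history/done_intermediate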
OK, I had no more ideas, so I went home to cook dinner, watch a movie, enjoy some music, and get a good night's sleep.
This morning I did not try Sqoop anymore, since I had lost confidence in it. Instead I tested the example MapReduce job.
The command is: hadoop jar hadoop-examples.jar pi 10 10
It failed with the same errors. As you can see, the error includes the message: exited with exitCode: -1000.
When I saw 1000, I remembered there is a YARN setting that by default prevents users with a UID below 1000 from launching containers; you are supposed to either lower that threshold from 1000 to 0 or add the users with UIDs below 1000 to the allowed user list. So I checked those settings, and everything was fine.
Why? Why? Why? I asked myself many times, but had no answer. Still, I believed this 1000 had some connection to that 1000.
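For context, these knobs live in the LinuxContainerExecutor's container-executor.cfg (exposed through the YARN service configuration when the cluster is managed by Cloudera Manager). A sketch of the relevant entries on a CDH 5 NodeManager; the values shown are the stock defaults, not necessarily what this cluster has:

yarn.nodemanager.linux-container-executor.group=yarn
banned.users=hdfs,yarn,mapred,bin      # accounts blocked from running containers regardless of UID
min.user.id=1000                       # UIDs below this are rejected unless whitelisted
allowed.system.users=nobody,impala,hive,llama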
Test begins:
I created a user with my own name, UID 1500, and executed the Sqoop script: it succeeded. That made my assumption stronger, since my own user could import data through Sqoop successfully.
Then I created another user, test, with UID 999, and the import also worked.
I also tried the example MapReduce job: SUCCESSFUL...
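A minimal sketch of those test steps, assuming a local account plus a matching Kerberos principal (the user name jlwang and the DDS.COM realm are taken from later in this thread; the exact commands are an approximation, not the ones I typed):

useradd -u 1500 jlwang                            # personal user, UID >= 1000
useradd -u 999 test                               # second test user, UID < 1000
kadmin.local -q "addprinc jlwang@DDS.COM"         # matching Kerberos principal
su - jlwang -c "kinit jlwang && hadoop jar hadoop-examples.jar pi 10 10"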
So I went back to the hdfs user and tried Sqoop and MapReduce again: failed. Then I tried yarn and hue: failed as well.
So I concluded that none of the system users can launch containers, while other users can, no matter what their UID is.
Later, I opened http://10.32.87.9:8088/cluster/nodes and http://10.32.87.49:8042/node/allContainers to monitor container activity. If the user is my own user, containers are launched and run normally; but if the user is hdfs or another system user, no container is ever launched (nothing shows up in the RUNNING state).
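If you prefer the command line over the web UIs, the same information can be pulled from the ResourceManager; a small sketch, with the host and port taken from the URLs above (on this Kerberized cluster the commands need a valid ticket, and curl may need --negotiate -u :):

yarn application -list -appStates ACCEPTED,RUNNING                   # applications as the RM sees them
yarn node -list -all                                                 # NodeManagers and their running container counts
curl -s 'http://10.32.87.9:8088/ws/v1/cluster/apps?states=RUNNING'   # ResourceManager REST API, JSON output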
Let me show you an example; please look carefully at the highlighted words, which show the Linux user.
[hdfs@datanode01 hadoop-0.20-mapreduce]$ hadoop jar hadoop-examples.jar pi 10 10
Number of Maps = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
14/10/29 22:23:25 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 65 for hdfs on ha-hdfs:cluster
14/10/29 22:23:25 INFO security.TokenCache: Got dt for hdfs://cluster; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster, Ident: (HDFS_DELEGATION_TOKEN token 65 for hdfs)
14/10/29 22:23:25 INFO input.FileInputFormat: Total input paths to process : 10
14/10/29 22:23:25 INFO mapreduce.JobSubmitter: number of splits:10
14/10/29 22:23:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1414579088733_0010
14/10/29 22:23:25 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster, Ident: (HDFS_DELEGATION_TOKEN token 65 for hdfs)
14/10/29 22:23:27 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414579088733_0010 is still in NEW
14/10/29 22:23:29 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414579088733_0010 is still in NEW
14/10/29 22:23:31 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414579088733_0010 is still in NEW
14/10/29 22:23:33 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414579088733_0010 is still in NEW
14/10/29 22:23:35 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414579088733_0010 is still in NEW
14/10/29 22:23:36 INFO impl.YarnClientImpl: Submitted application application_1414579088733_0010
14/10/29 22:23:36 INFO mapreduce.Job: The url to track the job: http://namenode01.hadoop:8088/proxy/application_1414579088733_0010/
14/10/29 22:23:36 INFO mapreduce.Job: Running job: job_1414579088733_0010
14/10/29 22:24:00 INFO mapreduce.Job: Job job_1414579088733_0010 running in uber mode : false
14/10/29 22:24:00 INFO mapreduce.Job: map 0% reduce 0%
14/10/29 22:24:00 INFO mapreduce.Job: Job job_1414579088733_0010 failed with state FAILED due to: Application application_1414579088733_0010 failed 2 times due to AM Container for appattempt_1414579088733_0010_000002 exited with exitCode: -1000 due to: Application application_1414579088733_0010 initialization failed (exitCode=139) with output:
.Failing this attempt.. Failing the application.
14/10/29 22:24:00 INFO mapreduce.Job: Counters: 0
Job Finished in 35.389 seconds
java.io.FileNotFoundException: File does not exist: hdfs://cluster/user/hdfs/QuasiMonteCarlo_1414592602966_1277483233/out/reduce-out
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1083)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1075)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1075)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1749)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
[test@datanode01 hadoop-0.20-mapreduce]$ hadoop jar hadoop-examples.jar pi 10 10
Number of Maps = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
14/10/29 22:29:45 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 66 for test on ha-hdfs:cluster
14/10/29 22:29:45 INFO security.TokenCache: Got dt for hdfs://cluster; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster, Ident: (HDFS_DELEGATION_TOKEN token 66 for test)
14/10/29 22:29:45 INFO input.FileInputFormat: Total input paths to process : 10
14/10/29 22:29:45 INFO mapreduce.JobSubmitter: number of splits:10
14/10/29 22:29:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1414579088733_0011
14/10/29 22:29:45 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster, Ident: (HDFS_DELEGATION_TOKEN token 66 for test)
14/10/29 22:29:47 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414579088733_0011 is still in NEW
14/10/29 22:29:49 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414579088733_0011 is still in NEW
14/10/29 22:29:51 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414579088733_0011 is still in NEW
14/10/29 22:29:53 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414579088733_0011 is still in NEW
14/10/29 22:29:55 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414579088733_0011 is still in NEW
14/10/29 22:29:56 INFO impl.YarnClientImpl: Submitted application application_1414579088733_0011
14/10/29 22:29:56 INFO mapreduce.Job: The url to track the job: http://namenode01.hadoop:8088/proxy/application_1414579088733_0011/
14/10/29 22:29:56 INFO mapreduce.Job: Running job: job_1414579088733_0011
14/10/29 22:30:40 INFO mapreduce.Job: Job job_1414579088733_0011 running in uber mode : false
14/10/29 22:30:40 INFO mapreduce.Job: map 0% reduce 0%
14/10/29 22:30:50 INFO mapreduce.Job: map 30% reduce 0%
14/10/29 22:31:09 INFO mapreduce.Job: map 50% reduce 0%
14/10/29 22:31:18 INFO mapreduce.Job: map 70% reduce 0%
14/10/29 22:31:21 INFO mapreduce.Job: map 100% reduce 0%
14/10/29 22:31:30 INFO mapreduce.Job: map 100% reduce 100%
14/10/29 22:31:30 INFO mapreduce.Job: Job job_1414579088733_0011 completed successfully
14/10/29 22:31:30 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=92
FILE: Number of bytes written=1235676
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2570
HDFS: Number of bytes written=215
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=9
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=268763
Total time spent by all reduces in occupied slots (ms)=3396
Total time spent by all map tasks (ms)=268763
Total time spent by all reduce tasks (ms)=3396
Total vcore-seconds taken by all map tasks=268763
Total vcore-seconds taken by all reduce tasks=3396
Total megabyte-seconds taken by all map tasks=275213312
Total megabyte-seconds taken by all reduce tasks=3477504
Map-Reduce Framework
Map input records=10
Map output records=20
Map output bytes=180
Map output materialized bytes=339
Input split bytes=1390
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=339
Reduce input records=20
Reduce output records=0
Spilled Records=40
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=423
CPU time spent (ms)=7420
Physical memory (bytes) snapshot=4415447040
Virtual memory (bytes) snapshot=16896184320
Total committed heap usage (bytes)=4080009216
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1180
File Output Format Counters
Bytes Written=97
Job Finished in 105.199 seconds
Estimated value of Pi is 3.20000000000000000000
Could you give me some suggestions on how to fix this issue? I have opened a message elsewhere; the link is:
Please ignore this link. If anybody has an idea, please paste your solution here. Thanks very much.
Created 10-29-2014 09:05 AM
I forgot to say: I can run SELECT queries in Hive and Impala through Hue normally. Since Hive queries are also MapReduce jobs, and they work fine, that is why I said this issue is strange.
Created 10-29-2014 11:29 PM
I have dug deeper today and found something.
Please have a look at the information below:
[root@datanode03 usercache]# pwd
/yarn/nm/usercache
[root@datanode03 usercache]# ls
hive hue jlwang test
We can see there are four directories in /yarn/nm/usercache; these four users can run the example MapReduce job or Sqoop successfully. As I said, system users such as hdfs and yarn cannot launch containers. At the beginning I thought it was about the UID being below 1000, but I have checked that setting many times and it is not the problem, because I have set min.user.id = 0 and added hdfs, mapred, and yarn to the allowed user list.
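If you want to double-check what the NodeManager actually runs with (rather than what the UI shows), you can verify the UIDs and inspect the rendered container-executor.cfg on a NodeManager host; the process-directory path below is an assumption about a Cloudera Manager-managed node and changes on every restart:

for u in hdfs yarn mapred; do id $u; done        # confirm the real UIDs of the system accounts
sudo cat /var/run/cloudera-scm-agent/process/*-yarn-NODEMANAGER/container-executor.cfg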
At this point I assumed it might be because there are no hdfs or yarn directories in /yarn/nm/usercache. Normally, every user gets their own directory created under /yarn/nm/usercache when they run a MapReduce job, but no directory was ever created for hdfs or yarn. Why???
Test begins (to confirm that when a new user runs a MapReduce job, a directory is created under usercache):
1) Create a new user named iamfromsky, with a UID below 1000.
useradd -u 600 iamfromsky
2) addprinc iamfromsky@DDS.COM
3) Log in to the Linux system as iamfromsky and run the MapReduce job.
[iamfromsky@datanode03 hadoop-0.20-mapreduce]$ hadoop jar hadoop-examples.jar pi 10 10
Number of Maps = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
14/10/30 14:10:00 INFO client.RMProxy: Connecting to ResourceManager at namenode01.hadoop/10.32.87.9:8032
14/10/30 14:10:00 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 91 for iamfromsky on ha-hdfs:cluster
14/10/30 14:10:00 INFO security.TokenCache: Got dt for hdfs://cluster; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster, Ident: (HDFS_DELEGATION_TOKEN token 91 for iamfromsky)
14/10/30 14:10:00 INFO input.FileInputFormat: Total input paths to process : 10
14/10/30 14:10:00 INFO mapreduce.JobSubmitter: number of splits:10
14/10/30 14:10:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1414638687299_0013
14/10/30 14:10:00 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster, Ident: (HDFS_DELEGATION_TOKEN token 91 for iamfromsky)
14/10/30 14:10:03 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414638687299_0013 is still in NEW
14/10/30 14:10:05 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414638687299_0013 is still in NEW
14/10/30 14:10:07 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414638687299_0013 is still in NEW
14/10/30 14:10:09 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414638687299_0013 is still in NEW
14/10/30 14:10:11 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1414638687299_0013 is still in NEW
14/10/30 14:10:11 INFO impl.YarnClientImpl: Submitted application application_1414638687299_0013
14/10/30 14:10:11 INFO mapreduce.Job: The url to track the job: http://namenode01.hadoop:8088/proxy/application_1414638687299_0013/
14/10/30 14:10:11 INFO mapreduce.Job: Running job: job_1414638687299_0013
14/10/30 14:10:55 INFO mapreduce.Job: Job job_1414638687299_0013 running in uber mode : false
14/10/30 14:10:55 INFO mapreduce.Job: map 0% reduce 0%
14/10/30 14:11:05 INFO mapreduce.Job: map 30% reduce 0%
14/10/30 14:11:24 INFO mapreduce.Job: map 50% reduce 0%
14/10/30 14:11:34 INFO mapreduce.Job: map 70% reduce 0%
14/10/30 14:11:36 INFO mapreduce.Job: map 100% reduce 0%
14/10/30 14:11:44 INFO mapreduce.Job: map 100% reduce 100%
14/10/30 14:11:44 INFO mapreduce.Job: Job job_1414638687299_0013 completed successfully
14/10/30 14:11:44 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=89
FILE: Number of bytes written=1204297
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2630
HDFS: Number of bytes written=215
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=264122
Total time spent by all reduces in occupied slots (ms)=3223
Total time spent by all map tasks (ms)=264122
Total time spent by all reduce tasks (ms)=3223
Total vcore-seconds taken by all map tasks=264122
Total vcore-seconds taken by all reduce tasks=3223
Total megabyte-seconds taken by all map tasks=270460928
Total megabyte-seconds taken by all reduce tasks=3300352
Map-Reduce Framework
Map input records=10
Map output records=20
Map output bytes=180
Map output materialized bytes=339
Input split bytes=1450
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=339
Reduce input records=20
Reduce output records=0
Spilled Records=40
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=423
CPU time spent (ms)=6450
Physical memory (bytes) snapshot=4420325376
Virtual memory (bytes) snapshot=16857763840
Total committed heap usage (bytes)=4029153280
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1180
File Output Format Counters
Bytes Written=97
Job Finished in 104.169 seconds
Estimated value of Pi is 3.20000000000000000000
4) Check the usercache directory.
[yarn@datanode03 usercache]$ cd /yarn/nm/usercache/
[yarn@datanode03 usercache]$ ls
hive hue iamfromsky jlwang test
We can see that the iamfromsky directory has been created. I then created the hdfs directory manually and tried again: mission failed.
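For completeness, the manual creation I tried looked roughly like this; the ownership and mode are guessed from the neighbouring usercache directories, and as noted it did not help:

# on the NodeManager host (datanode03 in this thread)
mkdir /yarn/nm/usercache/hdfs
chown hdfs:yarn /yarn/nm/usercache/hdfs    # match the owner/group pattern of the other user dirs
chmod 750 /yarn/nm/usercache/hdfs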
So I don't think the root cause is simply that the hdfs directory cannot be created; I think something else is preventing it from being created.
Can anyone give me some advice?
Created 11-02-2014 02:14 AM
As I said, it cannot even launch a container, so there is no container log at all.
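For anyone debugging something similar: when a container does at least start, its logs can usually be pulled as below (the application ID is copied from one of the failed runs above; log aggregation must be enabled for the yarn logs command to return anything):

yarn logs -applicationId application_1414579088733_0016
# otherwise look in the NodeManager's local container log directory (yarn.nodemanager.log-dirs) on the node that ran the attempt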
Created 11-04-2014 01:39 PM
Please refer to the following page as part of the Kerberos setup: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_sg_s7_prepare_cluste...
By default, the mapred, hdfs, and bin user accounts are kept from submitting and executing jobs.
Created 11-05-2014 06:29 AM
I have resolved this issue. Truth be told, I already knew that hdfs, yarn, and mapred are kept from submitting jobs by default, but as you also know, min.user.id and the allowed user list exist exactly for that case, so the issue was not about the user or the job.
I monitored it many times: only one container would start, and it died automatically after a few seconds, whereas in the normal state my environment launches 3-4 containers. So I am sure the problem is that containers cannot work normally.
But why? As I said, only one container ever started, so I checked that container's log but could not find anything; the errors were the same as what I showed above. I also noticed that when Sqoop runs normally it creates a directory under the usercache directory, but when the Sqoop job fails it does not, so I guessed this directory had some problem, although of course I did not know the exact reason.
Then I removed NameNode HA, leaving just one NameNode and one SecondaryNameNode as the default, and ran Sqoop again. It still failed, but this time the log was more readable and showed a "NOT INITALIZE CONTAINER" error. That made me more confident that the job really cannot launch containers.
Finally, I stopped the whole cluster, deleted /yarn/* on the DataNodes and NameNodes, and started the cluster again. It works fine now.
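A sketch of that final cleanup, assuming the NodeManager local directories live under /yarn/nm as shown earlier in this thread; stop the YARN services first and verify yarn.nodemanager.local-dirs / yarn.nodemanager.log-dirs on your own cluster before deleting anything:

# on every NodeManager host, with YARN stopped
rm -rf /yarn/nm/usercache/* /yarn/nm/filecache/* /yarn/nm/nmPrivate/*
# the post actually wiped everything under /yarn, which also removes any local container logs:
# rm -rf /yarn/*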
I still don't know why hdfs or yarn could not launch containers, but the problem has been resolved.