Unknown Job ID for long running jobs on History Server

New Contributor

Hi,

I am trying to run TPCx-HS2, which is essentially TeraSort, to test my Hadoop/YARN cluster. The generation and validation phases work fine. The sort itself also completes, but at the very end the client crashes because the MR JobHistory Server does not know the job ID. I double-checked the configuration: the history server is reachable, and the generation and validation jobs that run before and after the sort do show up there. The only difference I can see is that generation and validation finish much faster than the sort, but I don't understand why that would lead to the job ID being unknown.

You can see my log below. Any help is much appreciated.

2020-03-31 13:07:20,109 INFO mapreduce.Job:  map 100% reduce 97%
2020-03-31 13:10:00,277 INFO mapreduce.Job:  map 100% reduce 98%
2020-03-31 13:12:05,179 INFO mapreduce.Job:  map 100% reduce 99%
2020-03-31 13:14:27,607 INFO mapreduce.Job:  map 100% reduce 100%
2020-03-31 13:14:40,956 INFO mapreduce.Job: Job job_1585647217951_0003 completed successfully
2020-03-31 13:14:41,256 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=
2020-03-31 13:14:41,674 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=
2020-03-31 13:14:41,790 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=
Exception in thread "main" java.io.IOException: java.io.IOException: Unknown Job job_1585647217951_0003
        at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.verifyAndGetJob(HistoryC
        at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getCounters(HistoryClien
        at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getCounters(MRClientPr
        at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProt
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916)

2 REPLIES

New Contributor

In addition, here is a listing showing that the jobs with suffixes before and after _0003 do show up in the history server's log directory:

drwxrwxrwt   - hadoop hadoop              0 2020-03-31 11:33 /user/history
drwxrwx---   - hadoop hadoop              0 2020-03-31 11:54 /user/history/done
drwxrwx---   - hadoop hadoop              0 2020-03-31 11:54 /user/history/done/2020
drwxrwx---   - hadoop hadoop              0 2020-03-31 11:54 /user/history/done/2020/03
drwxrwx---   - hadoop hadoop              0 2020-03-31 11:54 /user/history/done/2020/03/31
drwxrwx---   - hadoop hadoop              0 2020-03-31 13:44 /user/history/done/2020/03/31/000000
-rwxrwx---   1 root   hadoop          68050 2020-03-31 11:52 /user/history/done/2020/03/31/000000/job_1585647217951_0002-1585647944784-root-HSGen-1585648448385-15-0-SUCCEEDED-default-1585647977205.jhist
-rwxrwx---   1 root   hadoop         215999 2020-03-31 11:52 /user/history/done/2020/03/31/000000/job_1585647217951_0002_conf.xml
-rwxrwx---   1 root   hadoop          51476 2020-03-31 13:18 /user/history/done/2020/03/31/000000/job_1585647217951_0004-1585653295800-root-HSValidate-1585653602686-8-1-SUCCEEDED-default-1585653383672.jhist
-rwxrwx---   1 root   hadoop         216412 2020-03-31 13:18 /user/history/done/2020/03/31/000000/job_1585647217951_0004_conf.xml
-rwxrwx---   1 root   hadoop          67441 2020-03-31 13:36 /user/history/done/2020/03/31/000000/job_1585647217951_0005-1585654239304-root-HSGen-1585654641074-15-0-SUCCEEDED-default-1585654270486.jhist
-rwxrwx---   1 root   hadoop         215999 2020-03-31 13:36 /user/history/done/2020/03/31/000000/job_1585647217951_0005_conf.xml
drwxrwxrwt   - hadoop hadoop              0 2020-03-31 11:45 /user/history/done_intermediate
drwxrwx---   - hadoop hadoop              0 2020-03-31 11:34 /user/history/done_intermediate/hadoop
drwxrwx---   - root   hadoop              0 2020-03-31 13:44 /user/history/done_intermediate/root
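
For a longer listing, the missing job can be spotted mechanically rather than by eye. The following is a minimal Python sketch, assuming the listing lines are available as plain strings; the function name `missing_job_numbers` and the `listing` variable are hypothetical helpers made up for illustration, and the sample input is just the file names from the listing above.

```python
import re

# Matches MapReduce job IDs such as job_1585647217951_0002
# (cluster start timestamp, then a per-cluster sequence number).
JOB_ID = re.compile(r"job_(\d+)_(\d+)")


def missing_job_numbers(listing_lines):
    """Return {cluster_timestamp: [sequence numbers absent between the
    lowest and highest job number seen for that cluster timestamp]}."""
    seen = {}
    for line in listing_lines:
        for cluster_ts, seq in JOB_ID.findall(line):
            seen.setdefault(cluster_ts, set()).add(int(seq))

    gaps = {}
    for cluster_ts, seqs in seen.items():
        expected = set(range(min(seqs), max(seqs) + 1))
        missing = sorted(expected - seqs)
        if missing:
            gaps[cluster_ts] = missing
    return gaps


# File names copied from the listing in this post (illustrative input only).
listing = [
    "job_1585647217951_0002-1585647944784-root-HSGen-1585648448385-15-0-SUCCEEDED-default-1585647977205.jhist",
    "job_1585647217951_0002_conf.xml",
    "job_1585647217951_0004-1585653295800-root-HSValidate-1585653602686-8-1-SUCCEEDED-default-1585653383672.jhist",
    "job_1585647217951_0004_conf.xml",
    "job_1585647217951_0005-1585654239304-root-HSGen-1585654641074-15-0-SUCCEEDED-default-1585654270486.jhist",
    "job_1585647217951_0005_conf.xml",
]

print(missing_job_numbers(listing))  # {'1585647217951': [3]}
```

This only reports gaps between the lowest and highest job number it sees, so a missing first or last job would go unnoticed.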

Master Collaborator

Are there any errors in the JHS logs, especially around the timeframe 2020-03-31 13:14:*?
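
One way to narrow the logs down to that window is sketched below in Python. It assumes the JHS log uses the same `date time,millis LEVEL logger: message` layout as the client log in the question, which may not hold for a differently configured log4j setup. The function name `lines_in_window` is a hypothetical helper, and the sample lines are taken from the client log above, not from an actual JHS log.

```python
import re

# Assumed layout: "2020-03-31 13:14:40,956 INFO mapreduce.Job: ..."
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} (\w+) ")


def lines_in_window(lines, prefix, levels=("WARN", "ERROR", "FATAL")):
    """Keep log lines whose timestamp starts with `prefix` and whose
    level is one of `levels`. Lines that do not match the layout
    (for example stack-trace continuation lines) are skipped."""
    selected = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(1).startswith(prefix) and match.group(2) in levels:
            selected.append(line.rstrip("\n"))
    return selected


# Sample input: client log lines quoted in the question.
sample = [
    "2020-03-31 13:12:05,179 INFO mapreduce.Job:  map 100% reduce 99%",
    "2020-03-31 13:14:27,607 INFO mapreduce.Job:  map 100% reduce 100%",
    "2020-03-31 13:14:40,956 INFO mapreduce.Job: Job job_1585647217951_0003 completed successfully",
]

for hit in lines_in_window(sample, "2020-03-31 13:14", levels=("INFO",)):
    print(hit)
```

With the default levels the function returns only WARN, ERROR and FATAL entries, which is usually the first thing to look at; passing the lines of the real JHS log file instead of `sample` is left to the reader, since the log location depends on the installation.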