Created 02-04-2016 03:11 PM
I am struggling to fix an issue that I am facing while running Hadoop MapReduce jobs on my cluster. The cluster was created through Ambari (not the sandbox) and has 4 nodes, including the master node. This is the error that I get:
This token is expired. current time is 1454617494914 found 1454598336617 Note: System times on machines may be out of sync. Check system time and time zones.
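To see how far apart the two clocks in that message are, the two values can be decoded as milliseconds since the Unix epoch. The following is only a small sketch: the constant and function names are made up for illustration, and it assumes both numbers are epoch milliseconds, which is how they read in the error text.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the error message (assumed to be epoch milliseconds).
CURRENT_MS = 1454617494914  # "current time is ..."
FOUND_MS = 1454598336617    # "found ..."

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def gap_ms(current_ms, found_ms):
    """Return how far the reported current time is past the token time, in ms."""
    return current_ms - found_ms


if __name__ == "__main__":
    print("current time:", to_utc(CURRENT_MS).isoformat())
    print("token time:  ", to_utc(FOUND_MS).isoformat())
    print("gap:         ", timedelta(milliseconds=gap_ms(CURRENT_MS, FOUND_MS)))
```

A gap of several hours, rather than seconds or minutes, points at badly skewed clocks between nodes rather than at a container that simply waited too long.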
I checked the time on all the nodes and found that, except on the master node, the time on every node was incorrect. So I corrected the time manually on all the nodes (ntpd was failing to connect to its servers).
Searching the internet, I found that there is a setting 'yarn.resourcemanager.rm.container-allocation.expiry-interval-ms' which can be used to increase the lifespan of the container. I could not find this setting anywhere in the advanced configuration on the Ambari dashboard. Can anyone help me understand what is going on?
Created 02-04-2016 03:14 PM
This is the exact root cause:
"I checked the time on all the nodes. I found that, except the master node, time on all the other nodes were incorrect. So I manually corrected (ntpd was failing to connect to servers) the time on all the nodes."
Do you know why ntpd is failing?
Created 02-04-2016 05:05 PM
@Pradeep kumar it's a common error; a Google search gave me this link
Created 02-04-2016 05:08 PM
@Pradeep kumar also make sure your firewall accepts all traffic from the servers in the cluster. You can open ports granularly or allow all traffic from each node. Refer to the CentOS docs for instructions.
Created 02-04-2016 04:03 PM
Thanks Neeraj. I wish I could update the time using ntpd, but every command I tried to update the system time gave me the error "4 Feb 21:30:55 ntpdate[12169]: no server suitable for synchronization found". I have gone through a lot of material on the internet that discusses this error, but none of the suggestions helped, so I decided to set the time manually. I will check with my company's network support to see whether a firewall is preventing ntpd from syncing with the server.
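One way to check whether a node can reach an NTP server at all is to send a single SNTP request over UDP port 123 and see whether anything comes back. The sketch below is a minimal probe, not a replacement for ntpd or ntpdate: the function names are invented for illustration, the server name must be supplied by you, and it ignores network delay, so the offset it prints is only approximate.

```python
import socket
import struct
import sys
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request():
    """Build a 48-byte SNTP client request (LI=0, version 3, mode 3)."""
    return b"\x1b" + 47 * b"\x00"


def parse_transmit_time(packet):
    """Return the server's transmit timestamp as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP reply is shorter than 48 bytes")
    # The transmit timestamp occupies bytes 40..47: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query(server, timeout=5.0):
    """Send one request to server:123 over UDP and return its time in Unix seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        data, _ = sock.recvfrom(512)
    return parse_transmit_time(data)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: ntp_probe.py <ntp-server>")
    else:
        try:
            offset = query(sys.argv[1]) - time.time()
            print("reply received, approximate offset: %.3f s" % offset)
        except OSError as exc:
            # Timeouts and DNS failures both arrive as OSError subclasses.
            print("no usable reply from %s: %s" % (sys.argv[1], exc))
```

If the probe times out against a public server but works against a server inside the same network, a firewall blocking outbound UDP 123 is a likely explanation.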
Created 02-04-2016 04:37 PM
Also, I would like to understand why setting the time manually will not resolve the time sync problem. If I type "date" now, it shows almost the same time on all the nodes. The dates are the same, and the times differ by only a few seconds.
Created 02-06-2016 06:30 PM
A few seconds isn't going to matter. Kerberos and the rest of the security system are fussy about clocks, though; Kerberos typically tolerates up to about five minutes of skew by default.
You can usually set up your network switch as an NTP server so that all the nodes sync with it. Alternatively, turn one of your machines into the NTP server and make it the reference time source. If you are detached from the network, you could ideally hook up a GPS unit and run gpsd to be about as accurate as everything else on the internet.
Created 02-05-2016 01:03 PM
Thanks Neeraj for your support. The problem was ntpd: my nodes could not reach the known NTP time servers. I got the address of an internal NTP server in my company, and everything started working fine.