Created 03-03-2016 04:14 PM
We tried running teragen on a 5-node cluster, this time using Hadoop 2.7.1. The job is stuck at map 50%, reduce 0%.
When we viewed the logs for this job on a DataNode, they showed this error:
ntainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:12,426 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:13,430 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:14,437 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:15,447 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:16,448 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:27,450 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:28,452 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:29,456 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:30,463 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:31,465 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:32,476 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:33,485 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:34,493 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:35,493 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-03 21:33:36,494 INFO [ContainerLauncher #1] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:57252. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
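Every retry in the log above targets localhost/127.0.0.1:57252 rather than a cluster hostname. One possible explanation, not confirmed for this cluster, is that a node's hostname resolves to the loopback address, so other processes are told to contact it at localhost. As a rough aid, the Python sketch below pulls the retry targets out of such a log and flags any that point at a 127.x.x.x address. The function names and the regular expression are my own and only match the message format shown above.

```python
import re
from collections import Counter

# Matches the "Retrying connect to server" messages in the format shown above.
# The pattern is an assumption based on those lines, not an official log schema.
RETRY_RE = re.compile(
    r"Retrying connect to server: (?P<host>[^/\s]*)/(?P<ip>[\d.]+):(?P<port>\d+)\.\s+"
    r"Already tried (?P<tries>\d+) time\(s\)"
)


def summarize_retries(log_text):
    """Count retry messages per (host, ip, port) target."""
    counts = Counter()
    for match in RETRY_RE.finditer(log_text):
        target = (match["host"], match["ip"], int(match["port"]))
        counts[target] += 1
    return counts


def loopback_targets(counts):
    """Return the targets whose IP address is in the 127.0.0.0/8 loopback range."""
    return [target for target in counts if target[1].startswith("127.")]
```

If loopback_targets returns anything for a job that spans several machines, it may be worth checking how each node's hostname resolves (for example in /etc/hosts) before looking elsewhere.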
The yarn-site.xml on all DataNodes is as follows:
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

  <!-- Site specific YARN configuration properties -->

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop-master:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop-master:8033</value>
  </property>
</configuration>
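To make a yarn-site.xml like the one above easier to compare across nodes, here is a small Python sketch that loads the properties into a dictionary. It also checks one naming detail: as far as I know, Hadoop 2.x looks up the class of each auxiliary service under yarn.nodemanager.aux-services.<service-name>.class, so a service named mapreduce_shuffle would use a key ending in mapreduce_shuffle.class rather than mapreduce.shuffle.class. Treat that as an assumption to verify against your version's yarn-default.xml. The function names are hypothetical, and a key reported as missing may still be supplied by the defaults, so this may not be related to the stuck job.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def expected_aux_class_keys(props):
    """Build the class key expected for each configured aux service.

    Assumes the key pattern yarn.nodemanager.aux-services.<name>.class.
    """
    services = props.get("yarn.nodemanager.aux-services", "")
    return [
        f"yarn.nodemanager.aux-services.{name.strip()}.class"
        for name in services.split(",")
        if name.strip()
    ]


def missing_aux_class_keys(props):
    """List expected aux-service class keys that this file does not set."""
    return [key for key in expected_aux_class_keys(props) if key not in props]
```

Running load_properties on the file from each node and comparing the resulting dictionaries is a quick way to confirm that all nodes really do share the same ResourceManager addresses.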
Created 03-04-2016 02:51 AM
Your DataNode cannot connect to your NameNode:
So either:
Created 04-14-2017 04:55 PM
You can try disabling the firewall on each node to rule it out. The commands below should help:
service iptables stop
service ip6tables stop
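Before and after stopping the firewall, it can help to check whether the relevant ports are reachable from the other nodes at all. Below is a minimal Python sketch of such a check; can_connect and check_endpoints are hypothetical helper names, and a successful TCP connection only shows that something is listening, not that the Hadoop service behind it is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to whether it is reachable."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}
```

For the configuration in the question, you could run check_endpoints([("hadoop-master", 8030), ("hadoop-master", 8031), ("hadoop-master", 8032), ("hadoop-master", 8033), ("hadoop-master", 8088)]) from each worker node. If a port is unreachable with the firewall up and reachable with it down, the firewall rules are the likely culprit.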