<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: One Data node is down in the cluster in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/One-Data-node-is-down-in-the-cluster/m-p/174281#M37738</link>
    <description>&lt;A rel="user" href="#"&gt;@venkat v&lt;/A&gt;&lt;P&gt;Can you please check logs on that datanode?&lt;/P&gt;&lt;P&gt;Also, run hdfs dfsadmin -report to check whether datanode is really down or ambari gletch ?&lt;/P&gt;</description>
    <pubDate>Sat, 13 Aug 2016 08:15:36 GMT</pubDate>
    <dc:creator>bandarusridhar1</dc:creator>
    <dc:date>2016-08-13T08:15:36Z</dc:date>
    <item>
      <title>One Data node is down in the cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/One-Data-node-is-down-in-the-cluster/m-p/174279#M37736</link>
      <description>&lt;P&gt;I have created a 5-node cluster on AWS, and one of the DataNodes is showing as down in Ambari. I logged in to that node and ran ambari-agent status; it shows the agent as running. Please help me resolve the issue.&lt;/P&gt;</description>
      <pubDate>Sat, 13 Aug 2016 06:14:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/One-Data-node-is-down-in-the-cluster/m-p/174279#M37736</guid>
      <dc:creator>uses_venkatesh</dc:creator>
      <dc:date>2016-08-13T06:14:22Z</dc:date>
    </item>
    <item>
      <title>Re: One Data node is down in the cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/One-Data-node-is-down-in-the-cluster/m-p/174280#M37737</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/12484/usesvenkatesh.html" nodeid="12484"&gt;@venkat v&lt;/A&gt;&lt;P&gt;Can you please share more information? When you say one node is down, does that mean data node process? Can we see hdfs logs from /var/log folder? Is the node able to talk to Ambari? What kind of instance is this? Some low end instances share network bandwidth and other resources from applications other than yours. Those applications at times, may be using resources and impacting your system. If that's the case, it will show up and working as soon as resources become available. Is it possible to restart the node? I know in AWS it's not a simple decision like on-prem cluster but sometimes, that might be it.&lt;/P&gt;</description>
      <pubDate>Sat, 13 Aug 2016 08:11:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/One-Data-node-is-down-in-the-cluster/m-p/174280#M37737</guid>
      <dc:creator>mqureshi</dc:creator>
      <dc:date>2016-08-13T08:11:56Z</dc:date>
    </item>
    <item>
      <title>Re: One Data node is down in the cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/One-Data-node-is-down-in-the-cluster/m-p/174281#M37738</link>
      <description>&lt;A rel="user" href="#"&gt;@venkat v&lt;/A&gt;&lt;P&gt;Can you please check logs on that datanode?&lt;/P&gt;&lt;P&gt;Also, run hdfs dfsadmin -report to check whether datanode is really down or ambari gletch ?&lt;/P&gt;</description>
      <pubDate>Sat, 13 Aug 2016 08:15:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/One-Data-node-is-down-in-the-cluster/m-p/174281#M37738</guid>
      <dc:creator>bandarusridhar1</dc:creator>
      <dc:date>2016-08-13T08:15:36Z</dc:date>
    </item>
    <item>
      <title>Re: One Data node is down in the cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/One-Data-node-is-down-in-the-cluster/m-p/174282#M37739</link>
      <description>&lt;P&gt;Check as the hdfs user:&lt;/P&gt;&lt;P&gt;$sudo su - hdfs &lt;/P&gt;&lt;P&gt;$hadoop dfsadmin -report (in the output, verify the list of available DataNodes)&lt;/P&gt;&lt;P&gt;Verify which nodes are listed and which are not.&lt;/P&gt;&lt;P&gt;Then go to the HDFS log directory (on this cluster it turned out to be /var/log/hadoop/hdfs):&lt;/P&gt;&lt;P&gt;#cd /var/log/hadoop/hdfs&lt;/P&gt;&lt;P&gt;The hadoop-hdfs-datanode-*.log file there will give you more information about the issue.&lt;/P&gt;&lt;P&gt;If this was helpful, a comment and an accept would be appreciated.&lt;/P&gt;</description>
      <pubDate>Sun, 14 Aug 2016 00:40:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/One-Data-node-is-down-in-the-cluster/m-p/174282#M37739</guid>
      <dc:creator>shivkumar82015</dc:creator>
      <dc:date>2016-08-14T00:40:20Z</dc:date>
    </item>
    <item>
      <title>Re: One Data node is down in the cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/One-Data-node-is-down-in-the-cluster/m-p/174283#M37740</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;DIV&gt;&lt;P&gt;I logged in to the server and ran hdfs dfsadmin -report; the output is below.&lt;/P&gt;&lt;P&gt;[centos@ip-172-31-9-98 ~]$ sudo su hdfs&lt;/P&gt;&lt;P&gt;[hdfs@ip-172-31-9-98 centos]$ hdfs dfsadmin -report&lt;/P&gt;&lt;P&gt;Configured Capacity: 28984442880 (26.99 GB)&lt;/P&gt;&lt;P&gt;Present Capacity: 7856140288 (7.32 GB)&lt;/P&gt;&lt;P&gt;DFS Remaining: 5172633600 (4.82 GB)&lt;/P&gt;&lt;P&gt;DFS Used: 2683506688 (2.50 GB)&lt;/P&gt;&lt;P&gt;DFS Used%: 34.16%&lt;/P&gt;&lt;P&gt;Under replicated blocks: 0&lt;/P&gt;&lt;P&gt;Blocks with corrupt replicas: 0&lt;/P&gt;&lt;P&gt;Missing blocks: 0&lt;/P&gt;&lt;P&gt;Missing blocks (with replication factor 1): 0&lt;/P&gt;&lt;P&gt;-------------------------------------------------&lt;/P&gt;&lt;P&gt;Live datanodes (4):&lt;/P&gt;&lt;P&gt;Name: 172.31.58.15:50010 (ip-172-31-58-15.ec2.internal)&lt;/P&gt;&lt;P&gt;Hostname: ip-172-31-58-15.ec2.internal&lt;/P&gt;&lt;P&gt;Decommission Status : Normal&lt;/P&gt;&lt;P&gt;Configured Capacity: 7246110720 (6.75 GB)&lt;/P&gt;&lt;P&gt;DFS Used: 585097216 (557.99 MB)&lt;/P&gt;&lt;P&gt;Non DFS Used: 3310796800 (3.08 GB)&lt;/P&gt;&lt;P&gt;DFS Remaining: 3350216704 (3.12 GB)&lt;/P&gt;&lt;P&gt;DFS Used%: 8.07%&lt;/P&gt;&lt;P&gt;DFS Remaining%: 46.23%&lt;/P&gt;&lt;P&gt;Configured Cache Capacity: 0 (0 B)&lt;/P&gt;&lt;P&gt;Cache Used: 0 (0 B)&lt;/P&gt;&lt;P&gt;Cache Remaining: 0 (0 B)&lt;/P&gt;&lt;P&gt;Cache Used%: 100.00%&lt;/P&gt;&lt;P&gt;Cache Remaining%: 0.00%&lt;/P&gt;&lt;P&gt;Xceivers: 2&lt;/P&gt;&lt;P&gt;Last contact: Mon Aug 15 01:45:33 UTC 2016&lt;/P&gt;&lt;P&gt;Name: 172.31.6.230:50010 (ip-172-31-6-230.ec2.internal)&lt;/P&gt;&lt;P&gt;Hostname: ip-172-31-6-230.ec2.internal&lt;/P&gt;&lt;P&gt;Decommission Status : Normal&lt;/P&gt;&lt;P&gt;Configured Capacity: 7246110720 (6.75 GB)&lt;/P&gt;&lt;P&gt;DFS Used: 894488576 (853.05 MB)&lt;/P&gt;&lt;P&gt;Non DFS Used: 6351622144 (5.92 GB)&lt;/P&gt;&lt;P&gt;DFS Remaining: 0 (0 
B)&lt;/P&gt;&lt;P&gt;DFS Used%: 12.34%&lt;/P&gt;&lt;P&gt;DFS Remaining%: 0.00%&lt;/P&gt;&lt;P&gt;Configured Cache Capacity: 0 (0 B)&lt;/P&gt;&lt;P&gt;Cache Used: 0 (0 B)&lt;/P&gt;&lt;P&gt;Cache Remaining: 0 (0 B)&lt;/P&gt;&lt;P&gt;Cache Used%: 100.00%&lt;/P&gt;&lt;P&gt;Cache Remaining%: 0.00%&lt;/P&gt;&lt;P&gt;Xceivers: 2&lt;/P&gt;&lt;P&gt;Last contact: Mon Aug 15 01:45:31 UTC 2016&lt;/P&gt;&lt;P&gt;Name: 172.31.9.97:50010 (ip-172-31-9-97.ec2.internal)&lt;/P&gt;&lt;P&gt;Hostname: ip-172-31-9-97.ec2.internal&lt;/P&gt;&lt;P&gt;Decommission Status : Normal&lt;/P&gt;&lt;P&gt;Configured Capacity: 7246110720 (6.75 GB)&lt;/P&gt;&lt;P&gt;DFS Used: 894484480 (853.05 MB)&lt;/P&gt;&lt;P&gt;Non DFS Used: 5936037888 (5.53 GB)&lt;/P&gt;&lt;P&gt;DFS Remaining: 415588352 (396.34 MB)&lt;/P&gt;&lt;P&gt;DFS Used%: 12.34%&lt;/P&gt;&lt;P&gt;DFS Remaining%: 5.74%&lt;/P&gt;&lt;P&gt;Configured Cache Capacity: 0 (0 B)&lt;/P&gt;&lt;P&gt;Cache Used: 0 (0 B)&lt;/P&gt;&lt;P&gt;Cache Remaining: 0 (0 B)&lt;/P&gt;&lt;P&gt;Cache Used%: 100.00%&lt;/P&gt;&lt;P&gt;Cache Remaining%: 0.00%&lt;/P&gt;&lt;P&gt;Xceivers: 2&lt;/P&gt;&lt;P&gt;Last contact: Mon Aug 15 01:45:32 UTC 2016&lt;/P&gt;&lt;P&gt;Name: 172.31.58.16:50010 (ip-172-31-58-16.ec2.internal)&lt;/P&gt;&lt;P&gt;Hostname: ip-172-31-58-16.ec2.internal&lt;/P&gt;&lt;P&gt;Decommission Status : Normal&lt;/P&gt;&lt;P&gt;Configured Capacity: 7246110720 (6.75 GB)&lt;/P&gt;&lt;P&gt;DFS Used: 309436416 (295.10 MB)&lt;/P&gt;&lt;P&gt;Non DFS Used: 5529845760 (5.15 GB)&lt;/P&gt;&lt;P&gt;DFS Remaining: 1406828544 (1.31 GB)&lt;/P&gt;&lt;P&gt;DFS Used%: 4.27%&lt;/P&gt;&lt;P&gt;DFS Remaining%: 19.41%&lt;/P&gt;&lt;P&gt;Configured Cache Capacity: 0 (0 B)&lt;/P&gt;&lt;P&gt;Cache Used: 0 (0 B)&lt;/P&gt;&lt;P&gt;Cache Remaining: 0 (0 B)&lt;/P&gt;&lt;P&gt;Cache Used%: 100.00%&lt;/P&gt;&lt;P&gt;Cache Remaining%: 0.00%&lt;/P&gt;&lt;P&gt;Xceivers: 2&lt;/P&gt;&lt;P&gt;Last contact: Mon Aug 15 01:45:33 UTC 2016&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;I have gone through the HDFS logs on the server. They show an I/O exception (Java). The log output is below; please go through it and help me out.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;[hdfs@ip-172-31-9-98 centos]$ cd /var/lo&lt;/P&gt;&lt;P&gt;local/ lock/ log/&lt;/P&gt;&lt;P&gt;[hdfs@ip-172-31-9-98 centos]$ cd /var/log/&lt;/P&gt;&lt;P&gt;[hdfs@ip-172-31-9-98 log]$ ls&lt;/P&gt;&lt;P&gt;ambari-agent cron-20160814 messages-20160807&lt;/P&gt;&lt;P&gt;ambari-metrics-collector cups messages-20160814&lt;/P&gt;&lt;P&gt;ambari-metrics-monitor dmesg oozie&lt;/P&gt;&lt;P&gt;anaconda.ifcfg.log dmesg.old secure&lt;/P&gt;&lt;P&gt;anaconda.log dracut.log secure-20160807&lt;/P&gt;&lt;P&gt;anaconda.program.log falcon secure-20160814&lt;/P&gt;&lt;P&gt;anaconda.storage.log hadoop spark&lt;/P&gt;&lt;P&gt;anaconda.syslog hadoop-mapreduce spooler&lt;/P&gt;&lt;P&gt;anaconda.yum.log hadoop-yarn spooler-20160807&lt;/P&gt;&lt;P&gt;audit hive spooler-20160814&lt;/P&gt;&lt;P&gt;boot.log hive-hcatalog tallylog&lt;/P&gt;&lt;P&gt;btmp lastlog wtmp&lt;/P&gt;&lt;P&gt;cloud-init.log maillog yum.log&lt;/P&gt;&lt;P&gt;cloud-init-output.log maillog-20160807 zookeeper&lt;/P&gt;&lt;P&gt;cron maillog-20160814&lt;/P&gt;&lt;P&gt;cron-20160807 messages&lt;/P&gt;&lt;P&gt;[hdfs@ip-172-31-9-98 log]$ cd hadoop&lt;/P&gt;&lt;P&gt;[hdfs@ip-172-31-9-98 hadoop]$ ls&lt;/P&gt;&lt;P&gt;hdfs mapreduce root yarn&lt;/P&gt;&lt;P&gt;[hdfs@ip-172-31-9-98 hadoop]$ cd hdfs/&lt;/P&gt;&lt;P&gt;[hdfs@ip-172-31-9-98 hdfs]$ 
ls&lt;/P&gt;&lt;P&gt;gc.log-201608031630&lt;/P&gt;&lt;P&gt;gc.log-201608031641&lt;/P&gt;&lt;P&gt;gc.log-201608081832&lt;/P&gt;&lt;P&gt;gc.log-201608082306&lt;/P&gt;&lt;P&gt;gc.log-201608141850&lt;/P&gt;&lt;P&gt;hadoop-hdfs-datanode-ip-172-31-9-98.ec2.internal.log&lt;/P&gt;&lt;P&gt;hadoop-hdfs-datanode-ip-172-31-9-98.ec2.internal.out&lt;/P&gt;&lt;P&gt;hadoop-hdfs-datanode-ip-172-31-9-98.ec2.internal.out.1&lt;/P&gt;&lt;P&gt;hadoop-hdfs-datanode-ip-172-31-9-98.ec2.internal.out.2&lt;/P&gt;&lt;P&gt;hadoop-hdfs-datanode-ip-172-31-9-98.ec2.internal.out.3&lt;/P&gt;&lt;P&gt;hadoop-hdfs-datanode-ip-172-31-9-98.ec2.internal.out.4&lt;/P&gt;&lt;P&gt;hdfs-audit.log&lt;/P&gt;&lt;P&gt;SecurityAuth.audit&lt;/P&gt;&lt;P&gt;[hdfs@ip-172-31-9-98 hdfs]$ tail -100f hadoop-hdfs-datanode-ip-172-31-9-98.ec2.internal.log&lt;/P&gt;&lt;P&gt;2016-08-08 18:39:26,009 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(214)) - Unable to send metrics to collector by address:http://ip-172-31-9-98.ec2.internal:6188/ws/v1/timeline/metrics&lt;/P&gt;&lt;P&gt;2016-08-08 18:40:26,008 INFO httpclient.HttpMethodDirector (HttpMethodDirector.java:executeWithRetry(439)) - I/O exception (java.net.ConnectException) caught when processing request: Connection refused&lt;/P&gt;&lt;P&gt;2016-08-08 18:40:26,008 INFO httpclient.HttpMethodDirector (HttpMethodDirector.java:executeWithRetry(445)) - Retrying request&lt;/P&gt;&lt;P&gt;2016-08-08 18:40:26,008 INFO httpclient.HttpMethodDirector (HttpMethodDirector.java:executeWithRetry(439)) - I/O exception (java.net.ConnectException) caught when processing request: Connection refused&lt;/P&gt;&lt;P&gt;2016-08-08 18:40:26,008 INFO httpclient.HttpMethodDirector (HttpMethodDirector.java:executeWithRetry(445)) - Retrying request&lt;/P&gt;&lt;P&gt;2016-08-08 18:40:26,008 INFO httpclient.HttpMethodDirector (HttpMethodDirector.java:executeWithRetry(439)) - I/O exception (java.net.ConnectException) caught when processing request: Connection 
refused&lt;/P&gt;&lt;P&gt;2016-08-08 18:40:26,009 INFO httpclient.HttpMethodDirector (HttpMethodDirector.java:executeWithRetry(445)) - Retrying request&lt;/P&gt;&lt;P&gt;2016-08-08 18:40:26,009 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(214)) - Unable to send metrics to collector by address:http://ip-172-31-9-98.ec2.internal:6188/ws/v1/timeline/metrics&lt;/P&gt;&lt;P&gt;2016-08-08 18:40:26,009 INFO httpclient.HttpMethodDirector (HttpMethodDirector.java:executeWithRetry(439)) - I/O exception (java.net.ConnectException) caught when processing request: Connection refused&lt;/P&gt;&lt;P&gt;2016-08-08 18:40:26,009 INFO httpclient.HttpMethodDirector (HttpMethodDirector.java:executeWithRetry(445)) - Retrying request&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 15 Aug 2016 10:08:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/One-Data-node-is-down-in-the-cluster/m-p/174283#M37740</guid>
      <dc:creator>uses_venkatesh</dc:creator>
      <dc:date>2016-08-15T10:08:24Z</dc:date>
    </item>
    <item>
      <title>Re: One Data node is down in the cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/One-Data-node-is-down-in-the-cluster/m-p/174284#M37741</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/12484/usesvenkatesh.html" nodeid="12484"&gt;@venkat v&lt;/A&gt; &lt;/P&gt;&lt;P&gt;What type of instances are these? It seems like a simple connection issue. This might just be because of the lower end instances being used.&lt;/P&gt;&lt;P&gt;Is this the data node that's down?&lt;/P&gt;&lt;P&gt;&lt;A href="http://ip-172-31-9-98.ec2.internal:6188/ws/v1/timeline/metrics" target="_blank"&gt;http://ip-172-31-9-98.ec2.internal:6188/ws/v1/timeline/metrics&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 15 Aug 2016 12:19:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/One-Data-node-is-down-in-the-cluster/m-p/174284#M37741</guid>
      <dc:creator>mqureshi</dc:creator>
      <dc:date>2016-08-15T12:19:04Z</dc:date>
    </item>
  </channel>
</rss>

