<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Datanode shut down when running Hive in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Datanode-shut-down-when-running-Hive/m-p/24205#M4790</link>
    <description>&lt;P&gt;Thanks for the reply.&lt;/P&gt;&lt;P&gt;I have 3 datanodes; the one that shut down is on the master host. Here is the information:&lt;/P&gt;&lt;P&gt;00master -&amp;nbsp; blocks: 342823&amp;nbsp; - block pool used: 53.95 GB (6.16%)&lt;/P&gt;&lt;P&gt;01slave&amp;nbsp;&amp;nbsp;&amp;nbsp; -&amp;nbsp; blocks: 346297&amp;nbsp; - block pool used: 54.38 GB (12.46%)&lt;/P&gt;&lt;P&gt;02slave&amp;nbsp;&amp;nbsp;&amp;nbsp; -&amp;nbsp; blocks: 319262&amp;nbsp; - block pool used: 48.39 GB (33.23%)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And these are my heap settings:&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="http://localhost:7180/cmf/services/17/config"&gt;DataNode Default Group / Resource Management&lt;/A&gt;: 186 MB&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="http://localhost:7180/cmf/services/17/config"&gt;DataNode Group 1 / Resource Management&lt;/A&gt;: 348 MB&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Tu Nguyen&lt;/P&gt;</description>
    <pubDate>Thu, 29 Jan 2015 13:45:33 GMT</pubDate>
    <dc:creator>MabuXayda</dc:creator>
    <dc:date>2015-01-29T13:45:33Z</dc:date>
    <item>
      <title>Datanode shut down when running Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Datanode-shut-down-when-running-Hive/m-p/24133#M4788</link>
      <description>&lt;P&gt;Hi, I'm using CDH 5.3.&lt;/P&gt;&lt;P&gt;I have a cluster with 3 hosts: the master host runs a namenode &amp;amp; datanode, and the other 2 hosts each run just a datanode.&lt;/P&gt;&lt;P&gt;Everything ran fine until recently: when I run a Hive job, the datanode on the master shuts down and I get missing-block &amp;amp; under-replicated-block errors.&lt;/P&gt;&lt;P&gt;Here is the error on the master's datanode:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;3:35:09.545 PM ERROR org.apache.hadoop.hdfs.server.datanode.DirectoryScanner&lt;BR /&gt;Error compiling report&lt;BR /&gt;java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space&lt;BR /&gt;at java.util.concurrent.FutureTask.report(FutureTask.java:122)&lt;BR /&gt;at java.util.concurrent.FutureTask.get(FutureTask.java:188)&lt;BR /&gt;at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:545)&lt;BR /&gt;at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:422)&lt;BR /&gt;at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:403)&lt;BR /&gt;at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:359)&lt;BR /&gt;at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)&lt;BR /&gt;at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)&lt;BR /&gt;at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)&lt;BR /&gt;at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: java.lang.OutOfMemoryError: Java heap space&lt;BR 
/&gt;3:35:09.553 PM INFO org.apache.hadoop.hdfs.server.datanode.DataNode&lt;BR /&gt;opWriteBlock BP-993220972-192.168.0.140-1413974566312:blk_1074414393_678864 received exception java.io.IOException: Premature EOF from inputStream&lt;BR /&gt;3:35:09.553 PM ERROR org.apache.hadoop.hdfs.server.datanode.DirectoryScanner&lt;BR /&gt;Exception during DirectoryScanner execution - will continue next cycle&lt;BR /&gt;java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space&lt;BR /&gt;at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:549)&lt;BR /&gt;at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:422)&lt;BR /&gt;at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:403)&lt;BR /&gt;at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:359)&lt;BR /&gt;at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)&lt;BR /&gt;at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)&lt;BR /&gt;at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)&lt;BR /&gt;at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space&lt;BR /&gt;at java.util.concurrent.FutureTask.report(FutureTask.java:122)&lt;BR /&gt;at java.util.concurrent.FutureTask.get(FutureTask.java:188)&lt;BR /&gt;at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:545)&lt;BR /&gt;... 
10 more&lt;BR /&gt;Caused by: java.lang.OutOfMemoryError: Java heap space&lt;BR /&gt;3:35:09.553 PM ERROR org.apache.hadoop.hdfs.server.datanode.DataNode&lt;BR /&gt;00master.mabu.com:50010:DataXceiver error processing WRITE_BLOCK operation src: /192.168.6.10:48911 dst: /192.168.6.10:50010&lt;BR /&gt;java.io.IOException: Premature EOF from inputStream&lt;BR /&gt;at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)&lt;BR /&gt;at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)&lt;BR /&gt;at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)&lt;BR /&gt;at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)&lt;BR /&gt;at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468)&lt;BR /&gt;at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772)&lt;BR /&gt;at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:724)&lt;BR /&gt;at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)&lt;BR /&gt;at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)&lt;BR /&gt;at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can someone help me fix this? Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:20:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Datanode-shut-down-when-running-Hive/m-p/24133#M4788</guid>
      <dc:creator>MabuXayda</dc:creator>
      <dc:date>2022-09-16T09:20:25Z</dc:date>
    </item>
    <item>
      <title>Re: Datanode shut down when running Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Datanode-shut-down-when-running-Hive/m-p/24134#M4789</link>
      <description>&amp;gt; java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space&lt;BR /&gt;&lt;BR /&gt;This means the datanode ran out of heap. How many datanodes do you&lt;BR /&gt;have and how many blocks does this one hold? Are all your datanodes&lt;BR /&gt;evenly filled up? What is the heap setting for your datanodes?</description>
      <pubDate>Wed, 28 Jan 2015 11:19:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Datanode-shut-down-when-running-Hive/m-p/24134#M4789</guid>
      <dc:creator>GautamG</dc:creator>
      <dc:date>2015-01-28T11:19:36Z</dc:date>
    </item>
    <item>
      <title>Re: Datanode shut down when running Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Datanode-shut-down-when-running-Hive/m-p/24205#M4790</link>
      <description>&lt;P&gt;Thanks for the reply.&lt;/P&gt;&lt;P&gt;I have 3 datanodes; the one that shut down is on the master host. Here is the information:&lt;/P&gt;&lt;P&gt;00master -&amp;nbsp; blocks: 342823&amp;nbsp; - block pool used: 53.95 GB (6.16%)&lt;/P&gt;&lt;P&gt;01slave&amp;nbsp;&amp;nbsp;&amp;nbsp; -&amp;nbsp; blocks: 346297&amp;nbsp; - block pool used: 54.38 GB (12.46%)&lt;/P&gt;&lt;P&gt;02slave&amp;nbsp;&amp;nbsp;&amp;nbsp; -&amp;nbsp; blocks: 319262&amp;nbsp; - block pool used: 48.39 GB (33.23%)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And these are my heap settings:&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="http://localhost:7180/cmf/services/17/config"&gt;DataNode Default Group / Resource Management&lt;/A&gt;: 186 MB&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="http://localhost:7180/cmf/services/17/config"&gt;DataNode Group 1 / Resource Management&lt;/A&gt;: 348 MB&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Tu Nguyen&lt;/P&gt;</description>
      <pubDate>Thu, 29 Jan 2015 13:45:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Datanode-shut-down-when-running-Hive/m-p/24205#M4790</guid>
      <dc:creator>MabuXayda</dc:creator>
      <dc:date>2015-01-29T13:45:33Z</dc:date>
    </item>
    <item>
      <title>Re: Datanode shut down when running Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Datanode-shut-down-when-running-Hive/m-p/24217#M4791</link>
      <description>Try increasing your datanode heap size. You may need to decrease heaps of other roles to make space, or move roles around so there isn't so much contention for memory on a single host.</description>
      <pubDate>Thu, 29 Jan 2015 19:05:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Datanode-shut-down-when-running-Hive/m-p/24217#M4791</guid>
      <dc:creator>Darren</dc:creator>
      <dc:date>2015-01-29T19:05:55Z</dc:date>
    </item>
    <item>
      <title>Re: Datanode shut down when running Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Datanode-shut-down-when-running-Hive/m-p/24286#M4792</link>
      <description>&lt;P&gt;Thanks for the reply.&lt;/P&gt;&lt;P&gt;I've increased the datanode heap size to 1 GB, and my datanode has been working well so far, but there is one more thing:&lt;/P&gt;&lt;P&gt;I uploaded data (just using the -put command) to my cluster (2736 folders with 200 files each, about 15 kB per file), and each node went from 350k to over 700k blocks; then the "too many blocks" warning appeared.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I really don't understand why there are so many blocks, since the total data size is only about 5 GB.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Tu Nguyen&lt;/P&gt;</description>
      <pubDate>Tue, 03 Feb 2015 03:00:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Datanode-shut-down-when-running-Hive/m-p/24286#M4792</guid>
      <dc:creator>MabuXayda</dc:creator>
      <dc:date>2015-02-03T03:00:31Z</dc:date>
    </item>
    <item>
      <title>Re: Datanode shut down when running Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Datanode-shut-down-when-running-Hive/m-p/24299#M4793</link>
      <description>&lt;P&gt;Each file uses a minimum of one block entry (though that block will only be the size of the actual data).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So if you are adding 2736 folders, each with 200 files, that's&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;2736 * 200 = 547,200&lt;/PRE&gt;&lt;P&gt;blocks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Do the folders represent some particular partitioning strategy? Can the files within a particular folder be combined into a single larger file?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Depending on your source data format, you may be better off looking at something like &lt;A target="_blank" href="http://kitesdk.org/docs/0.17.1/Kite-SDK-Guide.html"&gt;Kite&lt;/A&gt;&amp;nbsp;to handle the dataset management for you.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Feb 2015 15:58:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Datanode-shut-down-when-running-Hive/m-p/24299#M4793</guid>
      <dc:creator>busbey</dc:creator>
      <dc:date>2015-02-03T15:58:05Z</dc:date>
    </item>
  </channel>
</rss>

