<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Getting error &quot;Exception in doCheckpoint java.io.IOException: Unable to download to any storage directory&quot; in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169057#M131371</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3120/gsharma.html" nodeid="3120"&gt;@gsharma&lt;/A&gt; - Judging by the error, there appears to be a problem with the secondary NameNode's local storage.&lt;/P&gt;&lt;P&gt;Can you please check the value of &lt;STRONG&gt;dfs.namenode.checkpoint.dir&lt;/STRONG&gt; and see whether that location has any issues, such as a read-only mount, a full disk, or a bad disk?&lt;/P&gt;&lt;P&gt;Also, the under-replicated blocks issue is not related to this one.&lt;/P&gt;&lt;P&gt;How many DataNodes do you have? What is the replication factor? Are all the DataNodes healthy?&lt;/P&gt;</description>
    <pubDate>Thu, 14 Apr 2016 14:31:13 GMT</pubDate>
    <dc:creator>KuldeepK</dc:creator>
    <dc:date>2016-04-14T14:31:13Z</dc:date>
    <item>
      <title>Getting error "Exception in doCheckpoint java.io.IOException: Unable to download to any storage directory"</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169056#M131370</link>
      <description>&lt;P&gt;Hello All,&lt;/P&gt;&lt;P&gt;Below are the errors seen in the secondary NameNode log. There are also about 6000 under-replicated blocks; I am not sure whether they are related to this issue. DataNode health is fine. I would appreciate any pointers.&lt;/P&gt;&lt;P&gt;==&lt;/P&gt;&lt;PRE&gt;2016-04-12 15:45:03,660 INFO  namenode.SecondaryNameNode (SecondaryNameNode.java:run(453)) - Image has not changed. Will not download image.
2016-04-12 15:45:03,661 INFO  namenode.TransferFsImage (TransferFsImage.java:getFileClient(394)) - Opening connection to &lt;A href="http://ey9omprna005.vzbi.com:50070/imagetransfer?getedit=1&amp;amp;startTxId=236059442&amp;amp;endTxId=23608" target="_blank"&gt;http://ey9omprna005.vzbi.com:50070/imagetransfer?getedit=1&amp;amp;startTxId=236059442&amp;amp;endTxId=23608&lt;/A&gt;.
2016-04-12 15:45:03,665 ERROR namenode.SecondaryNameNode (SecondaryNameNode.java:doWork(399)) - Exception in doCheckpoint
java.io.IOException: Unable to download to any storage directory
&lt;/PRE&gt;</description>
      <pubDate>Thu, 14 Apr 2016 14:02:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169056#M131370</guid>
      <dc:creator>gsharma</dc:creator>
      <dc:date>2016-04-14T14:02:09Z</dc:date>
    </item>
    <item>
      <title>Re: Getting error "Exception in doCheckpoint java.io.IOException: Unable to download to any storage directory"</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169057#M131371</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3120/gsharma.html" nodeid="3120"&gt;@gsharma&lt;/A&gt; - Judging by the error, there appears to be a problem with the secondary NameNode's local storage.&lt;/P&gt;&lt;P&gt;Can you please check the value of &lt;STRONG&gt;dfs.namenode.checkpoint.dir&lt;/STRONG&gt; and see whether that location has any issues, such as a read-only mount, a full disk, or a bad disk?&lt;/P&gt;&lt;P&gt;Also, the under-replicated blocks issue is not related to this one.&lt;/P&gt;&lt;P&gt;How many DataNodes do you have? What is the replication factor? Are all the DataNodes healthy?&lt;/P&gt;</description>
      <pubDate>Thu, 14 Apr 2016 14:31:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169057#M131371</guid>
      <dc:creator>KuldeepK</dc:creator>
      <dc:date>2016-04-14T14:31:13Z</dc:date>
    </item>
    <item>
      <title>Re: Getting error "Exception in doCheckpoint java.io.IOException: Unable to download to any storage directory"</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169058#M131372</link>
      <description>&lt;P&gt;Did you try restarting the secondary NameNode? If not, I would try a restart first.&lt;/P&gt;</description>
      <pubDate>Thu, 14 Apr 2016 19:18:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169058#M131372</guid>
      <dc:creator>jyadav</dc:creator>
      <dc:date>2016-04-14T19:18:27Z</dc:date>
    </item>
    <item>
      <title>Re: Getting error "Exception in doCheckpoint java.io.IOException: Unable to download to any storage directory"</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169059#M131373</link>
      <description>&lt;P&gt;No, Jitendra, I have not tried that yet since it is a prod environment. How about first trying to force a manual checkpoint rather than restarting? I need your suggestion on applying this action plan from a production-environment perspective.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2016 12:21:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169059#M131373</guid>
      <dc:creator>gsharma</dc:creator>
      <dc:date>2016-04-15T12:21:23Z</dc:date>
    </item>
    <item>
      <title>Re: Getting error "Exception in doCheckpoint java.io.IOException: Unable to download to any storage directory"</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169060#M131374</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/504/kkulkarni.html" nodeid="504"&gt;@Kuldeep Kulkarni&lt;/A&gt; Here are my responses to your queries.&lt;/P&gt;&lt;P&gt;1. Checked; no mount points are read-only.&lt;/P&gt;&lt;P&gt;2. Checked df -h on both nodes; no space issues.&lt;/P&gt;&lt;P&gt;3. We have 5 DNs, and the RF is 3.&lt;/P&gt;&lt;P&gt;4. All DataNodes seem to be healthy, with only 50% DFS utilization.&lt;/P&gt;&lt;P&gt;==&lt;/P&gt;&lt;P&gt;Here is my investigation so far.&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt; The last successful fsimage on the NN is from April 5th:&lt;/P&gt;&lt;P&gt;----&lt;/P&gt;&lt;P&gt;-rw-r--r-- 1 hdfs hadoop 411408570 Apr 5 20:11 fsimage_0000000000236059441 &lt;/P&gt;&lt;P&gt;-----&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt; Before that, the last checkpoint file goes back to Feb, a gap of about 40+ days.&lt;/P&gt;&lt;P&gt;-----&lt;/P&gt;&lt;P&gt;-rw-r--r-- 1 hdfs hadoop 144021898 Feb 24 2015 fsimage.ckpt_0000000000039014860 &lt;/P&gt;&lt;P&gt;----&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt; The same fsimage reaches the secondary NameNode at:&lt;/P&gt;&lt;P&gt;----&lt;/P&gt;&lt;P&gt;-rw-r--r-- 1 hdfs hadoop 411408570 Apr 5 22:11 fsimage_0000000000236059441 &lt;/P&gt;&lt;P&gt;-----&lt;/P&gt;&lt;P&gt;The secondary NameNode then merges the edits with the recently acquired fsimage and creates a new fsimage to be fetched by the primary NN:&lt;/P&gt;&lt;P&gt;-----&lt;/P&gt;&lt;P&gt;-rw-r--r-- 1 hdfs hadoop 42688512 Apr 6 10:12 fsimage.ckpt_0000000000236367214 &lt;/P&gt;&lt;P&gt;-----&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt; No transactions are visible on either the NameNode or the secondary after that.&lt;/P&gt;&lt;P&gt;===&lt;/P&gt;&lt;P&gt;In hadoop-hdfs-secondarynamenode-xxx-yy.out on the secondary NN, I can see the errors below:&lt;/P&gt;&lt;P&gt;===&lt;/P&gt;&lt;P&gt;java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.receiveFile(TransferFsImage.java:517)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.doGetUrl(TransferFsImage.java:431)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:395)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.downloadEditsToStorage(TransferFsImage.java:167)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:465)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:444)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:443)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:540)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:395)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:361)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:357)
at java.lang.Thread.run(Thread.java:745)
log4j:ERROR Failed to flush writer,&lt;/P&gt;&lt;P&gt;====&lt;/P&gt;&lt;P&gt;The error below appears after the one above:&lt;/P&gt;&lt;P&gt;====&lt;/P&gt;&lt;P&gt;java.io.IOException: Unable to download to any storage directory
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.receiveFile(TransferFsImage.java:505)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.doGetUrl(TransferFsImage.java:431)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:395)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.downloadEditsToStorage(TransferFsImage.java:167)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:465)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:444)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:443)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:540)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:395)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:361)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:357)
at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;=====&lt;/P&gt;&lt;P&gt;Now, apart from our investigation, I wanted to clarify one thing:&lt;/P&gt;&lt;P&gt;1. Does the "No space left on device" error come from the primary NN when it tries to fetch the fsimage from the SNN? Or does it come from the SNN itself, because it is unable to download the old fsimage it gets from the PNN?&lt;/P&gt;&lt;P&gt;There are no timestamps in the .out file, so I cannot actually put the issues / patterns / errors in chronological order.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2016 12:33:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169060#M131374</guid>
      <dc:creator>gsharma</dc:creator>
      <dc:date>2016-04-15T12:33:54Z</dc:date>
    </item>
    <item>
      <title>Re: Getting error "Exception in doCheckpoint java.io.IOException: Unable to download to any storage directory"</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169061#M131375</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1791/gauravsharma.html" nodeid="1791"&gt;@gaurav sharma&lt;/A&gt; - If you look at the logs carefully, you will notice the message below:&lt;/P&gt;&lt;PRE&gt;&lt;EM&gt;java.io.IOException: &lt;STRONG&gt;No space left on device&lt;/STRONG&gt; at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:345) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.receiveFile(TransferFsImage.java:517) at&lt;/EM&gt;&lt;/PRE&gt;&lt;P&gt;1. Please move the existing fsimage on the SNN to some other location, and make sure the disk on the SNN has enough capacity to store the fsimage from the NN (check the size of the fsimage on the NN and see whether the total disk capacity on the SNN is sufficient to store it).&lt;/P&gt;&lt;P&gt;2. Shut down the secondary NN.&lt;/P&gt;&lt;P&gt;3. Run the command below to force the secondary NN to do a checkpoint:&lt;/P&gt;&lt;PRE&gt;hadoop secondarynamenode -checkpoint force&lt;/PRE&gt;&lt;P&gt;Note - Please run the above command as the hdfs user.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2016 13:21:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169061#M131375</guid>
      <dc:creator>KuldeepK</dc:creator>
      <dc:date>2016-04-15T13:21:59Z</dc:date>
    </item>
    <item>
      <title>Re: Getting error "Exception in doCheckpoint java.io.IOException: Unable to download to any storage directory"</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169062#M131376</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/3120/gsharma.html" nodeid="3120"&gt;@gsharma&lt;/A&gt;&lt;P&gt;Yes, you can try forcing a checkpoint first, but I doubt it will work. Can you also check whether you have sufficient local disk space on the SNN node, as well as on HDFS? If disk space is not a problem, we can restart the SNN, since that will not cause any issues for the PNN or for running jobs.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2016 18:39:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Getting-error-quot-Exception-in-doCheckpoint-java-io/m-p/169062#M131376</guid>
      <dc:creator>jyadav</dc:creator>
      <dc:date>2016-04-15T18:39:08Z</dc:date>
    </item>
  </channel>
</rss>

