<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: BDR jobs fail with missing blocks in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79194#M41533</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/19932"&gt;@DanielWhite&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you clarify what you mean by "source server"?&lt;/P&gt;&lt;P&gt;Really, the answer is no.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Your source configuration dictates which NameNode(s) to communicate with.&lt;/P&gt;&lt;P&gt;Your source's NameNodes tell clients where to get blocks. Those blocks can be on any DataNode in the source cluster.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;On the target side, where the MapReduce job runs, the ResourceManager decides which nodes the mappers run on.&lt;/P&gt;</description>
    <pubDate>Wed, 29 Aug 2018 17:35:49 GMT</pubDate>
    <dc:creator>bgooley</dc:creator>
    <dc:date>2018-08-29T17:35:49Z</dc:date>
    <item>
      <title>BDR jobs fail with missing blocks</title>
      <link>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79080#M41524</link>
      <description>&lt;P&gt;I have a DR site and run replication from prod to the DR site. My BDR jobs are failing with missing blocks error. The files and blocks that are reported missing are in the source prod system so I'm not sure why the jobs are failing and not copying them over.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Aug 2018 20:24:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79080#M41524</guid>
      <dc:creator>DanielWhite</dc:creator>
      <dc:date>2018-08-27T20:24:55Z</dc:date>
    </item>
    <item>
      <title>Re: BDR jobs fail with missing blocks</title>
      <link>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79081#M41525</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/19932"&gt;@DanielWhite&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Files are split up into blocks and stored on DataNodes.&amp;nbsp; By default, each block is stored on 3 DataNodes (block replication factor of 3).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In BDR, the mappers request blocks (as instructed by the NameNode) from the DataNodes that have them.&amp;nbsp; If no DataNode holds a given block of a file, the file itself cannot be copied.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Recommendation:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Check which files have missing blocks in the source cluster and address the issue.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;BDR/distcp currently copies files, not individual blocks, so if one block of a file is missing from the source, the remaining blocks are not copied.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
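      <!--
The rule described in the reply above (a file can only be copied if every one of its blocks has at least one replica the client can read) can be illustrated with a small model. This is only a sketch of the reasoning, not BDR or distcp code: the function name, the dictionaries, and the block and DataNode names are all hypothetical.

```python
def split_copyable(file_blocks, live_replicas):
    """Partition files into copyable ones and ones with unavailable blocks.

    file_blocks:   {path: [block_id, ...]}        (hypothetical input shape)
    live_replicas: {block_id: [datanode, ...]}    (hypothetical input shape)
    Returns (copyable_paths, {path: [unavailable_block_id, ...]}).
    """
    copyable = []
    failed = {}
    for path, blocks in file_blocks.items():
        # A block is unavailable when no DataNode is known to hold a live replica.
        unavailable = [b for b in blocks if not live_replicas.get(b)]
        if unavailable:
            # One unavailable block is enough to fail the whole file.
            failed[path] = unavailable
        else:
            copyable.append(path)
    return copyable, failed
```

With a default replication factor of 3, each list in live_replicas would normally hold three DataNodes; a file only fails in this model when one of its lists is empty.
      -->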
      <pubDate>Mon, 27 Aug 2018 20:28:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79081#M41525</guid>
      <dc:creator>bgooley</dc:creator>
      <dc:date>2018-08-27T20:28:24Z</dc:date>
    </item>
    <item>
      <title>Re: BDR jobs fail with missing blocks</title>
      <link>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79082#M41526</link>
      <description>&lt;P&gt;Thanks for your reply.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The files and blocks reported missing by the BDR job running on the DR site do exist on the source system.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Aug 2018 20:35:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79082#M41526</guid>
      <dc:creator>DanielWhite</dc:creator>
      <dc:date>2018-08-27T20:35:11Z</dc:date>
    </item>
    <item>
      <title>Re: BDR jobs fail with missing blocks</title>
      <link>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79084#M41527</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/19932"&gt;@DanielWhite&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I don't quite follow.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you show us the errors or messages you are seeing and some log context around them?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 27 Aug 2018 23:08:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79084#M41527</guid>
      <dc:creator>bgooley</dc:creator>
      <dc:date>2018-08-27T23:08:32Z</dc:date>
    </item>
    <item>
      <title>Re: BDR jobs fail with missing blocks</title>
      <link>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79085#M41528</link>
      <description>&lt;P&gt;Here's the error from the BDR job running in the DR system. Below that, I've run an fsck on the file on the source system to show that it does exist on the source and has the same block number as listed in the error.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've removed IP addresses and replaced the actual file name with "&lt;EM&gt;filename&lt;/EM&gt;".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ERROR&amp;nbsp; &lt;STRONG&gt;/path/&lt;/STRONG&gt;&lt;EM&gt;&lt;STRONG&gt;filename&lt;/STRONG&gt;&amp;nbsp;org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1508298398-ipaddress-1406065203774:&lt;STRONG&gt;blk_2079737512_1100628731148&lt;/STRONG&gt; file=&lt;STRONG&gt;filename&lt;/STRONG&gt;&lt;BR /&gt;&amp;nbsp;at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1040)&lt;BR /&gt;&amp;nbsp;at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1023)&lt;BR /&gt;&amp;nbsp;at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1002)&lt;BR /&gt;&amp;nbsp;at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:642)&lt;BR /&gt;&amp;nbsp;at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:895)&lt;BR /&gt;&amp;nbsp;at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:954)&lt;BR /&gt;&amp;nbsp;at java.io.DataInputStream.read(DataInputStream.java:149)&lt;BR /&gt;&amp;nbsp;at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)&lt;BR /&gt;&amp;nbsp;at java.io.BufferedInputStream.read(BufferedInputStream.java:345)&lt;BR /&gt;&amp;nbsp;at java.io.FilterInputStream.read(FilterInputStream.java:107)&lt;BR /&gt;&amp;nbsp;at com.cloudera.enterprise.distcp.util.ThrottledInputStream.read(ThrottledInputStream.java:77)&lt;BR /&gt;&amp;nbsp;at com.cloudera.enterprise.distcp.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:371)&lt;BR /&gt;&amp;nbsp;at 
com.cloudera.enterprise.distcp.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:345)&lt;BR /&gt;&amp;nbsp;at com.cloudera.enterprise.distcp.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:161)&lt;BR /&gt;&amp;nbsp;at com.cloudera.enterprise.distcp.util.RetriableCommand.execute(RetriableCommand.java:87)&lt;BR /&gt;&amp;nbsp;at com.cloudera.enterprise.distcp.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:617)&lt;BR /&gt;&amp;nbsp;at com.cloudera.enterprise.distcp.mapred.CopyMapper.map(CopyMapper.java:454)&lt;BR /&gt;&amp;nbsp;at com.cloudera.enterprise.distcp.mapred.CopyMapper.map(CopyMapper.java:69)&lt;BR /&gt;&amp;nbsp;at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)&lt;BR /&gt;&amp;nbsp;at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)&lt;BR /&gt;&amp;nbsp;at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)&lt;BR /&gt;&amp;nbsp;at org.apache.hadoop&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here's the file on the source system -&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;hdfs fsck &lt;STRONG&gt;&lt;EM&gt;filename&lt;/EM&gt; &lt;/STRONG&gt;-files -blocks -locations&lt;BR /&gt;Connecting to namenode via http://&lt;STRONG&gt;&lt;EM&gt;prodservername&lt;/EM&gt;&lt;/STRONG&gt;:50070&lt;BR /&gt;FSCK started by hdfs (auth:KERBEROS_SSL) from /&lt;STRONG&gt;&lt;EM&gt;serveripaddress&lt;/EM&gt; &lt;/STRONG&gt;for path &lt;STRONG&gt;&lt;EM&gt;filename&lt;/EM&gt; &lt;/STRONG&gt;at Mon Aug 27 19:39:53 EDT 2018&lt;BR /&gt;&lt;STRONG&gt;&lt;EM&gt;filename&lt;/EM&gt; &lt;/STRONG&gt;995352 bytes, 1 block(s):&amp;nbsp; OK&lt;BR /&gt;0. 
BP-1508298398-ipaddress-1406065203774:&lt;STRONG&gt;blk_2079737512_1100628731148&lt;/STRONG&gt; len=995352 Live_repl=3 [DatanodeInfoWithStorage[&lt;EM&gt;ipaddress&lt;/EM&gt;:1004,DS-00246250-eef8-4c03-8ef7-c898594f960b,DISK], DatanodeInfoWithStorage[&lt;EM&gt;ipaddress&lt;/EM&gt;:1004,DS-297b0420-a2a1-4418-8691-3ef9a374cc51,DISK], DatanodeInfoWithStorage[ipaddress:1004,DS-0ae9f985-a12a-4871-991b-d2e8017c4c4b,DISK]]&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
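      <!--
The fsck block line shown in the post above lists, for each block, the DataNodes that hold a replica. As a rough sketch, that line could be parsed into a list of host and port pairs for later connectivity checks. The regular expressions are written against the output format quoted in this thread and may need adjusting for other Hadoop versions; the function name and the 192.0.2.x addresses in the usage note are placeholders, not values from the original post.

```python
import re

# One replica entry, e.g. DatanodeInfoWithStorage[host:1004,DS-...,DISK]
REPLICA_RE = re.compile(
    r"DatanodeInfoWithStorage\[([^:\]]+):(\d+),([^,\]]+),([^\]]+)\]"
)
# Block pool ID followed by the block ID, e.g. BP-...:blk_123_456
BLOCK_RE = re.compile(r"(BP-[^:\s]+):(blk_\d+_\d+)")


def parse_block_line(line: str) -> dict:
    """Extract the block ID and replica locations from one fsck block line."""
    block = BLOCK_RE.search(line)
    if block is None:
        raise ValueError("no block identifier found in line")
    replicas = [
        {"host": host, "port": int(port), "storage_id": sid, "storage_type": stype}
        for host, port, sid, stype in REPLICA_RE.findall(line)
    ]
    return {"pool": block.group(1), "block": block.group(2), "replicas": replicas}
```

For example, calling parse_block_line on a line of the form "0. BP-1-192.0.2.1-2:blk_10_20 len=5 Live_repl=1 [DatanodeInfoWithStorage[192.0.2.10:1004,DS-abc,DISK]]" yields one replica entry with host 192.0.2.10 and port 1004.
      -->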
      <pubDate>Mon, 27 Aug 2018 23:53:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79085#M41528</guid>
      <dc:creator>DanielWhite</dc:creator>
      <dc:date>2018-08-27T23:53:30Z</dc:date>
    </item>
    <item>
      <title>Re: BDR jobs fail with missing blocks</title>
      <link>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79088#M41529</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/19932"&gt;@DanielWhite&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I got an email showing your update, but for some reason I don't see it here.&lt;/P&gt;&lt;P&gt;What I did notice was that the stack trace says:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;ERROR&amp;nbsp; /path/filename org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1508298398-10.9.129.86-1406065203774:blk_2079737512_1100628731148 file=filename&lt;BR /&gt;&amp;nbsp;at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;A BlockMissingException is raised whenever a client is unable to retrieve a block, so the name can be a bit misleading: the block may exist but be unreachable from the client.&lt;/P&gt;&lt;P&gt;Rather, I'd verify that all the DataNodes in the source cluster are accessible during replication and that all the nodes in your destination cluster can connect to the DataNodes in the source cluster.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Note that I think there may be more information in the BDR job logs in YARN.&amp;nbsp; It could be that a firewall or something else is preventing mappers from retrieving blocks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does the same problem happen every time, sometimes, etc.?&lt;/P&gt;</description>
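      <!--
The connectivity check suggested in the reply above could be scripted along these lines and run from each node in the destination cluster. This is a minimal sketch using only the Python standard library: the function names are made up for illustration, and the DataNode host list has to be supplied by the reader. A successful TCP connection only shows that the port is reachable; it does not prove that an authenticated block read would succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def unreachable(datanodes, timeout: float = 3.0):
    """Return the (host, port) pairs that could not be reached."""
    return [(h, p) for h, p in datanodes if not can_connect(h, p, timeout)]
```

For instance, unreachable([("dn1.example.com", 1004), ("dn2.example.com", 1004)]) would return the subset of those placeholder hosts that refuse or time out; port 1004 is the DataNode port that appears in the fsck output earlier in this thread.
      -->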
      <pubDate>Tue, 28 Aug 2018 00:31:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79088#M41529</guid>
      <dc:creator>bgooley</dc:creator>
      <dc:date>2018-08-28T00:31:46Z</dc:date>
    </item>
    <item>
      <title>Re: BDR jobs fail with missing blocks</title>
      <link>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79091#M41530</link>
      <description>&lt;P&gt;You're the hero. I pulled diag data on one of the jobs and found a connection refused error when it tried to access one of the files reported with missing blocks. I tried connecting to the remote server and couldn't. I looked at others: some I can connect to, others I can't. It's a large inventory of servers to work through.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I really appreciate your help with this. Thank you.&lt;/P&gt;</description>
      <pubDate>Tue, 28 Aug 2018 02:31:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79091#M41530</guid>
      <dc:creator>DanielWhite</dc:creator>
      <dc:date>2018-08-28T02:31:02Z</dc:date>
    </item>
    <item>
      <title>Re: BDR jobs fail with missing blocks</title>
      <link>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79151#M41531</link>
      <description>That's great news! BDR has a lot of moving parts, so it can be super tricky to debug. I hope it is smooth sailing once you get the connectivity worked out.</description>
      <pubDate>Wed, 29 Aug 2018 04:49:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79151#M41531</guid>
      <dc:creator>bgooley</dc:creator>
      <dc:date>2018-08-29T04:49:45Z</dc:date>
    </item>
    <item>
      <title>Re: BDR jobs fail with missing blocks</title>
      <link>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79171#M41532</link>
      <description>&lt;P&gt;I have a further question.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there a way to have the BDR job connect to a specific source server?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 29 Aug 2018 13:44:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79171#M41532</guid>
      <dc:creator>DanielWhite</dc:creator>
      <dc:date>2018-08-29T13:44:17Z</dc:date>
    </item>
    <item>
      <title>Re: BDR jobs fail with missing blocks</title>
      <link>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79194#M41533</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/19932"&gt;@DanielWhite&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you clarify what you mean by "source server"?&lt;/P&gt;&lt;P&gt;Really, the answer is no.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Your source configuration dictates which NameNode(s) to communicate with.&lt;/P&gt;&lt;P&gt;Your source's NameNodes tell clients where to get blocks. Those blocks can be on any DataNode in the source cluster.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;On the target side, where the MapReduce job runs, the ResourceManager decides which nodes the mappers run on.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Aug 2018 17:35:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/BDR-jobs-fail-with-missing-blocks/m-p/79194#M41533</guid>
      <dc:creator>bgooley</dc:creator>
      <dc:date>2018-08-29T17:35:49Z</dc:date>
    </item>
  </channel>
</rss>

