<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hadoop read IO size in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hadoop-read-IO-size/m-p/31646#M7240</link>
    <description>Thank you for your reply!&lt;BR /&gt;Just to clarify:&lt;BR /&gt;&amp;gt; stream the data via a buffered read&lt;BR /&gt;Is the size of this buffer defined by the io.file.buffer.size parameter?&lt;BR /&gt;&lt;BR /&gt;Thanks!</description>
    <pubDate>Wed, 09 Sep 2015 17:01:49 GMT</pubDate>
    <dc:creator>fil</dc:creator>
    <dc:date>2015-09-09T17:01:49Z</dc:date>
    <item>
      <title>Hadoop read IO size</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hadoop-read-IO-size/m-p/31628#M7238</link>
      <description>&lt;P&gt;Hi dear experts!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm curious how to control the read IO size in my MR jobs.&lt;/P&gt;&lt;P&gt;For example, I have a file in HDFS; under the hood it is stored as block files in the Linux filesystem, e.g. /disk1/hadoop/.../.../blkXXX.&lt;/P&gt;&lt;P&gt;In the ideal case each block file should be equal to the block size (128-256 MB).&lt;/P&gt;&lt;P&gt;My question is: how can I set the IO size for read operations?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:40:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hadoop-read-IO-size/m-p/31628#M7238</guid>
      <dc:creator>fil</dc:creator>
      <dc:date>2022-09-16T09:40:13Z</dc:date>
    </item>
    <item>
      <title>Re: Hadoop read IO size</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hadoop-read-IO-size/m-p/31630#M7239</link>
      <description>Jobs typically read records, not entire blocks. Is your MR job doing anything different in this regard?&lt;BR /&gt;&lt;BR /&gt;Note that HDFS readers do not read whole blocks of data at a time; instead, they stream the data via a buffered read (typically 64-128 KB). The fact that the block size is X MB does not translate into a memory requirement unless you explicitly store the entire block in memory while streaming the read.</description>
      <pubDate>Wed, 09 Sep 2015 04:28:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hadoop-read-IO-size/m-p/31630#M7239</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2015-09-09T04:28:31Z</dc:date>
    </item>
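    <!--
    Editor's note: the buffered-read behaviour described in the reply above can
    be sketched with plain JDK streams. This is an analogy only, not the actual
    HDFS client (the real classes are DFSClient and DFSInputStream); the 64 KB
    buffer and 1 MB file size below are illustrative assumptions.

    ```java
    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class BufferedHdfsStyleRead {
        // Streams a file through a small fixed buffer: memory use is bounded
        // by bufferSize, not by the size of the file (or the HDFS block).
        static long readWithBuffer(Path file, int bufferSize) throws IOException {
            long total = 0;
            byte[] buf = new byte[bufferSize];
            try (InputStream in = new BufferedInputStream(Files.newInputStream(file), bufferSize)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    total += n; // at most bufferSize bytes are held at a time
                }
            }
            return total;
        }

        public static void main(String[] args) throws IOException {
            // Create a 1 MB temp file (standing in for an HDFS block file)
            // and stream it through a 64 KB buffer.
            Path tmp = Files.createTempFile("block", ".dat");
            Files.write(tmp, new byte[1024 * 1024]);
            long total = readWithBuffer(tmp, 64 * 1024);
            System.out.println("bytes read: " + total); // prints "bytes read: 1048576"
            Files.delete(tmp);
        }
    }
    ```

    The point mirrors the reply: the whole 1 MB file is consumed, but only a
    64 KB buffer is ever resident, regardless of block size.
    -->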
    <item>
      <title>Re: Hadoop read IO size</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hadoop-read-IO-size/m-p/31646#M7240</link>
      <description>Thank you for your reply!&lt;BR /&gt;Just to clarify:&lt;BR /&gt;&amp;gt; stream the data via a buffered read&lt;BR /&gt;Is the size of this buffer defined by the io.file.buffer.size parameter?&lt;BR /&gt;&lt;BR /&gt;Thanks!</description>
      <pubDate>Wed, 09 Sep 2015 17:01:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hadoop-read-IO-size/m-p/31646#M7240</guid>
      <dc:creator>fil</dc:creator>
      <dc:date>2015-09-09T17:01:49Z</dc:date>
    </item>
    <item>
      <title>Re: Hadoop read IO size</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hadoop-read-IO-size/m-p/31660#M7241</link>
      <description>The reader buffer size is indeed controlled by that property (io.file.buffer.size), but note that if you are doing short-circuit reads, another property also applies: dfs.client.read.shortcircuit.buffer.size (1 MB, in bytes, by default).&lt;BR /&gt;</description>
      <pubDate>Wed, 09 Sep 2015 23:44:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hadoop-read-IO-size/m-p/31660#M7241</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2015-09-09T23:44:20Z</dc:date>
    </item>
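    <!--
    Editor's note: a minimal configuration sketch based on the two properties
    named in the reply above. The values shown (64 KB and 1 MB) are taken from
    this thread, not verified defaults, and may differ across Hadoop/CDH
    versions.

    ```xml
    In core-site.xml (client-side buffered stream reads):
    <property>
      <name>io.file.buffer.size</name>
      <value>65536</value>
    </property>

    In hdfs-site.xml (short-circuit local reads):
    <property>
      <name>dfs.client.read.shortcircuit.buffer.size</name>
      <value>1048576</value>
    </property>
    ```

    Both values are in bytes.
    -->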
    <item>
      <title>Re: Hadoop read IO size</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hadoop-read-IO-size/m-p/31663#M7242</link>
      <description>Thank you for your reply!&lt;BR /&gt;Could you point me to the source class where I can read about this in more detail?&lt;BR /&gt;&lt;BR /&gt;Thanks!</description>
      <pubDate>Thu, 10 Sep 2015 01:03:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hadoop-read-IO-size/m-p/31663#M7242</guid>
      <dc:creator>fil</dc:creator>
      <dc:date>2015-09-10T01:03:40Z</dc:date>
    </item>
    <item>
      <title>Re: Hadoop read IO size</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hadoop-read-IO-size/m-p/31669#M7243</link>
      <description>Start here, and drill further down into the DFSClient and DFSInputStream, etc. classes: &lt;A href="https://github.com/cloudera/hadoop-common/blob/cdh5.4.5-release/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L294-L303" target="_blank"&gt;https://github.com/cloudera/hadoop-common/blob/cdh5.4.5-release/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L294-L303&lt;/A&gt;</description>
      <pubDate>Thu, 10 Sep 2015 05:36:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hadoop-read-IO-size/m-p/31669#M7243</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2015-09-10T05:36:18Z</dc:date>
    </item>
  </channel>
</rss>

