<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Taking long time to copy files from hdfs in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137615#M19120</link>
    <description>&lt;A rel="user" href="https://community.cloudera.com/users/2302/arunpoy.html" nodeid="2302"&gt;@ARUNKUMAR RAMASAMY&lt;/A&gt;&lt;P&gt;The copy command is slower than, for example, move or distcp. Zipping the 300 files into one larger file should improve things, because Hadoop handles a few large files better than many small files and directories. You can use the merge command, perhaps compress the result, and take a look at the Hadoop Archive format, then try copying again.&lt;/P&gt;</description>
    <pubDate>Thu, 11 Feb 2016 19:14:51 GMT</pubDate>
    <dc:creator>aervits</dc:creator>
    <dc:date>2016-02-11T19:14:51Z</dc:date>
    <item>
      <title>Taking long time to copy files from hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137613#M19118</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am running a client on a different network from the Hadoop cluster. When I try to copy 60 MB of data (300 small files) from HDFS to the client machine, it takes almost 20 minutes, and I see a warning like "Input stream closed". Is this caused by the network between the client and the cluster, or is there something else I need to look at?&lt;/P&gt;</description>
      <pubDate>Thu, 11 Feb 2016 19:08:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137613#M19118</guid>
      <dc:creator>arunpoy</dc:creator>
      <dc:date>2016-02-11T19:08:22Z</dc:date>
    </item>
    <item>
      <title>Re: Taking long time to copy files from hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137614#M19119</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2302/arunpoy.html" nodeid="2302"&gt;@ARUNKUMAR RAMASAMY&lt;/A&gt; &lt;/P&gt;&lt;P&gt;How are you copying the files? The time taken depends on factors such as network speed, system load, and the mechanism used to download the files. &lt;/P&gt;&lt;P&gt;Since you are communicating across different VLANs, that adds some overhead, as do other settings configured on the network, such as timeouts.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Feb 2016 19:11:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137614#M19119</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-02-11T19:11:39Z</dc:date>
    </item>
    <item>
      <title>Re: Taking long time to copy files from hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137615#M19120</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/2302/arunpoy.html" nodeid="2302"&gt;@ARUNKUMAR RAMASAMY&lt;/A&gt;&lt;P&gt;The copy command is slower than, for example, move or distcp. Zipping the 300 files into one larger file should improve things, because Hadoop handles a few large files better than many small files and directories. You can use the merge command, perhaps compress the result, and take a look at the Hadoop Archive format, then try copying again.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Feb 2016 19:14:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137615#M19120</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-11T19:14:51Z</dc:date>
    </item>
    <item>
      <title>Re: Taking long time to copy files from hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137616#M19121</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/140/nsabharwal.html" nodeid="140"&gt;@Neeraj Sabharwal&lt;/A&gt;, we are using just a plain hdfs get command.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Feb 2016 19:16:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137616#M19121</guid>
      <dc:creator>arunpoy</dc:creator>
      <dc:date>2016-02-11T19:16:14Z</dc:date>
    </item>
    <item>
      <title>Re: Taking long time to copy files from hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137617#M19122</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2302/arunpoy.html" nodeid="2302"&gt;@ARUNKUMAR RAMASAMY&lt;/A&gt;  You can look into the WebHDFS protocol. &lt;A href="http://hortonworks.com/blog/webhdfs-%E2%80%93-http-rest-access-to-hdfs/"&gt;http://hortonworks.com/blog/webhdfs-%E2%80%93-http-rest-access-to-hdfs/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Feb 2016 19:17:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137617#M19122</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-02-11T19:17:21Z</dc:date>
    </item>
    <item>
      <title>Re: Taking long time to copy files from hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137618#M19123</link>
      <description>&lt;P&gt;How do you copy the small files? Are you running one hadoop fs -put for every small file (for example, in a shell script)? In that case I would expect bad performance, because the Hadoop client is a Java application and needs some setup time for each command.&lt;/P&gt;&lt;P&gt;If you are running a single put command, then 20 minutes is very bad performance. I normally get 200-300 GB/hour, so 60 MB should be done in seconds. I would check the network speed by doing a simple scp from your client to a node of the cluster. &lt;/P&gt;&lt;P&gt;Regarding small files: &lt;/P&gt;&lt;P&gt;- A put of small files is definitely slower than a put of one big file, but it shouldn't take 20 minutes. I once benchmarked it, and I think writing very small files was 2-3 times slower.&lt;/P&gt;&lt;P&gt;- Why do you copy such tiny files into HDFS? This is bad for Hadoop in general. Try to find a way to merge them. (That applies if they are data files; if they are Oozie definitions or similar, it is obviously different.)&lt;/P&gt;&lt;P&gt;The "Input stream closed" warning is not dangerous by itself. Normal put commands can show it in many scenarios (a minor bug that was introduced into HDFS and has since been fixed). &lt;/P&gt;</description>
      <pubDate>Thu, 11 Feb 2016 19:17:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137618#M19123</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-02-11T19:17:33Z</dc:date>
    </item>
    <item>
      <title>Re: Taking long time to copy files from hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137619#M19124</link>
      <description>&lt;P&gt;We don't copy small files into HDFS. An MR job runs and creates small files based on the operation. These files are then copied (using hdfs get) to the client machine and uploaded into a MySQL DB. This is a legacy process and I am new to it, so I am trying to find out the reasons.&lt;/P&gt;&lt;P&gt;Also, &lt;A rel="user" href="https://community.cloudera.com/users/168/bleonhardi.html" nodeid="168"&gt;@Benjamin Leonhardi&lt;/A&gt;, do you know the bug number for the HDFS issue?&lt;/P&gt;</description>
      <pubDate>Thu, 11 Feb 2016 19:25:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137619#M19124</guid>
      <dc:creator>arunpoy</dc:creator>
      <dc:date>2016-02-11T19:25:51Z</dc:date>
    </item>
    <item>
      <title>Re: Taking long time to copy files from hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137620#M19125</link>
      <description>&lt;P&gt;If it's an MR program, you can write out fewer files: consider using a smaller number of reducers, and use compression. The specifics can be a separate question on this website. &lt;A rel="user" href="https://community.cloudera.com/users/2302/arunpoy.html" nodeid="2302"&gt;@ARUNKUMAR RAMASAMY&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Feb 2016 19:38:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137620#M19125</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-11T19:38:15Z</dc:date>
    </item>
    <item>
      <title>Re: Taking long time to copy files from hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137621#M19126</link>
      <description>&lt;P&gt;Regarding the bug (with thanks to &lt;A rel="user" href="https://community.cloudera.com/users/140/nsabharwal.html" nodeid="140"&gt;@Neeraj Sabharwal&lt;/A&gt;):&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/14383/dfsinputstream-has-been-closed-already.html"&gt;https://community.hortonworks.com/questions/14383/dfsinputstream-has-been-closed-already.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;So the get is simply a single get on an HDFS folder? Then a slow network connection would be my only guess. &lt;/P&gt;</description>
      <pubDate>Thu, 11 Feb 2016 19:44:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Taking-long-time-to-copy-files-from-hdfs/m-p/137621#M19126</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-02-11T19:44:41Z</dc:date>
    </item>
  </channel>
</rss>

