<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to find no of lines in all the files in a hadoop directory? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/294360#M37536</link>
    <description>&lt;P&gt;hdfs dfs -ls -R &amp;lt;directory&amp;gt; | grep 'part-r' | awk '{print $8}' | xargs hdfs dfs -cat | wc -l&lt;/P&gt;</description>
    <pubDate>Mon, 20 Apr 2020 16:47:45 GMT</pubDate>
    <dc:creator>Ram303</dc:creator>
    <dc:date>2020-04-20T16:47:45Z</dc:date>
    <item>
      <title>How to find no of lines in all the files in a hadoop directory?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172550#M37528</link>
      <description />
      <pubDate>Thu, 11 Aug 2016 14:47:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172550#M37528</guid>
      <dc:creator>balavignesh_nag</dc:creator>
      <dc:date>2016-08-11T14:47:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to find no of lines in all the files in a hadoop directory?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172551#M37529</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/12437/balavigneshnagamuthuvenkatesan.html" nodeid="12437"&gt;@Bala Vignesh N V&lt;/A&gt;&lt;P&gt;You can use the command below to check the number of lines in an HDFS file:&lt;/P&gt;&lt;P&gt;[hdfs@ssnode1 root]$ hdfs dfs -cat /tmp/test.txt | wc -l&lt;/P&gt;&lt;P&gt;23&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 14:53:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172551#M37529</guid>
      <dc:creator>ssubhas</dc:creator>
      <dc:date>2016-08-11T14:53:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to find no of lines in all the files in a hadoop directory?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172552#M37530</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/users/5019/ssubhas.html"&gt;Sindhu&lt;/A&gt;, I need the line count for each file in a directory, not just for a single file.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 14:56:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172552#M37530</guid>
      <dc:creator>balavignesh_nag</dc:creator>
      <dc:date>2016-08-11T14:56:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to find no of lines in all the files in a hadoop directory?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172553#M37531</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/12437/balavigneshnagamuthuvenkatesan.html" nodeid="12437"&gt;@Bala Vignesh N V&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You can try the command below:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;for i in $(hdfs dfs -ls -R &amp;lt;DIRECTORY_PATH&amp;gt; | awk '/^-/ {print $8}'); do echo "$i"; hdfs dfs -cat "$i" | wc -l; done&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;It recursively lists the files in &amp;lt;DIRECTORY_PATH&amp;gt; and prints the number of lines in each file. The /^-/ filter keeps only regular files, since the recursive listing also includes directories, which hdfs dfs -cat cannot read.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 18:41:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172553#M37531</guid>
      <dc:creator>ssharma</dc:creator>
      <dc:date>2016-08-11T18:41:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to find no of lines in all the files in a hadoop directory?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172554#M37532</link>
      <description>&lt;P&gt;Thanks &lt;A href="https://community.hortonworks.com/users/11608/ssharma.html"&gt;ssharma&lt;/A&gt;. It helps. Is there a single command available to check the number of lines in each file in a directory, or even just in a single file?&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 19:36:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172554#M37532</guid>
      <dc:creator>balavignesh_nag</dc:creator>
      <dc:date>2016-08-11T19:36:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to find no of lines in all the files in a hadoop directory?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172555#M37533</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/12437/balavigneshnagamuthuvenkatesan.html" nodeid="12437"&gt;@Bala Vignesh N V&lt;/A&gt; &lt;/P&gt;&lt;P&gt;I don't think there is a single command that does this, either in HDFS or in regular Linux. It's better to combine multiple commands with pipes, or to write a simple script that produces the desired output.&lt;/P&gt;&lt;P&gt;Please accept the answer if it was helpful &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 22:58:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172555#M37533</guid>
      <dc:creator>ssharma</dc:creator>
      <dc:date>2016-08-11T22:58:30Z</dc:date>
    </item>
    <item>
      <title>Re: How to find no of lines in all the files in a hadoop directory?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172556#M37534</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/12437/balavigneshnagamuthuvenkatesan.html" nodeid="12437"&gt;@Bala Vignesh N V&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The above approach is good and works well when you have a small number of files, but what if you have thousands or millions of files in the directories? In that case it's better to use the Hadoop MapReduce framework, which can do the same job on large inputs in less time. Below is an example that counts lines using MapReduce.&lt;/P&gt;&lt;P&gt;&lt;A href="https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-count-number-of-lines-in-a-file-using-map-reduce-framework" target="_blank"&gt;https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-count-number-of-lines-in-a-file-using-map-reduce-framework&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 12 Aug 2016 00:49:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172556#M37534</guid>
      <dc:creator>jyadav</dc:creator>
      <dc:date>2016-08-12T00:49:00Z</dc:date>
    </item>
    <item>
      <title>Re: How to find no of lines in all the files in a hadoop directory?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172557#M37535</link>
      <description>&lt;P&gt;To get the total number of lines across all files in a directory, you can use the following. It prints each file path followed by the running total.&lt;/P&gt;&lt;P&gt;a=0 &lt;/P&gt;&lt;P&gt;for i in $(hdfs dfs -ls -R &amp;lt;DIRECTORY_PATH&amp;gt; | awk '/^-/ {print $8}'); &lt;/P&gt;&lt;P&gt;
do &lt;/P&gt;&lt;P&gt;
echo "$i"; &lt;/P&gt;&lt;P&gt;
b=$(hdfs dfs -cat "$i" | wc -l); &lt;/P&gt;&lt;P&gt;
a=$((a + b)) &lt;/P&gt;&lt;P&gt;echo "$a"; &lt;/P&gt;&lt;P&gt;done&lt;/P&gt;</description>
      <pubDate>Fri, 21 Sep 2018 21:53:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/172557#M37535</guid>
      <dc:creator>gvkmdkra</dc:creator>
      <dc:date>2018-09-21T21:53:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to find no of lines in all the files in a hadoop directory?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/294360#M37536</link>
      <description>&lt;P&gt;hdfs dfs -ls -R &amp;lt;directory&amp;gt; | grep 'part-r' | awk '{print $8}' | xargs hdfs dfs -cat | wc -l&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2020 16:47:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-no-of-lines-in-all-the-files-in-a-hadoop/m-p/294360#M37536</guid>
      <dc:creator>Ram303</dc:creator>
      <dc:date>2020-04-20T16:47:45Z</dc:date>
    </item>
  </channel>
</rss>

