<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: parsing the HDFS dfs -count output in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/parsing-the-HDFS-dfs-count-output/m-p/53887#M59607</link>
    <description>&lt;P&gt;Man, that took a bit of trial and error.&lt;/P&gt;&lt;P&gt;The issue with the first run is that it returns an empty line. I tried a few awk-specific ways to get around it, but they didn't work. So here is a hack, and it uses the variable within awk as well.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;DC=PN
hdfs dfs -ls /lib/ | grep "drwx" | awk '{system("hdfs dfs -count " $8) }' | awk '{ gsub(/\/lib\//,"'$DC'"".hadoop.hdfs.",$4); print $4 ".folderscount",$1"\n"$4 ".filescount",$2"\n"$4 ".size",$3;}'

PN.hadoop.hdfs.archive.folderscount 9
PN.hadoop.hdfs.archive.filescount 103
PN.hadoop.hdfs.archive.size 928524788
PN.hadoop.hdfs.dae.folderscount 1
PN.hadoop.hdfs.dae.filescount 13
PN.hadoop.hdfs.dae.size 192504874
PN.hadoop.hdfs.schema.folderscount 1
PN.hadoop.hdfs.schema.filescount 14
PN.hadoop.hdfs.schema.size 45964

DC=VA

hdfs dfs -ls /lib/ | grep "drwx" | awk '{system("hdfs dfs -count " $8) }' | awk '{ gsub(/\/lib\//,"'$DC'"".hadoop.hdfs.",$4); print $4 ".folderscount",$1"\n"$4 ".filescount",$2"\n"$4 ".size",$3;}'

VA.hadoop.hdfs.archive.folderscount 9
VA.hadoop.hdfs.archive.filescount 103
VA.hadoop.hdfs.archive.size 928524788
VA.hadoop.hdfs.dae.folderscount 1
VA.hadoop.hdfs.dae.filescount 13
VA.hadoop.hdfs.dae.size 192504874
VA.hadoop.hdfs.schema.folderscount 1
VA.hadoop.hdfs.schema.filescount 14
VA.hadoop.hdfs.schema.size 45964&lt;/PRE&gt;</description>
    <pubDate>Wed, 19 Apr 2017 21:39:41 GMT</pubDate>
    <dc:creator>mbigelow</dc:creator>
    <dc:date>2017-04-19T21:39:41Z</dc:date>
    <item>
      <title>parsing the HDFS dfs -count output</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/parsing-the-HDFS-dfs-count-output/m-p/53806#M59604</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I need to send the hdfs dfs -count output to Graphite, but I want to do this in one command rather than three: one for the folder count, one for the file count, and one for the size.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I can do this with separate commands like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;hdfs dfs -ls /fawze/data | awk '{system("hdfs dfs -count " $8) }' | awk '{print $4,$2;}'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But I want the output to look like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;fawze/data/x.folders &amp;nbsp; &amp;nbsp;20&lt;/P&gt;&lt;P&gt;fawze/data/x.files &amp;nbsp; &amp;nbsp;200&amp;nbsp;&lt;/P&gt;&lt;P&gt;fawze/data/x.size &amp;nbsp; 2650&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;fawze/data/y.folders &amp;nbsp; &amp;nbsp;25&amp;nbsp;&lt;/P&gt;&lt;P&gt;fawze/data/y.files &amp;nbsp; &amp;nbsp;2450&lt;/P&gt;&lt;P&gt;fawze/data/y.size &amp;nbsp; 23560&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm not a Linux expert, so I would appreciate any help.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Apr 2017 13:46:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/parsing-the-HDFS-dfs-count-output/m-p/53806#M59604</guid>
      <dc:creator>Fawze</dc:creator>
      <dc:date>2017-04-18T13:46:58Z</dc:date>
    </item>
    <item>
      <title>Re: parsing the HDFS dfs -count output</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/parsing-the-HDFS-dfs-count-output/m-p/53815#M59605</link>
      <description>You were close.&lt;BR /&gt;&lt;BR /&gt;hdfs dfs -ls /lib | awk '{system("hdfs dfs -count " $8) }' | awk '{print $4,$1"\n"$4,$2"\n"$4,$3;}'&lt;BR /&gt;&lt;BR /&gt;This throws a usage error on the first run and I haven't looked into why, but it prints out all subdirs: three entries for each stat from -count.</description>
      <pubDate>Tue, 18 Apr 2017 16:11:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/parsing-the-HDFS-dfs-count-output/m-p/53815#M59605</guid>
      <dc:creator>mbigelow</dc:creator>
      <dc:date>2017-04-18T16:11:07Z</dc:date>
    </item>
    <item>
      <title>Re: parsing the HDFS dfs -count output</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/parsing-the-HDFS-dfs-count-output/m-p/53883#M59606</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/18127"&gt;@mbigelow&lt;/a&gt;&amp;nbsp;That helped me a lot, thanks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I made small additions to the command:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;hdfs dfs -ls /liveperson/data | grep -v storage | awk '{system("hdfs dfs -count " $8) }' | awk '{ gsub(/\/liveperson\/data\/server_/,"hadoop.hdfs.",$4); print $4 ".folderscount",$1"\n"$4 ".filescount",$2"\n"$4 ".size",$3;}'&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I'm still investigating the&amp;nbsp;usage error on the first run, and I want to add a variable before&amp;nbsp;hadoop.hdfs.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Can you help with this?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I have a variable called DC, and I want to concatenate it to the path so it looks like this (for example, DC is VA):&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;VA.hadoop.hdfs.$4.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I tried $DC&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Apr 2017 20:56:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/parsing-the-HDFS-dfs-count-output/m-p/53883#M59606</guid>
      <dc:creator>Fawze</dc:creator>
      <dc:date>2017-04-19T20:56:54Z</dc:date>
    </item>
    <item>
      <title>Re: parsing the HDFS dfs -count output</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/parsing-the-HDFS-dfs-count-output/m-p/53887#M59607</link>
      <description>&lt;P&gt;Man, that took a bit of trial and error.&lt;/P&gt;&lt;P&gt;The issue with the first run is that it returns an empty line. I tried a few awk-specific ways to get around it, but they didn't work. So here is a hack, and it uses the variable within awk as well.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;DC=PN
hdfs dfs -ls /lib/ | grep "drwx" | awk '{system("hdfs dfs -count " $8) }' | awk '{ gsub(/\/lib\//,"'$DC'"".hadoop.hdfs.",$4); print $4 ".folderscount",$1"\n"$4 ".filescount",$2"\n"$4 ".size",$3;}'

PN.hadoop.hdfs.archive.folderscount 9
PN.hadoop.hdfs.archive.filescount 103
PN.hadoop.hdfs.archive.size 928524788
PN.hadoop.hdfs.dae.folderscount 1
PN.hadoop.hdfs.dae.filescount 13
PN.hadoop.hdfs.dae.size 192504874
PN.hadoop.hdfs.schema.folderscount 1
PN.hadoop.hdfs.schema.filescount 14
PN.hadoop.hdfs.schema.size 45964

DC=VA

hdfs dfs -ls /lib/ | grep "drwx" | awk '{system("hdfs dfs -count " $8) }' | awk '{ gsub(/\/lib\//,"'$DC'"".hadoop.hdfs.",$4); print $4 ".folderscount",$1"\n"$4 ".filescount",$2"\n"$4 ".size",$3;}'

VA.hadoop.hdfs.archive.folderscount 9
VA.hadoop.hdfs.archive.filescount 103
VA.hadoop.hdfs.archive.size 928524788
VA.hadoop.hdfs.dae.folderscount 1
VA.hadoop.hdfs.dae.filescount 13
VA.hadoop.hdfs.dae.size 192504874
VA.hadoop.hdfs.schema.folderscount 1
VA.hadoop.hdfs.schema.filescount 14
VA.hadoop.hdfs.schema.size 45964&lt;/PRE&gt;</description>
      <pubDate>Wed, 19 Apr 2017 21:39:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/parsing-the-HDFS-dfs-count-output/m-p/53887#M59607</guid>
      <dc:creator>mbigelow</dc:creator>
      <dc:date>2017-04-19T21:39:41Z</dc:date>
    </item>
  </channel>
</rss>