
How to find the number of lines in all the files in a Hadoop directory?


8 REPLIES

@Bala Vignesh N V

You can use the command below to check the number of lines in an HDFS file:

[hdfs@ssnode1 root]$ hdfs dfs -cat /tmp/test.txt |wc -l

23

Sindhu, I need to know the count for each file in a directory, not just for a single file.

Super Collaborator (Accepted Solution)

@Bala Vignesh N V

You can try the command below:

for i in `hdfs dfs -ls -R <DIRECTORY_PATH> | awk '{print $8}'`; do echo $i ; hdfs dfs -cat $i | wc -l; done

It will recursively list the files in <DIRECTORY_PATH> and then print the number of lines in each file.

Thanks ssharma, that helps. Is there any single command to check the number of lines in each file in a directory, or even just in a single file?

Super Collaborator

@Bala Vignesh N V

I don't think there is any single command to achieve this, either in HDFS or in regular Linux. It's better to combine multiple commands with pipes, or to write a simple script that gives you the desired output.

Please accept the answer if it was helpful 🙂

Super Guru

Hi @Bala Vignesh N V

The approach above is good and works very well when you have a small number of files, but what if you have thousands or millions of files in the directories? In that case it's better to use the Hadoop MapReduce framework to do the same job on large files in less time. Below is an example of counting lines using MapReduce.

https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-count-number-of-lines-in-a-file-using...
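
In case that page moves, here is a minimal sketch of such a map-only job in Java, assuming you only need the grand total; the class, counter, and jar names (LineCount, LINES, linecount.jar) are illustrative and not taken from the linked article:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class LineCount {

    // TextInputFormat feeds the mapper one line per record, so counting
    // records counts lines; the job is map-only and writes no output files.
    public static class LineMapper
            extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
        public enum Counters { LINES }

        @Override
        protected void map(LongWritable key, Text value, Context context) {
            context.getCounter(Counters.LINES).increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "line count");
        job.setJarByClass(LineCount.class);
        job.setMapperClass(LineMapper.class);
        job.setNumReduceTasks(0);                         // map-only
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(NullOutputFormat.class); // only the counter matters
        FileInputFormat.setInputDirRecursive(job, true);  // walk subdirectories too
        FileInputFormat.addInputPath(job, new Path(args[0]));

        boolean ok = job.waitForCompletion(true);
        long total = job.getCounters()
                        .findCounter(LineMapper.Counters.LINES).getValue();
        System.out.println("Total lines: " + total);
        System.exit(ok ? 0 : 1);
    }
}

You would package it into a jar and run something like: hadoop jar linecount.jar LineCount <DIRECTORY_PATH>; the total is read back from the LINES counter once the job finishes.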

New Contributor

To get the sum of the line counts of all files in a directory, you can do the following:

a=0
for i in `hdfs dfs -ls -R <DIRECTORY_PATH> | awk '{print $8}'`; do
    echo $i                          # file path
    b=`hdfs dfs -cat $i | wc -l`     # lines in this file
    a=`expr $a + $b`                 # add to the running total
    echo $a
done

New Contributor

Another option is to cat all the part files in the directory through a single wc -l to get the total line count:

hdfs dfs -ls -R <directory> | grep 'part-r' | awk '{print $8}' | xargs hdfs dfs -cat | wc -l