Recursively find record count for files in S3


Hi Gurus,

We have multiple directories and files in an S3 bucket, and we would like to list each file together with its record count.

I am able to get the file name and size using the command below:

hdfs dfs -ls -R /bucket_name/Directory/* | awk '{system("hdfs dfs -count " $8) }' | awk '{print $4,$3;}'

The output format is:

/bucket_name/Directory/File_name.txt 44998 -- size

Is there a way to get the file name and record count in a similar format?

Thanks in advance!

Regards,

Surendran
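
One possible approach, sketched here under stated assumptions rather than as a confirmed answer: list the files recursively, skip directories, then stream each file and count its lines. The function name list_record_counts is hypothetical. The sketch assumes uncompressed text files with one record per line, paths without spaces (the original command makes the same assumption by using $8), and a path that `hdfs dfs` can already reach.

```shell
# Sketch: print "<path> <record count>" for every file under the given path.
# Assumption: one record per line, so the record count equals the line count.
list_record_counts() {
  # In `hdfs dfs -ls -R` output, field 1 is the permission string (directories
  # start with "d") and field 8 is the path. NF >= 8 skips any summary lines.
  hdfs dfs -ls -R "$1" | awk '$1 !~ /^d/ && NF >= 8 {print $8}' |
  while IFS= read -r path; do
    # Stream the file contents and count lines; nothing is stored locally.
    # stdin is redirected so the inner command cannot consume the path list.
    count=$(hdfs dfs -cat "$path" < /dev/null | awk 'END {print NR}')
    echo "$path $count"
  done
}
```

It could be called as `list_record_counts /bucket_name/Directory`, which prints lines of the form `/bucket_name/Directory/File_name.txt <count>`. Note that this reads every file in full, so it can be slow on large buckets.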


Re: Recursively find record count for files in S3
