
Is there a way to analyze small files (less than block size) periodically in HDFS? Can we automate it?

Hi Team,

Is there a way to analyze small files and paths on HDFS? Is there a way to find out which user ID owns the most small files?

Thanks in advance.

Re: Is there a way to analyze small files (less than block size) periodically in HDFS? can we automate

There are a few options:

You can periodically fetch the fsimage with the 'hdfs dfsadmin -fetchImage' command and convert it to delimited or XML output with the 'hdfs oiv' (Offline Image Viewer) tool. That output includes each file's length and ownership, so you can aggregate it into a report with the record-processing software of your choice.
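As a rough illustration of the aggregation step, here is a minimal Python sketch that counts, per owner, the files whose size is below their own preferred block size. It assumes the input was produced with 'hdfs oiv -p Delimited' using the default tab delimiter, and that the columns follow the order listed in DEFAULT_COLUMNS (an assumption: if a header line starting with "Path" is present, its column names are used instead). It also assumes directories and symlinks are written with a preferred block size of 0, and that no path contains the delimiter character. The function name is hypothetical.

```python
import sys
from collections import Counter
from typing import Dict, Iterable

# Assumed default column order of 'hdfs oiv -p Delimited' output.
# A header line, if present, overrides this mapping.
DEFAULT_COLUMNS = ["Path", "Replication", "ModificationTime", "AccessTime",
                   "PreferredBlockSize", "BlocksCount", "FileSize", "NSQUOTA",
                   "DSQUOTA", "Permission", "UserName", "GroupName"]


def small_files_by_user(lines: Iterable[str], delimiter: str = "\t") -> Counter:
    """Count files smaller than their own preferred block size, per owner."""
    idx: Dict[str, int] = {name: i for i, name in enumerate(DEFAULT_COLUMNS)}
    counts: Counter = Counter()
    for line in lines:
        row = line.rstrip("\n").split(delimiter)
        if row == [""]:
            continue  # skip blank lines
        if row[0] == "Path":
            # Header line: take column positions from it instead of the default.
            idx = {name: i for i, name in enumerate(row)}
            continue
        if len(row) < len(idx):
            continue  # skip malformed or truncated records
        block_size = int(row[idx["PreferredBlockSize"]])
        if block_size <= 0:
            continue  # assumed: directories and symlinks carry block size 0
        if int(row[idx["FileSize"]]) < block_size:
            counts[row[idx["UserName"]]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python small_files.py /path/to/fsimage.tsv
    with open(sys.argv[1], encoding="utf-8") as f:
        for user, n in small_files_by_user(f).most_common():
            print(f"{user}\t{n}")
```

Comparing against each file's own block size (rather than a fixed cutoff) matches the "less than block size" definition in the question; empty files are counted as small. To answer "which user has the most small files", read the first entry of most_common().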

Cloudera Enterprise Reports Manager provides summary reports for watched directories: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_dg_reports.html

Cloudera Enterprise Navigator provides HDFS analytics that show how your HDFS storage is being used: https://www.cloudera.com/documentation/enterprise/latest/topics/navigator_dashboard.html#concept_cnv...

Cloudera Enterprise Workload eXperience Manager (WXM) includes a small files reporting feature: https://www.cloudera.com/documentation/wxm/latest/topics/wxm_file_size_reporting.html

Re: Is there a way to analyze small files (less than block size) periodically in HDFS? can we automate

Hi Harsh,

Thanks for the information and links. Can I get more details on the hdfs oiv tool? How do I set it up and configure it to analyze small files? Is there a CDH document on this?

Thanks.

Re: Is there a way to analyze small files (less than block size) periodically in HDFS? can we automate

The OIV tool is documented at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html and includes some examples. Try its Delimited processor options ('-p Delimited' and '-delimiter') on a copy of your HDFS fsimage file and check out the result.
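To address the "can we automate it?" part, one possible approach is a small Python wrapper, run from cron or any scheduler, that fetches a fresh fsimage copy and converts it with the Delimited processor. This is a minimal sketch under stated assumptions: the 'hdfs' CLI is on PATH with a working client configuration, the account running it is allowed to call 'hdfs dfsadmin -fetchImage' (typically the HDFS superuser), downloaded images are named fsimage_<txid>, and WORK_DIR plus all function names are hypothetical.

```python
import os
import re
import subprocess
from typing import List, Optional

WORK_DIR = "/tmp/fsimage_reports"  # hypothetical scratch directory

# Assumed naming of downloaded images: fsimage_<transaction id>, no suffix.
FSIMAGE_NAME = re.compile(r"^fsimage_(\d+)$")


def fetch_command(work_dir: str) -> List[str]:
    # Downloads the most recent fsimage from the NameNode into work_dir.
    return ["hdfs", "dfsadmin", "-fetchImage", work_dir]


def oiv_command(fsimage: str, output: str, delimiter: str = "\t") -> List[str]:
    # Converts the binary fsimage into delimited text, one line per inode.
    return ["hdfs", "oiv", "-p", "Delimited", "-delimiter", delimiter,
            "-i", fsimage, "-o", output]


def newest_fsimage(work_dir: str) -> str:
    # Pick the image with the highest transaction id; ignore .tsv/.md5 files.
    best: Optional[str] = None
    best_txid = -1
    for name in os.listdir(work_dir):
        m = FSIMAGE_NAME.match(name)
        if m and int(m.group(1)) > best_txid:
            best_txid = int(m.group(1))
            best = name
    if best is None:
        raise FileNotFoundError(f"no fsimage_<txid> file in {work_dir}")
    return os.path.join(work_dir, best)


def run_once(work_dir: str = WORK_DIR) -> str:
    # One scheduled run: fetch, convert, and return the text output path.
    os.makedirs(work_dir, exist_ok=True)
    subprocess.run(fetch_command(work_dir), check=True)
    fsimage = newest_fsimage(work_dir)
    output = fsimage + ".tsv"
    subprocess.run(oiv_command(fsimage, output), check=True)
    return output
```

Because it works on a downloaded copy, the NameNode's own metadata is never touched. Older fsimage and .tsv files are not cleaned up here, so a real job would likely prune them, and the resulting .tsv can be fed into whatever aggregation or reporting you already use.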