
Is there a way to analyze small files (less than block size) periodically in HDFS? Can we automate it?


Explorer

Hi Team,

Is there a way to analyze small files and their paths on HDFS? Is there also a way to find out which user ID owns the largest number of small files?

Thanks in advance.

3 REPLIES

Re: Is there a way to analyze small files (less than block size) periodically in HDFS? Can we automate

Master Guru
There are a few options:

You can grab the fsimage periodically with the 'hdfs dfsadmin -fetchImage' command and convert it to delimited or XML output with the 'hdfs oiv' tool. The metadata carries file lengths and ownership information, which you can aggregate into a report with your record processing software of choice.
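As a rough illustration of that aggregation step, here is a minimal Python sketch that counts small files per owner from a delimited dump. It assumes the dump was produced with something like `hdfs oiv -p Delimited -i <fsimage> -o <output>`, that it is tab-separated, and that it starts with a header line naming the columns `PreferredBlockSize`, `FileSize`, `Permission` and `UserName`. Older Hadoop releases may not emit that header, so check your own output first. The function name `small_files_by_user` is made up for this example.

```python
import csv
from collections import Counter


def small_files_by_user(lines, delimiter="\t"):
    """Count files smaller than their preferred block size, grouped by owner.

    `lines` is any iterable of text lines (for example an open file) whose
    first line is a header. The column names are an assumption about the
    Delimited output of `hdfs oiv`; adjust them to match your dump.
    """
    counts = Counter()
    reader = csv.DictReader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        # Directory entries have a permission string that starts with 'd'.
        if row["Permission"].startswith("d"):
            continue
        # "Small" here means smaller than the file's own preferred block size.
        if int(row["FileSize"]) < int(row["PreferredBlockSize"]):
            counts[row["UserName"]] += 1
    return counts
```

To use it, open the delimited file with `open(path, newline="")`, pass the file object to `small_files_by_user`, and call `.most_common()` on the result to list owners from most to fewest small files.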

Cloudera Enterprise Reports Manager carries summary reports of watched directories: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_dg_reports.html

Cloudera Enterprise Navigator carries HDFS analytics that help show how your HDFS is being used: https://www.cloudera.com/documentation/enterprise/latest/topics/navigator_dashboard.html#concept_cnv...

Cloudera Enterprise Workload eXperience Manager (WXM) includes a small files reporting feature: https://www.cloudera.com/documentation/wxm/latest/topics/wxm_file_size_reporting.html

Re: Is there a way to analyze small files (less than block size) periodically in HDFS? Can we automate

Explorer

Hi Harsh,

Thanks for the information and links. Could I have more details on the hdfs oiv tool? How do I set it up and configure it to analyze small files? Is there a CDH document on this?

Thanks.

Re: Is there a way to analyze small files (less than block size) periodically in HDFS? Can we automate

Master Guru
The OIV tool is documented at
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
which includes some examples. Try its delimiter-related options (the Delimited processor) on a copy of your HDFS fsimage file and check out the result.
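For the "can we automate it" part of the question, one possible approach is a small wrapper script run from a scheduler such as cron. The Python sketch below fetches the latest fsimage into a working directory and converts it with the Delimited processor. It assumes the `hdfs` command is on the PATH of a user with HDFS superuser rights, and that the downloaded image is saved under a name starting with `fsimage_`. The function names and the output file naming are made up for illustration.

```python
import subprocess
from pathlib import Path


def newest_fsimage(directory):
    """Return the most recently modified fsimage_* file, or None if absent.

    Files with an extension (such as .md5 checksums) are skipped.
    """
    candidates = [
        p for p in Path(directory).glob("fsimage_*")
        if p.is_file() and p.suffix == ""
    ]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def oiv_command(fsimage, output):
    """Build the argv list for a Delimited dump of the given fsimage."""
    return ["hdfs", "oiv", "-p", "Delimited", "-i", str(fsimage), "-o", str(output)]


def fetch_and_dump(workdir):
    """Fetch the latest fsimage into workdir and convert it to delimited text."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    # Downloads the most recent fsimage from the NameNode into workdir.
    subprocess.run(["hdfs", "dfsadmin", "-fetchImage", str(workdir)], check=True)
    image = newest_fsimage(workdir)
    if image is None:
        raise FileNotFoundError(f"no fsimage_* file found in {workdir}")
    output = workdir / f"oiv_{image.name}.tsv"
    subprocess.run(oiv_command(image, output), check=True)
    return output
```

Calling `fetch_and_dump` with a scratch directory from a nightly cron job would give a fresh delimited file each run, which can then be fed to whatever reporting step you use. Running the conversion on a host other than the NameNode keeps the extra load off it.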