Created on 07-15-2024 01:16 AM - edited 08-12-2024 01:13 AM
Prerequisites:
1. The fsimage of the cluster you want to analyze.
2. Access to git.
Furthermore, HDFS is not geared toward efficient access to small files: it is primarily designed for streaming access to large files. Reading small files normally causes many seeks and much hopping from datanode to datanode to retrieve each file, all of which adds up to an inefficient data access pattern.
1. Get the fsimage of the cluster you would like to analyze for small files.
2. Convert the fsimage to TSV format using the "Delimited" processor of the Offline Image Viewer (OIV) tool.
Change to the perl directory of the cloned source, at small_file_offenders/src/main/perl, and run the OIV tool with the Delimited processor, pointing it at your fsimage location. This generates the TSV file required by the perl script.
perl]# hadoop oiv -i /path/XXXXX/current/fsimage_0000000000000019171 -o fsimage-delimited.tsv -p Delimited
The TSV file is now present alongside the script:
perl]# ls -lart
-rwxr-xr-x. 1 root root 2335 May 24 15:11 fsimage_users.pl
-rw-r--r--. 1 root root 531226 May 24 15:12 fsimage-delimited.tsv
3. Now invoke the perl script "fsimage_users.pl", pointing it at the TSV file.
perl]# ./fsimage_users.pl ./fsimage-delimited.tsv
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_CTYPE = "UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Limiting output to top 10 items per list. A small file is considered anything less than 134217728. Edit the script to adjust these values
Average File Size (bytes): 0; Users:
hue (total size: 0; number of files: 535)
hdfs (total size: 0; number of files: 13)
mapred (total size: 0; number of files: 4)
spark (total size: 0; number of files: 3)
impala (total size: 0; number of files: 2)
solr (total size: 0; number of files: 1)
systest (total size: 0; number of files: 1)
schemaregistry (total size: 0; number of files: 1)
admin (total size: 0; number of files: 1)
kafka (total size: 0; number of files: 1)
Average File Size (bytes): 2526.42660550459; Users:
hbase (total size: 550761; number of files: 218)
Average File Size (bytes): 28531.875; Users:
livy (total size: 456510; number of files: 16)
Average File Size (bytes): 2438570.8308329; Users:
oozie (total size: 4713757416; number of files: 1933)
Average File Size (bytes): 4568935.69565217; Users:
hive (total size: 735598647; number of files: 161)
Average File Size (bytes): 209409621.25; Users:
yarn (total size: 1675276970; number of files: 8)
Average File Size (bytes): 469965475; Users:
tez (total size: 939930950; number of files: 2)
Users with most small files:
oozie: 1927 small files
hue: 535 small files
hbase: 218 small files
hive: 161 small files
livy: 16 small files
hdfs: 13 small files
yarn: 6 small files
mapred: 4 small files
spark: 3 small files
impala: 2 small files