Created on 09-15-2021 05:05 AM - last edited on 09-15-2021 10:31 PM by VidyaSargur
Hello,
For an application, I need to find the maximum depth of an HDFS directory tree. I know how to do this in a local shell; I can run:
find /tmp -type d -printf '%d\n' | sort -rn | head -1
So I tried to do the same with the HDFS find command:
hdfs dfs -find /tmp -type d
but HDFS find does not support the -type argument; here is the error:
find: Unexpected argument: -type
Does anyone have a solution or advice for this problem?
PS: my Hadoop version is Hadoop 2.6.0-cdh5.13.
Regards,
thanks in advance
Created 09-15-2021 08:23 AM
Hi @Ellyly,
Here is an example.
(1). First, use ls -R and grep "^d" to list all the subdirectories under your path:
# sudo -u hdfs hdfs dfs -ls -R /folder1/ | grep "^d"
drwxr-xr-x - hdfs supergroup 0 2021-09-15 14:48 /folder1/folder2
drwxr-xr-x - hdfs supergroup 0 2021-09-15 15:01 /folder1/folder2/folder3
drwxr-xr-x - hdfs supergroup 0 2021-09-15 15:01 /folder1/folder2/folder3/folder4
drwxr-xr-x - hdfs supergroup 0 2021-09-11 05:09 /folder1/subfolder1
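To see what the grep "^d" filter does without a cluster, here is a small local sketch that runs it on canned listing text. The directory line is copied from the output above; the data.txt line is a hypothetical file entry added only for illustration. In ls -R output the first character of the permission string is "d" for a directory and "-" for a regular file, so only the directory line survives.

```shell
# Canned `hdfs dfs -ls -R` lines: the first is copied from the listing above,
# the second (data.txt) is a hypothetical file entry added for illustration.
listing='drwxr-xr-x - hdfs supergroup 0 2021-09-15 14:48 /folder1/folder2
-rw-r--r-- 3 hdfs supergroup 12 2021-09-15 15:01 /folder1/folder2/data.txt'

# The permission string starts with "d" for directories and "-" for files,
# so grep '^d' keeps only the directory line.
dirs=$(printf '%s\n' "$listing" | grep '^d')
printf '%s\n' "$dirs"   # prints only the /folder1/folder2 line
```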
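(2). Then, use awk -F\/ '{print NF-1}' to calculate each directory's depth: awk splits each line into fields separated by /, so NF-1 is the number of / characters in the path, i.e. its depth counted from the root (/folder1/folder2 gives 2).
Note that after -F there is a backslash followed by a forward slash, with no space in between; it is not the letter "V"!!! 🙂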
# sudo -u hdfs hdfs dfs -ls -R /folder1/ | grep "^d" | awk -F\/ '{print NF-1}'
2
3
4
2
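A quick way to check the field counting locally is to feed two of the directory lines from step (1) straight into awk. This is only a sketch; the path /a/b in the last command is made up. Because the shell removes the backslash in an unquoted \/, awk receives -F/ in every case, so -F\/, -F/ and -F'/' behave the same.

```shell
# Two directory lines copied from the step (1) listing.
depths=$(printf '%s\n' \
  'drwxr-xr-x - hdfs supergroup 0 2021-09-15 14:48 /folder1/folder2' \
  'drwxr-xr-x - hdfs supergroup 0 2021-09-15 15:01 /folder1/folder2/folder3/folder4' |
  awk -F/ '{print NF-1}')
# Splitting on "/" yields one more field than there are slashes,
# so NF-1 counts the slashes in each line.
printf '%s\n' "$depths"   # prints 2, then 4

# The backslash in -F\/ is stripped by the shell, so awk sees the same -F/ option.
printf '%s\n' '/a/b' | awk -F\/ '{print NF-1}'   # prints 2
```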
(3). Finally, sort the depths in descending numeric order (sort -rn) and keep the first line (head -1):
# sudo -u hdfs hdfs dfs -ls -R /folder1/ | grep "^d" | awk -F\/ '{print NF-1}'|sort -rn|head -1
4
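If you need this inside a script, the three steps can be wrapped in a small shell function. This is a sketch under my own assumptions: the name max_dir_depth and the optional starting-path argument are hypothetical additions, not part of HDFS. Without an argument it returns the same absolute depth as the pipeline above. With the starting path, it subtracts that path's own depth, which matches the convention of find's %d from the original question (the starting point is depth 0, its children depth 1).

```shell
# Sketch: maximum directory depth from `hdfs dfs -ls -R` output read on stdin.
# max_dir_depth is a hypothetical helper name.
# Optional $1: the starting path. If given, depths are relative to it
# (like find's %d, where the starting point is depth 0); the default "/"
# gives the absolute depth printed by the pipeline above.
max_dir_depth() {
  local base=${1:-/}
  base=${base%/}   # drop a trailing slash so "/folder1/" and "/folder1" agree
  # Depth of the starting path = number of "/" characters in it (0 for "/").
  local base_depth=$(( $(printf '%s' "$base" | tr -cd '/' | wc -c) ))
  grep '^d' |
    awk -F/ -v b="$base_depth" '{print NF-1-b}' |
    sort -rn |
    head -n 1
}
```

On the cluster you would pipe the listing in, e.g. sudo -u hdfs hdfs dfs -ls -R /folder1/ | max_dir_depth /folder1/, which for the tree above gives 3 instead of 4. One difference from find remains: ls -R does not list the starting directory itself, so a directory without subdirectories produces no output rather than 0.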
Regards,
Will
Created 09-17-2021 01:22 AM
Thanks @willx, this solved my problem; it works perfectly!!!