Created 11-28-2018 02:47 AM
#1. The HDFS location below contains duplicate directories:
======================================================
hdfs dfs -ls /rajesh/int_datalake/Retirement/
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_datalake/Retirement/RS_Access
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Access
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Raw
drwxrwxr-x+ - root supergroup 0 2018-11-27 01:35 /rajesh/int_datalake/Retirement/RS_Raw1
drwxrwxrwx+ - root supergroup 0 2018-11-27 01:39 /rajesh/int_datalake/Retirement/RS_Raw_bk
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_datalake/Retirement/RS_Repos
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Repos
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_datalake/Retirement/RS_Stage
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Stage
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_datalake/Retirement/RS_Work
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Work
First we moved the directory to a backup name and tried to remove it, but that did not work:
[root@ ]#
[root@ ]# hdfs dfs -mv /rajesh/int_rajeshlake/Retirement/RS_Raw /rajesh/int_rajeshlake/Retirement/RS_Raw_bk
[root@ ]#
hdfs dfs -rm -r /rajesh/int_datalake/Retirement/RS_Raw
rm: `/rajesh/int_datalake/Retirement/RS_Raw': No such file or directory
hdfs dfs -mv /rajesh/int_datalake/Retirement/RS_Raw /rajesh/int_datalake/Retirement/RS_Raw123
mv: `/rajesh/int_datalake/Retirement/RS_Raw': No such file or directory
#2. Finally, we removed the directory with the rmdir command below and recreated it:
================================================================================
hdfs dfs -rmdir /rajesh/int_rajeshlake/Retirement/RS_Wor*
hdfs dfs -ls /rajesh/int_rajeshlake/Retirement/
Found 9 items
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_rajeshlake/Retirement/RS_Access
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_rajeshlake/Retirement/RS_Access
drwxrwxrwx+ - root supergroup 0 2018-11-27 01:39 /rajesh/int_rajeshlake/Retirement/RS_Raw
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_rajeshlake/Retirement/RS_Raw
drwxrwxr-x+ - root supergroup 0 2018-11-27 01:35 /rajesh/int_rajeshlake/Retirement/RS_Raw1
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_rajeshlake/Retirement/RS_Repos
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_rajeshlake/Retirement/RS_Repos
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_rajeshlake/Retirement/RS_Stage
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_rajeshlake/Retirement/RS_Stage
#3. We tried to create a duplicate manually with mkdir. The command ran without a "directory already exists" error, but no duplicate was created:
=======================================================================================================================
hdfs dfs -ls /rajesh/data/Corporate
drwxrwxr-x - root supergroup 0 2018-11-27 04:23 /rajesh/data/Corporate/Corp_Access
drwxrwxr-x - root supergroup 0 2018-11-27 04:24 /rajesh/data/Corporate/Corp_Raw
drwxrwxr-x - root supergroup 0 2018-11-27 04:25 /rajesh/data/Corporate/Corp_Repos
drwxrwxr-x - root supergroup 0 2018-11-27 04:25 /rajesh/data/Corporate/Corp_Stage
drwxrwxr-x - root supergroup 0 2018-11-27 04:26 /rajesh/data/Corporate/Corp_Work
hdfs dfs -mkdir -p /rajesh/data/Corporate/Corp_Work
hdfs dfs -ls /rajesh/data/Corporate
drwxrwxr-x - root supergroup 0 2018-11-27 04:23 /rajesh/data/Corporate/Corp_Access
drwxrwxr-x - root supergroup 0 2018-11-27 04:24 /rajesh/data/Corporate/Corp_Raw
drwxrwxr-x - root supergroup 0 2018-11-27 04:25 /rajesh/data/Corporate/Corp_Repos
drwxrwxr-x - root supergroup 0 2018-11-27 04:25 /rajesh/data/Corporate/Corp_Stage
drwxrwxr-x - root supergroup 0 2018-11-27 04:26 /rajesh/data/Corporate/Corp_Work
Please confirm how the duplicate directories could have been created.
The directories were empty, so we removed and recreated them. If they had contained tables or data, what steps should we follow?
Created 12-14-2018 09:17 AM
Hello @RajeshMadurai
I suspect this is caused by non-printable characters in the directory names.
How were the directories created? Was it by a script or code?
Also, could you please redirect the listing to a file:
#hadoop fs -ls <path> >/tmp/hdfslist
then
#cat -e /tmp/hdfslist
cat -e marks the end of each line with a $ sign, so you can see exactly where each file name ends and spot any other characters that are present before it.
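To illustrate what to look for, here is a minimal local sketch that needs no cluster. It assumes, purely for illustration, that one directory name ends in a carriage return (which can happen when a script is saved with Windows line endings); the character in your listing may well be something else. The sample lines and the /demo path are made up.

```shell
# Hypothetical sample: two listing lines that look identical on screen,
# but the second name ends in a carriage return (\r).
demo=$(mktemp)
printf 'drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /demo/RS_Work\n' > "$demo"
printf 'drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /demo/RS_Work\r\n' >> "$demo"

# cat -e prints $ at the end of every line and shows control characters
# in caret notation, so the carriage return appears as ^M before the $.
marked=$(cat -e "$demo")
printf '%s\n' "$marked"

# Clean up the temporary sample file.
rm -f "$demo"
```

In this sketch the first line ends in `RS_Work$` and the second in `RS_Work^M$`. A trailing space would show up as a blank between the name and the `$`.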
Created on 12-21-2018 05:32 AM - edited 12-21-2018 05:33 AM
Hello @RajeshMadurai
Thank you for posting your update here.
Do you see any special characters other than $ at the end of the file names in your command's output?
#hadoop fs -ls <path> >/tmp/hdfslist
#cat -e /tmp/hdfslist
or
#cat -v /tmp/hdfslist
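If the listing is long, it may be easier to print only the suspicious lines. The helper below is a rough sketch, not part of any Hadoop tooling: `flag_suspect_lines` is a name made up here, and it assumes a grep that supports `-a` (GNU and BSD grep do). It flags any line containing a byte outside printable ASCII, so it will also flag legitimate non-ASCII names, and it will not catch trailing spaces.

```shell
# flag_suspect_lines FILE
# Hypothetical helper: print the line number and content of every line in a
# saved listing that contains a byte outside the printable ASCII range
# (space through tilde). cat -v then renders those bytes visibly, e.g. ^M.
flag_suspect_lines() {
    # LC_ALL=C makes grep work byte by byte; -a forces text mode so that
    # odd bytes do not turn the output into "Binary file matches".
    LC_ALL=C grep -a -n '[^ -~]' "$1" | cat -v
}

# Usage, after saving the listing as shown above:
#   flag_suspect_lines /tmp/hdfslist
```

Each line it prints starts with the line number in the saved listing, so you can match it back to the directory entry.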
Also, you can refer to the community thread below:
http://community.cloudera.com/t5/Storage-Random-Access-HDFS/Duplicate-Directories-in-HDFS/m-p/37319
Hope this helps.
Created 12-21-2018 05:13 AM