duplicate directories in hdfs location

#1. The HDFS location below has duplicate directories:
======================================================

hdfs dfs -ls /rajesh/int_datalake/Retirement/

drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_datalake/Retirement/RS_Access
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Access
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Raw
drwxrwxr-x+ - root supergroup 0 2018-11-27 01:35 /rajesh/int_datalake/Retirement/RS_Raw1
drwxrwxrwx+ - root supergroup 0 2018-11-27 01:39 /rajesh/int_datalake/Retirement/RS_Raw_bk
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_datalake/Retirement/RS_Repos
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Repos
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_datalake/Retirement/RS_Stage
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Stage
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_datalake/Retirement/RS_Work
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Work

First we moved the directory to a backup name (_bk) and tried to remove it, but that did not work:
[root@ ]#
[root@ ]# hdfs dfs -mv /rajesh/int_rajeshlake/Retirement/RS_Raw /rajesh/int_rajeshlake/Retirement/RS_Raw_bk
[root@ ]#


hdfs dfs -rm -r /rajesh/int_datalake/Retirement/RS_Raw
rm: `/rajesh/int_datalake/Retirement/RS_Raw': No such file or directory

hdfs dfs -mv /rajesh/int_datalake/Retirement/RS_Raw /rajesh/int_datalake/Retirement/RS_Raw123
mv: `/rajesh/int_datalake/Retirement/RS_Raw': No such file or directory

#2. Finally we removed the directories with the rmdir command below and recreated them:
================================================================================
hdfs dfs -rmdir /rajesh/int_rajeshlake/Retirement/RS_Wor*
hdfs dfs -ls /rajesh/int_rajeshlake/Retirement/
Found 9 items
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_rajeshlake/Retirement/RS_Access
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_rajeshlake/Retirement/RS_Access
drwxrwxrwx+ - root supergroup 0 2018-11-27 01:39 /rajesh/int_rajeshlake/Retirement/RS_Raw
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_rajeshlake/Retirement/RS_Raw
drwxrwxr-x+ - root supergroup 0 2018-11-27 01:35 /rajesh/int_rajeshlake/Retirement/RS_Raw1
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_rajeshlake/Retirement/RS_Repos
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_rajeshlake/Retirement/RS_Repos
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_rajeshlake/Retirement/RS_Stage
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_rajeshlake/Retirement/RS_Stage


#3. Tried to create a duplicate manually with mkdir. The command ran without a "directory already exists" error, but no duplicate was created:
=======================================================================================================================
hdfs dfs -ls /rajesh/data/Corporate
drwxrwxr-x - root supergroup 0 2018-11-27 04:23 /rajesh/data/Corporate/Corp_Access
drwxrwxr-x - root supergroup 0 2018-11-27 04:24 /rajesh/data/Corporate/Corp_Raw
drwxrwxr-x - root supergroup 0 2018-11-27 04:25 /rajesh/data/Corporate/Corp_Repos
drwxrwxr-x - root supergroup 0 2018-11-27 04:25 /rajesh/data/Corporate/Corp_Stage
drwxrwxr-x - root supergroup 0 2018-11-27 04:26 /rajesh/data/Corporate/Corp_Work

hdfs dfs -mkdir -p /rajesh/data/Corporate/Corp_Work
hdfs dfs -ls /rajesh/data/Corporate
drwxrwxr-x - root supergroup 0 2018-11-27 04:23 /rajesh/data/Corporate/Corp_Access
drwxrwxr-x - root supergroup 0 2018-11-27 04:24 /rajesh/data/Corporate/Corp_Raw
drwxrwxr-x - root supergroup 0 2018-11-27 04:25 /rajesh/data/Corporate/Corp_Repos
drwxrwxr-x - root supergroup 0 2018-11-27 04:25 /rajesh/data/Corporate/Corp_Stage
drwxrwxr-x - root supergroup 0 2018-11-27 04:26 /rajesh/data/Corporate/Corp_Work

Please confirm how the duplicate directories were created.
The directories were empty, so we removed and recreated them. If they had contained tables or data, what steps should we follow?

Expert Contributor

Hello @RajeshMadurai

I suspect this is caused by non-printable characters in the directory names.

How were the directories created? By a script or code?

Also, could you please redirect the listing to a file:

#hadoop fs -ls <path> >/tmp/hdfslist

then

#cat -e /tmp/hdfslist

cat -e marks the end of each line with a $ sign, so you can see where each name ends and whether any other characters appear before the $.

Thanks,
Satz
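
To illustrate what cat -e reveals, here is a minimal sketch that runs without a cluster. It assumes, purely for illustration, that one of the two names ends in a carriage return; the temporary file stands in for /tmp/hdfslist, and the real cause on your cluster may be a different character.

```shell
# Hypothetical stand-in for /tmp/hdfslist: two names that look identical on screen.
# The first one ends in a carriage return (an assumption made for this example).
listing=$(mktemp)
printf '%s\r\n' '/rajesh/int_datalake/Retirement/RS_Work' > "$listing"
printf '%s\n' '/rajesh/int_datalake/Retirement/RS_Work' >> "$listing"

# cat -e prints control characters in caret notation and ends every line with $.
marked=$(cat -e "$listing")
printf '%s\n' "$marked"
rm -f "$listing"
```

In this sketch the first line is printed with ^M before the $, while the second ends in a plain $, which is the kind of difference to look for in the real listing.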

"How were the directories created? By a script or code?"
Yes, through a shell script, which was run twice (the first time it was mistakenly run at the hive prompt). But the second run should have reported that the directories already exist.
"#cat -e /tmp/hdfslist - you can see the end of each file name and any characters other than the $ sign"
After taking the output of the above command, do we have to remove the duplicates with hdfs dfs -rm rajesh/int_rajeshlake/Retirement/RS_Raw$ ?
Please confirm.

Expert Contributor

Hello @RajeshMadurai

Thank you for posting your update here.

Do you see any special characters other than the $ at the end of the file names in your command's output?

#hadoop fs -ls <path> >/tmp/hdfslist

#cat -e /tmp/hdfslist

or

#cat -v /tmp/hdfslist

You can also refer to the community thread below:

http://community.cloudera.com/t5/Storage-Random-Access-HDFS/Duplicate-Directories-in-HDFS/m-p/37319

Hope this helps.

Thanks,
Satz
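
For a long listing, it may be easier to filter for the suspicious lines than to read the whole cat -v output. The sketch below is one way to do that and runs without a cluster. The temporary file stands in for /tmp/hdfslist, and which of the two entries carries the stray carriage return is an assumption made for the example.

```shell
# Hypothetical stand-in for /tmp/hdfslist; the first entry ends in a carriage return.
listing=$(mktemp)
printf 'drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_datalake/Retirement/RS_Stage\r\n' > "$listing"
printf 'drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Stage\n' >> "$listing"

# Keep only the lines that contain a control character, then make it visible with cat -v.
suspects=$(LC_ALL=C grep '[[:cntrl:]]' "$listing" | cat -v)
printf '%s\n' "$suspects"
rm -f "$listing"
```

Only the entry with the hidden character is printed, with the carriage return shown as ^M at the end of the name.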