Duplicate directories in HDFS location
Labels: Cloudera Manager, HDFS
Created 11-28-2018 02:47 AM
#1. The HDFS location below has duplicate directories:
======================================================
hdfs dfs -ls /rajesh/int_datalake/Retirement/
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_datalake/Retirement/RS_Access
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Access
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Raw
drwxrwxr-x+ - root supergroup 0 2018-11-27 01:35 /rajesh/int_datalake/Retirement/RS_Raw1
drwxrwxrwx+ - root supergroup 0 2018-11-27 01:39 /rajesh/int_datalake/Retirement/RS_Raw_bk
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_datalake/Retirement/RS_Repos
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Repos
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_datalake/Retirement/RS_Stage
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Stage
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_datalake/Retirement/RS_Work
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Work
First we moved the directory to a _bk name and tried to remove the original, but it did not work:
[root@ ]#
[root@ ]# hdfs dfs -mv /rajesh/int_rajeshlake/Retirement/RS_Raw /rajesh/int_rajeshlake/Retirement/RS_Raw_bk
[root@ ]#
hdfs dfs -rm -r /rajesh/int_datalake/Retirement/RS_Raw
rm: `/rajesh/int_datalake/Retirement/RS_Raw': No such file or directory
hdfs dfs -mv /rajesh/int_datalake/Retirement/RS_Raw /rajesh/int_datalake/Retirement/RS_Raw123
mv: `/rajesh/int_datalake/Retirement/RS_Raw': No such file or directory
#2. Finally removed the duplicate directories with the rmdir command below and recreated them:
================================================================================
hdfs dfs -rmdir /rajesh/int_rajeshlake/Retirement/RS_Wor*
hdfs dfs -ls /rajesh/int_rajeshlake/Retirement/
Found 9 items
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_rajeshlake/Retirement/RS_Access
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_rajeshlake/Retirement/RS_Access
drwxrwxrwx+ - root supergroup 0 2018-11-27 01:39 /rajesh/int_rajeshlake/Retirement/RS_Raw
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_rajeshlake/Retirement/RS_Raw
drwxrwxr-x+ - root supergroup 0 2018-11-27 01:35 /rajesh/int_rajeshlake/Retirement/RS_Raw1
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_rajeshlake/Retirement/RS_Repos
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_rajeshlake/Retirement/RS_Repos
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_rajeshlake/Retirement/RS_Stage
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_rajeshlake/Retirement/RS_Stage
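A plausible reason the glob succeeded where the literal paths failed (assuming, as suggested later in this thread, that the duplicate names carry hidden trailing characters): the HDFS shell expands RS_Wor* against the names actually stored in the namespace, so it matches them whatever their trailing bytes are, whereas a typed literal path has to match the stored bytes exactly. A minimal sketch, with the hidden character assumed to be a carriage return:
hdfs dfs -rm -r /rajesh/int_rajeshlake/Retirement/RS_Work    # no match: the stored name has an extra trailing byte
hdfs dfs -rmdir '/rajesh/int_rajeshlake/Retirement/RS_Wor*'  # matches: the glob covers any trailing bytes; rmdir only removes empty directories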
#3. Tried creating a duplicate directory manually with mkdir. The command executed without a "directory already exists" error, but no duplicate entry was created either:
=======================================================================================================================
hdfs dfs -ls /rajesh/data/Corporate
drwxrwxr-x - root supergroup 0 2018-11-27 04:23 /rajesh/data/Corporate/Corp_Access
drwxrwxr-x - root supergroup 0 2018-11-27 04:24 /rajesh/data/Corporate/Corp_Raw
drwxrwxr-x - root supergroup 0 2018-11-27 04:25 /rajesh/data/Corporate/Corp_Repos
drwxrwxr-x - root supergroup 0 2018-11-27 04:25 /rajesh/data/Corporate/Corp_Stage
drwxrwxr-x - root supergroup 0 2018-11-27 04:26 /rajesh/data/Corporate/Corp_Work
hdfs dfs -mkdir -p /rajesh/data/Corporate/Corp_Work
hdfs dfs -ls /rajesh/data/Corporate
drwxrwxr-x - root supergroup 0 2018-11-27 04:23 /rajesh/data/Corporate/Corp_Access
drwxrwxr-x - root supergroup 0 2018-11-27 04:24 /rajesh/data/Corporate/Corp_Raw
drwxrwxr-x - root supergroup 0 2018-11-27 04:25 /rajesh/data/Corporate/Corp_Repos
drwxrwxr-x - root supergroup 0 2018-11-27 04:25 /rajesh/data/Corporate/Corp_Stage
drwxrwxr-x - root supergroup 0 2018-11-27 04:26 /rajesh/data/Corporate/Corp_Work
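One aside on this test (not from the original post): the -p flag makes mkdir succeed silently when the target already exists, so the absence of an error here does not prove a second entry was created. Without -p, the existing directory is reported, roughly:
hdfs dfs -mkdir /rajesh/data/Corporate/Corp_Work
mkdir: `/rajesh/data/Corporate/Corp_Work': File exists
hdfs dfs -mkdir -p /rajesh/data/Corporate/Corp_Work   # exits 0, creates nothing new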
Please confirm: how were the duplicate directories created?
The duplicated directories were empty, so we simply removed and recreated them. If a directory holds tables or data, what steps should we follow?
Created 12-14-2018 09:17 AM
Hello @RajeshMadurai
I suspect this might be caused by non-printable characters present in the file names.
How were the directories created? Was it by some script or code?
Also, could you please redirect the output to a file:
#hadoop fs -ls <path> >/tmp/hdfslist
then
#cat -e /tmp/hdfslist
This lets you see the end of each file name: cat -e marks every line end with a $ sign, so any other trailing characters, if present, will be visible just before it.
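For illustration only, assuming one of the names carries a trailing carriage return, the output might render like this, with the CR showing up as ^M just before the end-of-line $ marker:
#cat -e /tmp/hdfslist
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:24 /rajesh/int_datalake/Retirement/RS_Raw$
drwxrwxr-x+ - root supergroup 0 2018-10-29 03:17 /rajesh/int_datalake/Retirement/RS_Raw^M$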
Satz
Created 12-21-2018 05:13 AM
Yes, the directories were created through a shell script that was executed twice (the first time the Hive prompt was given wrongly). But the second run should have failed, since the directories already existed.
Regarding the suggestion to run #cat -e /tmp/hdfslist and check the end of each file name for any character other than the $ sign:
After taking the output from that command, do we then remove the duplicate directories with hdfs dfs -rm rajesh/int_rajeshlake/Retirement/RS_Raw$, or something like the sketch below?
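A hedged sketch of that removal (assuming cat -e showed RS_Raw^M$, i.e. a trailing carriage return; note the trailing $ is only cat's end-of-line marker, not part of the name, and -r is needed to remove a directory):
hdfs dfs -rm -r "/rajesh/int_rajeshlake/Retirement/RS_Raw"$'\r'   # bash ANSI-C quoting supplies the literal CR
hdfs dfs -rmdir "/rajesh/int_rajeshlake/Retirement/RS_Raw"$'\r'   # safer alternative if the duplicate is empty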
Please confirm.
Created on 12-21-2018 05:32 AM - edited 12-21-2018 05:33 AM
Hello @RajeshMadurai
Thank you for posting your update here.
Do you see any other special characters (other than the $ sign) at the end of the filenames in your command's output?
#hadoop fs -ls <path> >/tmp/hdfslist
#cat -e /tmp/hdfslist
or
#cat -v /tmp/hdfslist
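As an illustrative aside (with a hypothetical trailing carriage return assumed), the two commands differ only in the end-of-line marker: cat -e appends a $ at the end of every line in addition to showing non-printing characters, while cat -v shows only the non-printing characters:
#cat -e /tmp/hdfslist
.../Retirement/RS_Raw^M$
#cat -v /tmp/hdfslist
.../Retirement/RS_Raw^M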
Also, you can refer to the community thread below:
http://community.cloudera.com/t5/Storage-Random-Access-HDFS/Duplicate-Directories-in-HDFS/m-p/37319
Hope this helps
Satz
