Duplicate Directories in HDFS

Explorer

Hi All,

 

Our application team created HDFS directories with the script below.


hadoop fs -mkdir /a/b/c/d/20160208
hadoop fs -mkdir /a/b/c/d/20160208/s
hadoop fs -mkdir /a/b/c/d/20160208/s/inputmap
hadoop fs -mkdir /a/b/c/d/20160208/s/temp
hadoop fs -mkdir /a/b/c/d/20160208/s/map
hadoop fs -mkdir /a/b/c/d/20160208/s/input
hadoop fs -copyFromLocal /x/y/z/20160208.dat /a/b/c/d/20160208/s/inputmap

echo "Setup Complete"

 

The directories got created, but an error is thrown when we try to access them.

 

hdfs@hostname$ hadoop fs -ls /a/b/c/d/
Found 20 items
drwxr-xr-x - user group 0 2016-01-27 09:10 /a/b/c/d/20141211
drwxr-xr-x - user group 0 2016-01-06 01:03 /a/b/c/d/20141212
drwxr-xr-x - user group 0 2016-01-06 01:09 /a/b/c/d/20141213
drwxr-xr-x - user group 0 2015-11-12 08:53 /a/b/c/d/20151106
drwxr-xr-x - user group 0 2016-01-12 01:48 /a/b/c/d/20151118
drwxr-xr-x - user group 0 2015-12-04 04:21 /a/b/c/d/20151130
drwxrwxr-x - user group 0 2016-01-12 10:48 /a/b/c/d/20151221
drwxr-xr-x - user group 0 2016-01-19 11:23 /a/b/c/d/20160111
drwxr-xr-x - user group 0 2016-01-27 14:56 /a/b/c/d/20160112
drwxr-xr-x - user group 0 2016-02-02 16:12 /a/b/c/d/20160125
drwxr-xr-x - user group 0 2016-02-08 12:41 /a/b/c/d/20160126
drwxr-xr-x - user group 0 2016-02-08 10:26 /a/b/c/d/20160127
drwxr-xr-x - user group 0 2016-01-29 10:48 /a/b/c/d/20160129
drwxr-xr-x - user group 0 2016-02-09 02:43 /a/b/c/d/20160203
drwxr-xr-x - user group 0 2016-02-09 02:42 /a/b/c/d/20160204
drwxr-xr-x - user group 0 2016-02-08 15:38 /a/b/c/d/20160205
drwxr-xr-x - user group 0 2016-02-08 09:02 /a/b/c/d/20160205
drwxr-xr-x - user group 0 2016-02-08 07:00 /a/b/c/d/20160206
drwxr-xr-x - user group 0 2016-02-09 17:11 /a/b/c/d/20160208
drwxr-xr-x - user group 0 2016-02-08 11:07 /a/b/c/d/20160208
hdfs@hostname$ hadoop fs -ls /a/b/c/d/20160206
ls: `/a/b/c/d/20160206': No such file or directory

 

When we piped the "ls" output through "cat -v", we found that a special character had been inserted into some of the directory names, as shown below.

 


hdfs@hostname$ hadoop fs -ls /a/b/c/d/ | cat -v
Found 20 items
drwxr-xr-x - user group 0 2016-01-27 09:10 /a/b/c/d//20141211
drwxr-xr-x - user group 0 2016-01-06 01:03 /a/b/c/d//20141212
drwxr-xr-x - user group 0 2016-01-06 01:09 /a/b/c/d//20141213
drwxr-xr-x - user group 0 2015-11-12 08:53 /a/b/c/d//20151106
drwxr-xr-x - user group 0 2016-01-12 01:48 /a/b/c/d//20151118
drwxr-xr-x - user group 0 2015-12-04 04:21 /a/b/c/d//20151130
drwxrwxr-x - user group 0 2016-01-12 10:48 /a/b/c/d//20151221
drwxr-xr-x - user group 0 2016-01-19 11:23 /a/b/c/d//20160111
drwxr-xr-x - user group 0 2016-01-27 14:56 /a/b/c/d//20160112
drwxr-xr-x - user group 0 2016-02-02 16:12 /a/b/c/d//20160125
drwxr-xr-x - user group 0 2016-02-08 12:41 /a/b/c/d//20160126
drwxr-xr-x - user group 0 2016-02-08 10:26 /a/b/c/d//20160127
drwxr-xr-x - user group 0 2016-01-29 10:48 /a/b/c/d//20160129
drwxr-xr-x - user group 0 2016-02-09 02:43 /a/b/c/d//20160203
drwxr-xr-x - user group 0 2016-02-09 02:42 /a/b/c/d//20160204
drwxr-xr-x - user group 0 2016-02-08 15:38 /a/b/c/d//20160205
drwxr-xr-x - user group 0 2016-02-08 09:02 /a/b/c/d//20160205^M
drwxr-xr-x - user group 0 2016-02-08 07:00 /a/b/c/d//20160206^M
drwxr-xr-x - user group 0 2016-02-09 17:11 /a/b/c/d//20160208
drwxr-xr-x - user group 0 2016-02-08 11:07 /a/b/c/d//20160208^M

 

Now I want to delete these duplicate entries. Can anyone help me with this?

 

Thanks

Srini

 

 

 


6 REPLIES

Community Manager

You would handle this the same way if the issue occurred on a Linux filesystem: quote the filename and use ctrl-v to insert the special characters.

 

In this case, I type ctrl-v then ctrl-m to insert ^M into my strings.

 

 

$ hdfs dfs -put /etc/group "/tmp/abc^M"

$ hdfs dfs -ls /tmp
Found 4 items
drwxrwxrwx   - hdfs   supergroup          0 2016-02-11 11:29 /tmp/.cloudera_health_monitoring_canary_files
-rw-r--r--   3 hdfs   supergroup        954 2016-02-11 11:30 /tmp/abc
drwx-wx-wx   - hive   supergroup          0 2016-01-11 12:10 /tmp/hive
drwxrwxrwt   - mapred hadoop              0 2016-01-11 12:08 /tmp/logs

$ hdfs dfs -ls /tmp | cat -v
Found 4 items
drwxrwxrwx   - hdfs   supergroup          0 2016-02-11 11:30 /tmp/.cloudera_health_monitoring_canary_files
-rw-r--r--   3 hdfs   supergroup        954 2016-02-11 11:30 /tmp/abc^M
drwx-wx-wx   - hive   supergroup          0 2016-01-11 12:10 /tmp/hive
drwxrwxrwt   - mapred hadoop              0 2016-01-11 12:08 /tmp/logs

$ hdfs dfs -mv "/tmp/abc^M" /tmp/abc

$ hdfs dfs -ls /tmp | cat -v
Found 4 items
drwxrwxrwx   - hdfs   supergroup          0 2016-02-11 11:31 /tmp/.cloudera_health_monitoring_canary_files
-rw-r--r--   3 hdfs   supergroup        954 2016-02-11 11:30 /tmp/abc
drwx-wx-wx   - hive   supergroup          0 2016-01-11 12:10 /tmp/hive
drwxrwxrwt   - mapred hadoop              0 2016-01-11 12:08 /tmp/logs

 

 



David Wilder, Community Manager



Community Manager

In my example I used -mv; to delete the duplicates, you would use -rmdir.

 

hdfs dfs -rmdir "/a/b/c/d//20160205^M"

 

Remember, to get "^M" type ctrl-v ctrl-m.
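
One caveat: -rmdir only removes empty directories. If a duplicate already has content underneath it (the original script, for instance, creates subdirectories under 20160208), a recursive remove would be needed instead; a hedged sketch:

$ hdfs dfs -rm -r "/a/b/c/d/20160208^M"

As before, the "^M" is typed as ctrl-v ctrl-m, and the path should be double-checked before running a recursive remove.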



David Wilder, Community Manager



Expert Contributor

Thanks, Denloe, for your response. I actually have one more doubt.

 

What if there is some other text following after that ^M? Should we use

hdfs dfs -rmdir "/a/b/c/d//20160205^Msomepart"

or do we need some escape sequence for this?

 

Moreover, when I press ctrl+v or ctrl+m, the command executes immediately (it does not allow me to type anything after it).

Thanks,
Sathish (Satz)

Community Manager

The non-printable character may be located anywhere in the filename.  You just need to insert it in the appropriate location when quoting the filename.

 

Using ctrl-v to insert special characters is the default for the bash shell, but your terminal emulator (especially if you are coming in from Windows) may be catching it instead.

 

Try using shift-insert instead of ctrl-v.  If that fails, you may need to find an alternate method to embed the control characters, such as using vi to create a bash script and inserting them there.
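
Another option that avoids typing control characters interactively at all is to let printf generate them inside a script; a minimal sketch, assuming bash and using the hypothetical "somepart" name from your example:

# Build the problem path with printf's escape handling, so the
# carriage return never has to be typed at the keyboard.
bad=$(printf '/a/b/c/d/20160205\rsomepart')
hdfs dfs -rmdir "$bad"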



David Wilder, Community Manager



Mentor
If it helps, I prefer the simpler bash syntax of escaped special characters:

We know that ^M is the same as \r, which makes sense if you used Windows
Notepad to write the commands but forgot to convert the file via dos2unix:

~> echo $'\x0d' | cat -v
^M
~> echo -n $'\x0d' | od -c
0000000 \r
0000002

(The \x0D or \x0d is the hex equivalent of \r, per
http://www.asciitable.com/ (carriage return))

Therefore, you can use the $'' syntax to write a string that includes the
escape:

~> hadoop fs -ls $'/a/b/c/d/20160206\r'
Or,
~> hadoop fs -ls $'/a/b/c/d/20160206\x0d'

This works well regardless of the terminal emulator you are using, because
we're escaping based on the character's representation rather than relying
on the emulator to pass the character through as typed input.
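
Applied to the listing above, that quoting can verify and then clean up the entries; a sketch, assuming the \r versions of 20160205 and 20160208 are the unwanted duplicates. Note that 20160206 exists only with the trailing \r, so that one should likely be renamed rather than deleted:

~> hadoop fs -ls $'/a/b/c/d/20160205\r'
~> hadoop fs -rm -r $'/a/b/c/d/20160205\r' $'/a/b/c/d/20160208\r'
~> hadoop fs -mv $'/a/b/c/d/20160206\r' /a/b/c/d/20160206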

Explorer

There is a simple method to remove those.

 

1. List those directories into a text file, like below:

hadoop fs -ls /path > test

2. Running cat -t on the file will show you which duplicates contain the junk character.

3. In another shell, edit the file and mark the culprit entries with a # to identify the exact ones.

4. cat -t the file again to confirm you marked the culprits.

5. Remove the original (good) folders from the list and strip the # markers.

6. Loop over the remaining entries:

for i in `cat list`; do
  hadoop fs -rmr "$i"
done
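
If manually commenting the file is awkward, the carriage-return entries can also be filtered out automatically; a hedged sketch, assuming bash and paths without spaces:

# Keep only the entries containing a carriage return, review the
# resulting list, then feed it to the same remove loop.
hadoop fs -ls /path | awk '{print $NF}' > test
grep $'\r' test > list
while IFS= read -r i; do
  hadoop fs -rmr "$i"
done < list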