Created on 02-11-2016 07:24 AM - edited 09-16-2022 03:03 AM
Hi All,
Our application team created HDFS directories with the script below.
hadoop fs -mkdir /a/b/c/d/20160208
hadoop fs -mkdir /a/b/c/d/20160208/s
hadoop fs -mkdir /a/b/c/d/20160208/s/inputmap
hadoop fs -mkdir /a/b/c/d/20160208/s/temp
hadoop fs -mkdir /a/b/c/d/20160208/s/map
hadoop fs -mkdir /a/b/c/d/20160208/s/input
hadoop fs -copyFromLocal /x/y/z/20160208.dat /a/b/c/d/20160208/s/inputmap
echo "Setup Complete"
The directories got created, but an error is thrown when we try to access them.
hdfs@hostname$ hadoop fs -ls /a/b/c/d/
Found 20 items
drwxr-xr-x - user group 0 2016-01-27 09:10 /a/b/c/d/20141211
drwxr-xr-x - user group 0 2016-01-06 01:03 /a/b/c/d/20141212
drwxr-xr-x - user group 0 2016-01-06 01:09 /a/b/c/d/20141213
drwxr-xr-x - user group 0 2015-11-12 08:53 /a/b/c/d/20151106
drwxr-xr-x - user group 0 2016-01-12 01:48 /a/b/c/d/20151118
drwxr-xr-x - user group 0 2015-12-04 04:21 /a/b/c/d/20151130
drwxrwxr-x - user group 0 2016-01-12 10:48 /a/b/c/d/20151221
drwxr-xr-x - user group 0 2016-01-19 11:23 /a/b/c/d/20160111
drwxr-xr-x - user group 0 2016-01-27 14:56 /a/b/c/d/20160112
drwxr-xr-x - user group 0 2016-02-02 16:12 /a/b/c/d/20160125
drwxr-xr-x - user group 0 2016-02-08 12:41 /a/b/c/d/20160126
drwxr-xr-x - user group 0 2016-02-08 10:26 /a/b/c/d/20160127
drwxr-xr-x - user group 0 2016-01-29 10:48 /a/b/c/d/20160129
drwxr-xr-x - user group 0 2016-02-09 02:43 /a/b/c/d/20160203
drwxr-xr-x - user group 0 2016-02-09 02:42 /a/b/c/d/20160204
drwxr-xr-x - user group 0 2016-02-08 15:38 /a/b/c/d/20160205
drwxr-xr-x - user group 0 2016-02-08 09:02 /a/b/c/d/20160205
drwxr-xr-x - user group 0 2016-02-08 07:00 /a/b/c/d/20160206
drwxr-xr-x - user group 0 2016-02-09 17:11 /a/b/c/d/20160208
drwxr-xr-x - user group 0 2016-02-08 11:07 /a/b/c/d/20160208
hdfs@hostname$ hadoop fs -ls /a/b/c/d/20160206
ls: `/a/b/c/d/20160206': No such file or directory
When we piped the output of "ls" through cat -v, we found that a special character had been inserted into the directory names, as shown below.
hdfs@hostname$ hadoop fs -ls /a/b/c/d/ | cat -v
Found 20 items
drwxr-xr-x - user group 0 2016-01-27 09:10 /a/b/c/d//20141211
drwxr-xr-x - user group 0 2016-01-06 01:03 /a/b/c/d//20141212
drwxr-xr-x - user group 0 2016-01-06 01:09 /a/b/c/d//20141213
drwxr-xr-x - user group 0 2015-11-12 08:53 /a/b/c/d//20151106
drwxr-xr-x - user group 0 2016-01-12 01:48 /a/b/c/d//20151118
drwxr-xr-x - user group 0 2015-12-04 04:21 /a/b/c/d//20151130
drwxrwxr-x - user group 0 2016-01-12 10:48 /a/b/c/d//20151221
drwxr-xr-x - user group 0 2016-01-19 11:23 /a/b/c/d//20160111
drwxr-xr-x - user group 0 2016-01-27 14:56 /a/b/c/d//20160112
drwxr-xr-x - user group 0 2016-02-02 16:12 /a/b/c/d//20160125
drwxr-xr-x - user group 0 2016-02-08 12:41 /a/b/c/d//20160126
drwxr-xr-x - user group 0 2016-02-08 10:26 /a/b/c/d//20160127
drwxr-xr-x - user group 0 2016-01-29 10:48 /a/b/c/d//20160129
drwxr-xr-x - user group 0 2016-02-09 02:43 /a/b/c/d//20160203
drwxr-xr-x - user group 0 2016-02-09 02:42 /a/b/c/d//20160204
drwxr-xr-x - user group 0 2016-02-08 15:38 /a/b/c/d//20160205
drwxr-xr-x - user group 0 2016-02-08 09:02 /a/b/c/d//20160205^M
drwxr-xr-x - user group 0 2016-02-08 07:00 /a/b/c/d//20160206^M
drwxr-xr-x - user group 0 2016-02-09 17:11 /a/b/c/d//20160208
drwxr-xr-x - user group 0 2016-02-08 11:07 /a/b/c/d//20160208^M
Now I want to delete these duplicate entries. Can anyone help me with this?
Thanks
Srini
Created 02-11-2016 08:40 AM
You would handle this the same way as if the issue had occurred on a Linux filesystem: use quotes around the filename and ctrl-v to insert the special characters.
In this case, I type ctrl-v then ctrl-m to insert ^M into my strings.
$ hdfs dfs -put /etc/group "/tmp/abc^M"
$ hdfs dfs -ls /tmp
Found 4 items
drwxrwxrwx - hdfs supergroup 0 2016-02-11 11:29 /tmp/.cloudera_health_monitoring_canary_files
-rw-r--r-- 3 hdfs supergroup 954 2016-02-11 11:30 /tmp/abc
drwx-wx-wx - hive supergroup 0 2016-01-11 12:10 /tmp/hive
drwxrwxrwt - mapred hadoop 0 2016-01-11 12:08 /tmp/logs
$ hdfs dfs -ls /tmp | cat -v
Found 4 items
drwxrwxrwx - hdfs supergroup 0 2016-02-11 11:30 /tmp/.cloudera_health_monitoring_canary_files
-rw-r--r-- 3 hdfs supergroup 954 2016-02-11 11:30 /tmp/abc^M
drwx-wx-wx - hive supergroup 0 2016-01-11 12:10 /tmp/hive
drwxrwxrwt - mapred hadoop 0 2016-01-11 12:08 /tmp/logs
$ hdfs dfs -mv "/tmp/abc^M" /tmp/abc
$ hdfs dfs -ls /tmp | cat -v
Found 4 items
drwxrwxrwx - hdfs supergroup 0 2016-02-11 11:31 /tmp/.cloudera_health_monitoring_canary_files
-rw-r--r-- 3 hdfs supergroup 954 2016-02-11 11:30 /tmp/abc
drwx-wx-wx - hive supergroup 0 2016-01-11 12:10 /tmp/hive
drwxrwxrwt - mapred hadoop 0 2016-01-11 12:08 /tmp/logs
David Wilder, Community Manager
Created 02-11-2016 08:48 AM
In my example I used -mv. You would use -rmdir.
hdfs dfs -rmdir "/a/b/c/d//20160205^M"
Remember, to get "^M" type ctrl-v ctrl-m.
David Wilder, Community Manager
Created on 02-11-2016 09:01 AM - edited 02-11-2016 09:23 AM
Thanks Denloe for your response. I actually have one more doubt.
What if there is some other name following that ^M? Should we use
hdfs dfs -rmdir "/a/b/c/d//20160205^Msomepart"
or do we need to use some escape sequence for this?
Moreover, when I press ctrl+v or ctrl+m the command executes immediately (it does not let me type the second one).
Created 02-11-2016 11:21 AM
The non-printable character may be located anywhere in the filename. You just need to insert it in the appropriate location when quoting the filename.
Using ctrl-v to insert special characters is the default for the bash shell, but your terminal emulator (especially if you are coming in from Windows) may be catching it instead.
Try using shift-insert instead of ctrl-v. If that fails, you may need to find an alternate way to embed control characters, such as using vi to create the bash script and inserting them there.
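If typing the control character interactively keeps failing, another option (a quick sketch, assuming your shell is bash and reusing the paths from your listing purely as an example) is ANSI-C quoting, where $'\r' expands to a literal carriage return before the command runs:

# trailing carriage return only
hdfs dfs -rmdir $'/a/b/c/d/20160205\r'

# carriage return with more text after it (using the "somepart" name from your example)
hdfs dfs -rmdir $'/a/b/c/d/20160205\rsomepart'

Note that -rmdir only removes empty directories; for a directory that still has content you would fall back to -rm -r on the same quoted path.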
David Wilder, Community Manager
Created 10-24-2017 12:12 AM
There is a simple method to remove those.
1. List those directories in a text file, like below:
hadoop fs -ls /path > list
2. cat -t list will show you the positions of the duplicates with the junk character.
3. Open another shell and comment entries out with # to identify the exact ones.
4. Run cat -t on the file again to confirm you have marked the culprits.
5. Remove the original (good) folders from the list so that only the broken entries remain.
6. Loop over what is left and delete each one:
for i in `cat list`;
do hadoop fs -rmr $i;
done
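A variation on the above, sketched here assuming a bash shell, that the junk character is a carriage return (the ^M shown earlier), and that none of the paths contain spaces: filter the listing down to just the affected names and review them before deleting anything.

# keep only the path column, and only the entries containing a carriage return
hadoop fs -ls /a/b/c/d/ | awk '{print $NF}' | grep $'\r' > list

# review the candidates first; cat -v makes the ^M visible
cat -v list

# delete each remaining entry; quoting "$dir" preserves the embedded carriage return
# (hadoop fs -rmr, as above, also works on older releases)
while IFS= read -r dir; do
  hadoop fs -rm -r "$dir"
done < list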