I have a server with LANG=ENG.utf-8 that I cannot change.
Whenever I try to transfer a file from my local folder to HDFS, I get an error. It happens regardless of whether I use hdfs dfs -put or -copyFromLocal, e.g. hdfs dfs -put File_with_äöüß.pdf /hdfs/folder.
How can I solve this? A solution using a .jar, Python, or Flume would all be fine.
Thanks in advance.
Imagine that in my local folder I have a file named Müller_Thomas.docx, and I need to transfer it to my HDFS folders so it can be indexed by Solr. When I run hdfs dfs -put Müller_Thomas.docx /user/solr/soccer to transfer the file, it says that the file M"§$ller_Thomas.docx cannot be transferred. But when I try to transfer the same file with the ü removed, it works without problems.
Now, I believe this is an encoding problem: the UTF-8 of my prompt cannot read the UTF-16 of the file name.
So, how can I transfer this file to HDFS?
Have you tried escaping the characters correctly in code? The upload code doesn't actually care what the file content is (it's just a stream of bytes as far as HDFS is concerned, whatever the encoding).
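As an aside, the garbled M"§$ller name you are seeing is the classic symptom of UTF-8 bytes being decoded with the wrong codec somewhere between your terminal and the tool. A minimal Python sketch of the mechanism (the codec names here are illustrative, not a claim about which exact codec your setup uses):

```python
# "ü" (U+00FC) is two bytes in UTF-8: 0xC3 0xBC.
# If a program reads those bytes back with a single-byte codec such as
# Latin-1, each byte becomes its own (wrong) character.
name = "Müller_Thomas.docx"
raw = name.encode("utf-8")        # the bytes actually stored on disk
mangled = raw.decode("latin-1")   # wrong decoding of those same bytes
print(mangled)                    # -> MÃ¼ller_Thomas.docx
```

So the fix is usually not in HDFS at all, but in making sure the locale the shell uses to encode the argument matches the encoding the file name was created with.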
I cannot quite reproduce your issue with the fs utilities, or perhaps I am misunderstanding your statements:
# echo $LANG
en_US.UTF-8
# echo $'foo-üöäß-bar' > $'Müller_Thomas.docx'
# echo $'another-foo-üöäß-bar' > $'üöäß'
# hadoop fs -put $'Müller_Thomas.docx' /tmp/
# hadoop fs -put $'üöäß' /tmp/
# hadoop fs -ls /tmp/
…
-rw-r--r--   3 hdfs supergroup   4 2016-06-19 22:49 /tmp/Müller_Thomas.docx
…
-rw-r--r--   3 hdfs supergroup  25 2016-06-19 22:53 /tmp/üöäß
…
# hadoop fs -cat $'/tmp/üöäß'
another-foo-üöäß-bar
# hadoop fs -cat $'Müller_Thomas.docx'
foo-üöäß-bar
#
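Since you mentioned Python as an acceptable route: one way to sidestep locale decoding entirely is to hand the file name to the subprocess as raw bytes (os.fsencode), so Python never re-interprets it with the prompt's locale. This is only a sketch; it substitutes ls for hdfs so it runs anywhere, and the HDFS target path /user/solr/soccer is taken from your example:

```python
import os
import subprocess
import tempfile

# Sketch: create a file with an umlaut name, then pass its name to a child
# process as raw filesystem bytes instead of a (possibly mis-decoded) string.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "Müller_Thomas.docx")
    open(path, "w").close()

    # The real call would be (untested here, requires a Hadoop client):
    # subprocess.run([b"hdfs", b"dfs", b"-put",
    #                 os.fsencode(path), b"/user/solr/soccer"], check=True)

    # Stand-in so the sketch is runnable: list the directory via bytes args.
    out = subprocess.run([b"ls", os.fsencode(d)], capture_output=True)
    print(out.stdout.decode("utf-8"))
```

The same idea applies to any wrapper you write around the hdfs CLI: keep the name as bytes end to end, and the shell's LANG setting stops mattering.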