Explorer
Posts: 12
Registered: ‎05-26-2016

Loading files with Umlauts (üöäß) in names into HDFS.

I have a server with LANG=ENG.utf-8, which I cannot change.

And when I try to transfer a file from my local folder to hdfs I usually get an error.

The error occurs regardless of the command I use: hdfs dfs -put (or -copyFromLocal) File_with_äöüß.pdf /hdfs/folder

How can I solve that? Ideally with a .jar, Python, or Flume.

Thanks in advance.

Posts: 1,903
Kudos: 435
Solutions: 307
Registered: ‎07-31-2013

Re: Loading files with Umlauts (üöäß) in names into HDFS.

Could you clarify further? Do you mean that the upload works with the "hadoop fs" utilities but fails when done via code (the FileSystem interface)?
Explorer
Posts: 12
Registered: ‎05-26-2016

Re: Loading files with Umlauts (üöäß) in names into HDFS.

Imagine that in my local folder I have a file named Müller_Thomas.docx, and I need to transfer this file to my HDFS folders so that it can be indexed by Solr. When I run the command hdfs dfs -put Müller_Thomas.docx /user/solr/soccer to transfer the file, it says that the file M"§$ller_Thomas.docx cannot be transferred. But when I transfer the same file without the ü in its name, it works without problems.

Now, I know this is a problem with the UTF-8 of my prompt, which cannot read the UTF-16 of the file.

So, how can I transfer this file to HDFS?

Explorer
Posts: 12
Registered: ‎05-26-2016

Re: Loading files with Umlauts (üöäß) in names into HDFS.

Can someone please help?
Posts: 1,903
Kudos: 435
Solutions: 307
Registered: ‎07-31-2013

Re: Loading files with Umlauts (üöäß) in names into HDFS.

Have you tried escaping the character correctly in code? The upload code doesn't actually care what the file content is (it's just a stream of bytes as far as HDFS is concerned, whatever the encoding).

I cannot quite reproduce your issue with the fs utilities, or am I misunderstanding your statements?

# echo $LANG
en_US.UTF-8
# echo $'foo-üöäß-bar' > $'Müller_Thomas.docx'
# echo $'another-foo-üöäß-bar' > $'üöäß'
# hadoop fs -put $'Müller_Thomas.docx' /tmp/
# hadoop fs -put $'üöäß' /tmp/
# hadoop fs -ls /tmp/
…
-rw-r--r--   3 hdfs      supergroup        4 2016-06-19 22:49 /tmp/Müller_Thomas.docx
…
-rw-r--r--   3 hdfs      supergroup       25 2016-06-19 22:53 /tmp/üöäß
…
# hadoop fs -cat $'/tmp/üöäß'
another-foo-üöäß-bar
# hadoop fs -cat $'/tmp/Müller_Thomas.docx'
foo-üöäß-bar
#
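
To compare the failing server with the session above, it may help to look at what the shell actually passes along. The sketch below is only a starting point: name_hex and is_utf8_charmap are hypothetical helper names (not part of Hadoop), and en_US.UTF-8 is an assumed locale that would have to be listed by locale -a on that server. The idea is that the Hadoop command-line tools run in a JVM, which on Linux generally takes its file-name encoding from the process locale, so overriding the locale for a single command may be worth a try even when the global LANG cannot be changed.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helpers; not part of Hadoop.

# Print the bytes of a string as hex, to see how the shell encodes a file name.
# In a UTF-8 session, "ü" should show up as the two bytes c3bc.
name_hex() {
  printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'
  echo
}

# Succeed if a charmap name (as printed by `locale charmap`) means UTF-8.
is_utf8_charmap() {
  case "$1" in
    UTF-8|utf-8|UTF8|utf8) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage, commented out because it needs a Hadoop client and the real file:
#   name_hex 'Müller_Thomas.docx'
#   is_utf8_charmap "$(locale charmap)" || echo "this shell is not using UTF-8"
#   # Assumption: en_US.UTF-8 is installed; the override applies to this command only.
#   LC_ALL=en_US.UTF-8 hdfs dfs -put 'Müller_Thomas.docx' /user/solr/soccer
```

If locale charmap reports something other than UTF-8 (for example because the configured LANG value is not a locale name the system recognises), that would at least explain the garbled name in the error message.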