
Unable to view the filename correctly when storing a file with Chinese characters in HDFS

Rising Star

I am trying to put a file into Hadoop with a filename in Chinese characters.

file: 余宗阳视频审核稿-1024.docx

but the filename appears garbled in Hadoop as Óà×ÚÑôÊÓƵÉóºË¸å-1024.docx

Any hints to solve this issue?


7 REPLIES

Master Collaborator

@Gayathri Reddy G

Check the locale on your terminal; on Linux, run "echo $LANG" and see whether it ends in UTF-8. You can store any data in HDFS; it only matters how you interpret it for display. HDFS supports UTF-8 by default but can read other encodings as well, and most of the Hadoop ecosystem uses UTF-8. I tried to replicate this issue below, and the filename displays correctly in Chinese characters.

[screenshot: 93323-file.png, showing the Chinese filename listed correctly]
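
For reference, a minimal way to check and fix the locale before copying the file (a sketch; the locale name and paths are just examples):

echo $LANG                                 # should end in UTF-8, e.g. en_US.UTF-8
export LANG=en_US.UTF-8                    # switch to a UTF-8 locale if it does not
hdfs dfs -put 余宗阳视频审核稿-1024.docx /user/username/
hdfs dfs -ls /user/username/               # the filename should display correctly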

Rising Star

@Jagadeesan A S

That's working, thanks!

I am trying to put the same file to HDFS using curl via WebHDFS and getting the error "HTTP Status 500 - Illegal character in path at index".

curl -i -H 'content-type:application/octet-stream' -H 'charset:UTF-8' -X PUT -T '余宗阳视频审核稿-1024.docx' 'http://hostname:14000/webhdfs/v1/user/username/余宗阳视频审核稿-1024.docx?op=CREATE&data=true&user.name=username&overwrite=true'

Is there any other header to be passed so that the Chinese characters are recognized here?
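
For what it's worth, an "Illegal character in path" message typically comes from Java's URI parser rejecting raw non-ASCII characters in the request path, so percent-encoding the filename portion of the URL is one way around it. A sketch, reusing the hostname and username placeholders from the command above (python3 is used here only as a convenient encoder):

FILE='余宗阳视频审核稿-1024.docx'
# percent-encode the filename for use in the URL path
ENC=$(python3 -c "import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1]))" "$FILE")
curl -i -H 'content-type:application/octet-stream' -H 'charset:UTF-8' -X PUT -T "$FILE" \
  "http://hostname:14000/webhdfs/v1/user/username/${ENC}?op=CREATE&data=true&user.name=username&overwrite=true"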

Master Collaborator

@Gayathri Reddy G

No specific header should be needed other than content-type and charset, which you already have in your command. I tried to replicate the same command, and I was able to write the file into HDFS using curl via WebHDFS.

[screenshot: 93346-hdfs-min.png]

It looks like you have a space in the path. Can you please verify your command again?

Reference: https://hadoop.apache.org/docs/r1.0.4/webhdfs.html

Rising Star

After encoding, it is working for me. But the first command (without encoding) still throws the error "Illegal character in path at index 62"; index 62 is where the filename starts in the destination path.

I checked $LANG, and it is UTF-8.

What was the exact output for you when you executed the first curl without encoding?

Master Collaborator

@Gayathri Reddy G

Step 1: Submit the first HTTP PUT request to the namenode. It does not write any data itself; it responds with a 307 TEMPORARY_REDIRECT whose Location header points to the datanode URL where the data is to be written.

[screenshot: 93347-webhdfs-first.png]


Step 2: Submit another HTTP PUT request to the URL in the Location header, with the file data to be written.
[screenshot: 93348-webhdfs-second.png]

The client receives a 201 Created response with zero content length and the WebHDFS URI of the file in the Location header.
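
A minimal sketch of the same two-step flow with curl (hostnames, ports, and the file name are placeholders):

# Step 1: the namenode replies 307 with the datanode URL in the Location header
curl -i -X PUT "http://namenode:50070/webhdfs/v1/user/username/file.docx?op=CREATE&user.name=username"
#   HTTP/1.1 307 TEMPORARY_REDIRECT
#   Location: http://datanode:50075/webhdfs/v1/user/username/file.docx?op=CREATE&...

# Step 2: send the file data to the URL from the Location header
curl -i -X PUT -T file.docx "http://datanode:50075/webhdfs/v1/user/username/file.docx?op=CREATE&..."
#   HTTP/1.1 201 Created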

Please accept the answer you found most useful.

Rising Star

I found the root cause of the issue: I should use the namenode on port 50070. I was using the edge node (port 14000), hence the failure.

Thanks!
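
For context: port 14000 is normally the HttpFS gateway, which serves the same webhdfs/v1 REST API but streams data through the gateway itself, while 50070 is the namenode's own WebHDFS endpoint, which redirects clients to a datanode; that difference may explain why the two ports handled the filename differently.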

Rising Star

Using the plain PUT approach, you need to submit curl twice. Alternatively, running curl with -L (to follow the redirect) and --negotiate (for SPNEGO/Kerberos authentication) does the same in a single submission:

curl --negotiate -u : -L "http://namenode:50070/webhdfs/v1/user/username/余宗阳视频审核稿-1024.docx?op=CREATE&user.name=username" -T 余宗阳视频审核稿-1024.docx
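
Note that -L is what lets a single invocation work: curl follows the namenode's 307 redirect to the datanode automatically. The "--negotiate -u :" part enables SPNEGO (Kerberos) authentication and can be dropped on unsecured clusters.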