I want to land files containing Chinese characters in HDFS. I assume this should not be a problem on HDFS. Are there any issues I should be aware of when processing these files or reporting on them?
I don't see any problems on the HDFS side, since HDFS stores file contents as raw bytes and does not interpret their encoding. MapReduce (MR) uses UTF-8 for text. If the user's files use another encoding, they have to extend the input/output format.
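As a rough illustration of the encoding point, here is a minimal sketch that checks whether a file's bytes are valid UTF-8 and, if not, transcodes them before the file is landed. It assumes the source files are GBK-encoded, which is only an example of a common Chinese encoding; the helper names `is_valid_utf8` and `transcode_to_utf8` are hypothetical and not part of Hadoop. Converting up front is an alternative to extending the input/output format, not a replacement for it in every case.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def transcode_to_utf8(data: bytes, source_encoding: str = "gbk") -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8.

    source_encoding is an assumption about the input; GBK is used here
    only as an example of a non-UTF-8 Chinese encoding.
    """
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    text = "中文"
    gbk_bytes = text.encode("gbk")        # what a legacy system might produce
    print(is_valid_utf8(gbk_bytes))       # GBK bytes are not valid UTF-8 here
    utf8_bytes = transcode_to_utf8(gbk_bytes, "gbk")
    print(is_valid_utf8(utf8_bytes))      # safe to treat as UTF-8 text
    print(utf8_bytes.decode("utf-8") == text)
```

If the files are already UTF-8, no conversion is needed and they can be processed as text directly.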