Hi, I am trying to insert Urdu data into Hive using the ODBC driver, but it gets converted into junk characters. How can I save Urdu-language data in Hive via HDFS? Thanks.
I have no experience with Urdu, but junk characters are typically the result of either a target encoding that does not support your characters, or the source being interpreted with the wrong encoding.
In your case, Hive should use UTF-8 by default (which supports Urdu), but it is possible that the encoding was changed when the Hive table was created. Can you also verify that the session encoding is configured correctly in the client session (where you want to insert the data into Hive)?
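To check the table side, you can inspect the table definition from the shell and, if the table uses LazySimpleSerDe, set the encoding explicitly. A sketch; the table name my_table is just a placeholder:

$ hive -e "SHOW CREATE TABLE my_table;"
$ # if the SerDe properties show a non-UTF-8 serialization.encoding, change it:
$ hive -e "ALTER TABLE my_table SET SERDEPROPERTIES ('serialization.encoding'='UTF-8');"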
And how did you determine that they are junk characters? Did you run a query on Hive? Using which client tool, and how is the encoding configured there (UTF-8 should be the correct one)?
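One way to tell whether the data is stored wrongly or only displayed wrongly is to look at the raw bytes of the file Hive wrote to HDFS. A sketch; the warehouse path and file name are just placeholders:

$ hdfs dfs -cat /user/hive/warehouse/my_table/000000_0 | head -c 200 | hexdump -C

If you see multi-byte UTF-8 sequences (Urdu characters in the Arabic block encode to two bytes with lead bytes 0xD8 through 0xDB), the stored data is fine and only the display side is misconfigured; if you see lots of 0x3F bytes (literal question marks), the data was already corrupted on insert.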
Hi Harald, thanks for your response.
Yes, the default encoding is UTF-8. I am running the insert query on the Hadoop server in the hive CLI (opened in a terminal) on an Ubuntu machine.
Can you check what the output of 'locale' is on your Ubuntu terminal? On my machine it looks like this:
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$
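If your output shows something like "POSIX" or "C" instead of a UTF-8 locale, you can switch the session to UTF-8 before starting the hive CLI. A sketch; en_US.UTF-8 is just an example, any UTF-8 locale installed on the machine will do:

$ export LANG=en_US.UTF-8
$ export LC_ALL=en_US.UTF-8
$ hive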
I am not sure whether you have a source file that you want to store in Hive, but if there is one, you can check its assumed encoding with
file -i <your file>
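For example (the file name urdu_data.csv and the reported charset are just illustrations):

$ file -i urdu_data.csv
urdu_data.csv: text/plain; charset=utf-8

If it reports something other than UTF-8 (Urdu text from Windows tools often comes as CP1256), you can convert it before loading it into Hive:

$ iconv -f CP1256 -t UTF-8 urdu_data.csv > urdu_data_utf8.csv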