Characters not being handled correctly


I am facing a data mismatch. Some printable characters in the source table are converted to a question mark (?) after the import into Hive tables. I have tried the following options so far:

  • "serialization.encoding"='ISO-8859-1')
    TBLPROPERTIES ( 'store.charset'='ISO-8859-1',
    'retrieve.charset'='ISO-8859-1');
  • 'serialization.encoding'='UTF-8'

However, the issue isn't resolved. Please note that the file I am trying to import into my Hive table was brought to HDFS via NDM (it is a mainframe file) and is in binary format. The framework converts the binary to a readable format internally. Is there anything else I can try? Please suggest if anyone has come across such a scenario. Thanks in advance.
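
For reference, the symptom can be reproduced outside Hive. This is only a minimal sketch of how a charset mismatch can turn printable characters into "?", not the framework's actual conversion logic. The sample string and the function names are hypothetical, and the real file may use a different source encoding.

```python
# Hypothetical sample text containing characters outside 7-bit ASCII.
SAMPLE_TEXT = "café £5"


def force_to_ascii(raw: bytes, source_charset: str) -> str:
    """Decode raw bytes with source_charset, then push the text through
    ASCII. Anything ASCII cannot represent becomes a literal '?'."""
    text = raw.decode(source_charset)
    return text.encode("ascii", errors="replace").decode("ascii")


def read_with_wrong_charset(raw: bytes) -> str:
    """Decode bytes as UTF-8 even though they were not written as UTF-8.
    Invalid byte sequences become U+FFFD, which many tools display as '?'."""
    return raw.decode("utf-8", errors="replace")


# Bytes as they would be stored if the file were ISO-8859-1 (an assumption).
raw_latin1 = SAMPLE_TEXT.encode("iso-8859-1")

lossy_ascii = force_to_ascii(raw_latin1, "iso-8859-1")
misread_utf8 = read_with_wrong_charset(raw_latin1)
round_trip = raw_latin1.decode("iso-8859-1")

print(lossy_ascii)
print(misread_utf8)
print(round_trip)
```

Only the last call, where the decoding charset matches the one used to write the bytes, returns the original text. If the characters are already lost at an earlier step (for example during the binary-to-text conversion), changing the table's encoding properties afterwards cannot bring them back.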
