How to handle trademark symbols in Hive?

Rising Star

I have a requirement to handle a file that contains special characters (such as trademark symbols, non-UTF-8 characters, and so on).

1 ACCEPTED SOLUTION

Explorer

@Reddy, you need to specify the serialization.encoding property along with LazySimpleSerDe when creating the table in order to load non-UTF-8 data.

Here is one example:

create table table_with_non_utf8_encoding (name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');

load data local inpath '../encoding-ISO8859_1.txt' overwrite into table table_with_non_utf8_encoding;

More details are in this JIRA:

https://issues.apache.org/jira/browse/HIVE-7142
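To see why the encoding property matters, the Java sketch below decodes the same bytes two ways. It only illustrates the idea (Hive's SerDe transcodes the data using the configured Java charset name); it is not Hive's actual code, and the class name, sample string, and Windows-1252 case are assumptions added for illustration. One caveat worth checking: the ™ symbol itself is not part of ISO-8859-1. If a file stores ™ as the single byte 0x99, it is most likely Windows-1252, in which case a value such as 'windows-1252' would presumably be the right serialization.encoding instead of ISO8859_1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Decode raw file bytes with the named Java charset. Conceptually this is the
    // step that 'serialization.encoding' controls (illustrative sketch only).
    static String decode(byte[] raw, String charsetName) {
        // new String(byte[], Charset) replaces malformed input with U+FFFD
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "Acme(R) Widget" as written to an ISO-8859-1 file: the registered sign is byte 0xAE.
        byte[] raw = "Acme\u00AE Widget".getBytes(StandardCharsets.ISO_8859_1);

        // Read as UTF-8 (Hive's default assumption): a lone 0xAE is malformed.
        System.out.println(decode(raw, "UTF-8").contains("\uFFFD")); // true

        // Read with the charset from the DDL above ("ISO8859_1" is a Java alias).
        System.out.println(decode(raw, "ISO8859_1").equals("Acme\u00AE Widget")); // true

        // The trademark sign is not in ISO-8859-1; in Windows-1252 it is byte 0x99.
        byte[] tm = {(byte) 0x99};
        System.out.println(decode(tm, "windows-1252").equals("\u2122")); // true
        System.out.println(decode(tm, "ISO8859_1").equals("\u2122"));    // false (control char U+0099)
    }
}
```

In short, the bytes on disk never change; only the charset used to interpret them does, which is why the table definition has to name the file's real encoding.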

Rising Star

Yes, after adding the serialization.encoding property, Hive now displays the special characters in a readable format. However, when I export the data to Teradata with a Sqoop statement using a connection manager, the characters show up as unreadable in Teradata. Attached is the screenshot (teradata.png). I suspect Sqoop is not recognizing the special characters correctly. Or do I need to use specific Teradata JARs when exporting the data? I have also attached the ingested data (after-ingestion-data-into-hadoop.png) and the data as shown in Hive after adding the encoding property (after-adding-encoding-to-hive-table.png), whereas the same data does not look the same in Teradata. I would like to see the same characters in Teradata as well. Any help is appreciated.

12515-after-ingestion-data-into-hadoop.png

12516-after-adding-encoding-to-hive-table.png

12517-teradata.png

Rising Star

I found a solution: you can export this kind of data to any RDBMS as UTF-8 (or any other character set) by specifying the character set in the connection string, after the database/host name.
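The post does not show the exact command. A minimal sketch, assuming the Teradata JDBC driver's URL syntax (jdbc:teradata://host/DATABASE=...,CHARSET=...), might build the value passed to sqoop export --connect as below; the host, database name, and connectUrl helper are hypothetical placeholders, so check the driver documentation for your version. The second half shows, in plain Java, why Hive's UTF-8 text becomes unreadable when the receiving side assumes a single-byte character set.

```java
import java.nio.charset.StandardCharsets;

public class TeradataCharsetDemo {
    // Hypothetical helper: puts the session character set after the host/database
    // part of a Teradata JDBC URL, for use as: sqoop export --connect "<url>" ...
    static String connectUrl(String host, String database, String charset) {
        return "jdbc:teradata://" + host + "/DATABASE=" + database + ",CHARSET=" + charset;
    }

    public static void main(String[] args) {
        // Placeholder host and database names, not values from the original post.
        System.out.println(connectUrl("td-host.example.com", "sales_db", "UTF8"));
        // prints jdbc:teradata://td-host.example.com/DATABASE=sales_db,CHARSET=UTF8

        // Why a mismatch garbles text: the UTF-8 bytes for the trademark sign
        // (E2 84 A2) read as a single-byte charset become three unrelated characters.
        byte[] utf8 = "\u2122".getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length()); // 3

        // Read with the matching charset, the original character survives.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals("\u2122")); // true
    }
}
```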