How to handle trademark symbols in Hive?

Expert Contributor

I have a requirement to handle a file that contains special characters (such as trademark symbols and other non-UTF-8 characters).

1 ACCEPTED SOLUTION

Contributor

@Reddy, to load data that is not UTF-8 encoded, you need to specify the serialization.encoding property along with LazySimpleSerDe when creating the table.

Here is one example:

create table table_with_non_utf8_encoding (name STRING)  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');

load data local inpath '../encoding-ISO8859_1.txt' overwrite into table table_with_non_utf8_encoding;

More details are in this JIRA:

https://issues.apache.org/jira/browse/HIVE-7142
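
To illustrate why the declared encoding matters, here is a minimal Python sketch (not Hive code). It assumes the file holds ISO-8859-1 bytes and that a reader without the property would treat them as UTF-8; the sample string is made up for illustration.

```python
# Hypothetical sample: "®" and "©" stored as single ISO-8859-1 bytes (0xAE, 0xA9).
raw = "Acme\u00ae \u00a9 2016".encode("iso-8859-1")

# Reading the bytes as UTF-8 fails for 0xAE and 0xA9, because a lone
# continuation byte is not valid UTF-8; each one becomes U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

# Reading the bytes with the encoding they were written in recovers the text.
as_latin1 = raw.decode("iso-8859-1")

print(repr(as_utf8))
print(as_latin1)
```

Setting serialization.encoding on the table plays the role of the second decode call: it tells the SerDe which character set the stored bytes are in.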

Expert Contributor

Yes, the special characters display correctly in Hive after adding the serialization.encoding property. However, when I export the data to Teradata with a Sqoop statement that uses a connection manager, I get unreadable characters in Teradata; see the attached screenshot (teradata.png). I suspect Sqoop is not recognizing the special characters correctly. Or do I need to use specific Teradata JARs when exporting the data? I have also attached the data as ingested into Hadoop (after-ingestion-data-into-hadoop.png) and the data as shown in Hive after adding the encoding property (after-adding-encoding-to-hive-table.png); the same data looks different in Teradata. I would like to see the same characters in Teradata as well. Any help is appreciated.

12515-after-ingestion-data-into-hadoop.png

12516-after-adding-encoding-to-hive-table.png

12517-teradata.png

Expert Contributor

I found a solution: to export this kind of data to any RDBMS as UTF-8 or any other character set, specify the character set after the database/host name in the connection string.
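
The post does not show the exact connection string, so the following is only a sketch of what that might look like for Teradata. It assumes the Teradata JDBC URL form with a CHARSET parameter after the database name; the host, database, table, user, and export directory are hypothetical placeholders, and the helper functions are made up for illustration. The connection manager option mentioned earlier in the thread is left out because its class name is not given.

```python
from typing import List


def teradata_jdbc_url(host: str, database: str, charset: str = "UTF8") -> str:
    """Build a JDBC URL with the session character set after the database name.

    Assumption: Teradata-style comma-separated parameters (DATABASE=..., CHARSET=...).
    """
    return f"jdbc:teradata://{host}/DATABASE={database},CHARSET={charset}"


def sqoop_export_args(url: str, table: str, export_dir: str, username: str) -> List[str]:
    """Assemble a sqoop export command line as an argument list (not executed here)."""
    return [
        "sqoop", "export",
        "--connect", url,
        "--username", username,
        "--table", table,
        "--export-dir", export_dir,
    ]


# Hypothetical values, for illustration only.
url = teradata_jdbc_url("td-host.example.com", "sales_db")
args = sqoop_export_args(url, "PRODUCTS", "/apps/hive/warehouse/products", "etl_user")
print(" ".join(args))
```

Other databases use different URL syntax for the character set, so check the JDBC driver documentation for the target RDBMS before reusing this pattern.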