Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How to handle trademark symbols in Hive?

Expert Contributor

I have a requirement to handle a file that contains special characters (trademark symbols, non-UTF-8 characters, and so on).

1 ACCEPTED SOLUTION

New Member

@Reddy, you need to specify the serialization.encoding property together with LazySimpleSerDe when creating the table in order to load data that is not UTF-8 encoded.

Here is one example:

-- Declare the file's character set so Hive decodes it correctly
CREATE TABLE table_with_non_utf8_encoding (name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');

load data local inpath '../encoding-ISO8859_1.txt' overwrite into table table_with_non_utf8_encoding;

More details are in this JIRA:

https://issues.apache.org/jira/browse/HIVE-7142
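
To illustrate why the encoding property matters, here is a minimal Python sketch (not part of the original answer) of what happens when ISO-8859-1 bytes are read as UTF-8. The sample string is made up for illustration.

```python
# Hypothetical sample value; the registered sign is the single byte 0xAE in ISO-8859-1.
raw = "Acme®".encode("iso-8859-1")

# Reading the bytes as UTF-8 (the usual default) loses the symbol:
# 0xAE is not a valid UTF-8 start byte, so it becomes U+FFFD.
wrong = raw.decode("utf-8", errors="replace")

# Reading the bytes with the charset the file was written in recovers it.
right = raw.decode("iso-8859-1")

# The trademark sign itself is not in ISO-8859-1; in Windows-1252 it is byte 0x99.
tm_bytes = "™".encode("cp1252")

print(repr(wrong))
print(repr(right))
print(tm_bytes)
```

If the source file was produced on Windows and contains the trademark sign, it may actually be Windows-1252 rather than ISO-8859-1, so it is worth checking which charset the file really uses before picking the serialization.encoding value.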

3 REPLIES

Expert Contributor

Yes, the special characters display in a readable format after adding the serialization.encoding property. However, when I export the data to Teradata with a Sqoop statement that uses a connection manager, I get unreadable characters in Teradata; see the attached screenshot (teradata.png). I suspect Sqoop is not recognizing the special characters correctly, or do I need to use specific Teradata JARs when exporting the data? I have attached the ingested data (after-ingestion-data-into-hadoop.png) and the data as shown in Hive after adding the encoding property (after-adding-encoding-to-hive-table.png); the same data looks different in Teradata. I would like to see the same characters in Teradata as well. Any help is appreciated.

12515-after-ingestion-data-into-hadoop.png

12516-after-adding-encoding-to-hive-table.png

12517-teradata.png

Expert Contributor

I found a solution for exporting this kind of data to any RDBMS as UTF-8 or any other character set: specify the character set after the database/host name in the connection string.
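
The exact command was not shared in this thread. As a rough sketch, assuming the Teradata JDBC driver's CHARSET URL parameter and made-up host and database names, the connection string passed to Sqoop's --connect option might be built like this:

```python
def teradata_url(host: str, database: str, charset: str = "UTF8") -> str:
    """Build a Teradata JDBC URL with the session character set appended
    after the database name (hypothetical helper, for illustration only)."""
    return f"jdbc:teradata://{host}/DATABASE={database},CHARSET={charset}"


# Hypothetical host and database names.
url = teradata_url("td-host.example.com", "sales_db")
print(url)
```

The resulting string would go after --connect in the sqoop export command. Other databases use different parameter names for the character set, so check the JDBC driver documentation for the target system.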