How to handle trademark symbols in Hive?
Labels: Apache Hive
Created ‎02-14-2017 07:47 PM
I have a requirement to handle files that contain special characters (trademark symbols, non-UTF-8 characters, and so on).
Created ‎02-15-2017 08:56 PM
@Reddy, you need to specify the serialization.encoding property along with LazySimpleSerDe when creating the table in order to load non-UTF-8 encoded data.
Here is one example:
CREATE TABLE table_with_non_utf8_encoding (name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');

LOAD DATA LOCAL INPATH '../encoding-ISO8859_1.txt' OVERWRITE INTO TABLE table_with_non_utf8_encoding;
More details in this jira:
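If the table already exists, the same property can also be applied afterwards. This is a minimal sketch, reusing the table name from the example above and assuming it already uses LazySimpleSerDe; ALTER TABLE ... SET SERDEPROPERTIES and DESCRIBE FORMATTED are standard HiveQL:

-- Set the encoding on an existing LazySimpleSerDe table
ALTER TABLE table_with_non_utf8_encoding SET SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');
-- Verify that the serde property was applied
DESCRIBE FORMATTED table_with_non_utf8_encoding;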
Created on ‎02-15-2017 11:40 PM - edited ‎08-19-2019 05:00 AM
Yes, the special characters display in a readable format after adding the serialization.encoding property. However, when I export the data to Teradata with a Sqoop statement using a connection manager, they show up as non-readable characters in Teradata. Attached is a screenshot (teradat.png). I suspect Sqoop is not recognizing the special characters correctly, or do I need to use any specific Teradata JARs while exporting the data? I have also attached the ingested data (after-ingestion-data-into-hadoop.png) and the data shown in Hive after adding the encoding property (after-adding-encoding-to-hive-table.png); the same data does not look the same in Teradata. I would like to see the same characters in Teradata as well. Any help appreciated.
Created ‎02-16-2017 03:15 AM
I found a solution to export this kind of data to any RDBMS in UTF-8 or any other character set: specify the character set after the database/host name in the connection string.
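For reference, here is a rough sketch of what that can look like for a Sqoop export to Teradata, assuming plain Sqoop 1 with the Teradata JDBC driver; the host, database, user, target table, and HDFS paths below are placeholders, and the CHARSET parameter after the database name in the JDBC URL is what sets the session character set:

# Export the Hive table's HDFS files to Teradata with a UTF-8 session charset
sqoop export \
  --connect "jdbc:teradata://teradata-host/DATABASE=mydb,CHARSET=UTF8" \
  --username myuser \
  --password-file /user/myuser/teradata.password \
  --table TARGET_TABLE \
  --export-dir /apps/hive/warehouse/table_with_non_utf8_encoding \
  --input-fields-terminated-by '\001'

If you use a vendor-specific Teradata connector, the connection-manager options may differ, but the character set is still carried in the JDBC URL.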
