
Processing Fixed Width Files in Hive Using Native (Non-UTF8) Character Sets


New Contributor

Hi,

I need to load a fixed-width file into a Hive table, and the input file is not always UTF-8 encoded.

I found two classes that each cover part of this: 'org.apache.hadoop.hive.serde2.RegexSerDe', which reads fields from a fixed-width file at defined offsets, and 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe', which handles non-UTF-8 encodings. However, I am unable to use them together when creating an external table.

Can someone please help me with a solution? Thanks in advance!

Accepted Solution

Re: Processing Fixed Width Files in Hive Using Native (Non-UTF8) Character Sets

Rising Star

I would just read the file into a table with the LazySimpleSerDe and use the substr() function to extract the columns. I've found that to be faster than the RegexSerDe, and it's clearer to read. You can either run the substr() query directly or put it in a view.
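
A minimal sketch of that approach, assuming a Hive version in which LazySimpleSerDe honors the 'serialization.encoding' property (to my knowledge, Hive 0.14 and later). The table name, view name, location, source encoding, column names, and field offsets below are all hypothetical; substitute the ones that match your file layout.

```sql
-- Hypothetical staging table: each input line lands in one STRING column.
-- Assumes the lines contain no \001 characters (the default field delimiter).
CREATE EXTERNAL TABLE raw_fixed_width (line STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding' = 'ISO-8859-1')  -- assumed source charset
STORED AS TEXTFILE
LOCATION '/data/fixed_width/';  -- hypothetical path

-- substr(str, start, length) uses 1-based positions.
-- The offsets and widths here are made up for illustration.
CREATE VIEW customers AS
SELECT
  trim(substr(line, 1, 10))  AS customer_id,
  trim(substr(line, 11, 20)) AS customer_name,
  trim(substr(line, 31, 8))  AS signup_date
FROM raw_fixed_width;
```

Putting the substr() calls in a view keeps the offsets in one place, so downstream queries can select named columns without repeating them.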


Re: Processing Fixed Width Files in Hive Using Native (Non-UTF8) Character Sets

New Contributor

Thank you, Shawn, for your prompt response. I found an alternative: I converted the file to UTF-8 using iconv before reading it into an external table with RegexSerDe. This works in my case because Hive supports UTF-8 character sets by default.
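
For anyone following the same route, here is a rough sketch of what that could look like. It assumes the file is converted outside Hive first, and that the source encoding is ISO-8859-1; the file names, table name, location, column names, and field widths are hypothetical and not taken from my actual setup.

```sql
-- Assumes the file was first converted outside Hive, for example:
--   iconv -f ISO-8859-1 -t UTF-8 input.txt > input_utf8.txt
-- and the converted file was then placed in the table location.
CREATE EXTERNAL TABLE customers_regex (
  customer_id   STRING,
  customer_name STRING,
  signup_date   STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- One capture group per column; the widths (10, 20, 8) are made up.
  -- The pattern has to match the whole line, so the trailing .* absorbs
  -- anything after the last field.
  'input.regex' = '(.{10})(.{20})(.{8}).*'
)
STORED AS TEXTFILE
LOCATION '/data/fixed_width_utf8/';  -- hypothetical path
```

The columns are all declared as STRING here, and lines that do not match the pattern come back as NULLs, so it is worth checking a few rows after loading.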
