Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Proccesing Fixed width file in hive with special charachters

Proccesing Fixed width file in hive with special charachters

New Contributor

I am proccessing fix width files using RegexSerDe. I am able to succesfully process all the files except the files contains special charachters(e.g. ^M, The characters in which enchoding is there).

 

Please find below records

 

P098206452021632026    20160114214137ETP45300094 aDcLéVEIk36548                                             00 6224c1c7c8bc26be42db5cd5683877c29438f639b2b17d441      0a854f2a316f70VI110000000021632026             365.48N2      01    8699CANCAN                      6         0.00                45201139660000000021632026CAN.DEN      T. ASSOC/ADC           0000000021632026   07      0           KQL2I

P098206452021632026    20160114214221ETP45300094 aDcLéVEIk36548                                             00 6224c1c7c8bc26be42db5cd5683877c29438f639b2b17d441      0a854f2a316f70VI110000000021632026             365.48N2      01    8699CANCAN                      6         0.00                45201139660000000021632026CAN.DEN      T. ASSOC/ADC           0000000021632026   07      0           KQL2I

 

 

In above record I am getting ÃÂ special character which I want to handle in hive. If this kind of record available in file, my serde table get loaded with null.

Is there any solution to handle these kind of records

 

Thanks,

Shivaji

1 REPLY 1
Highlighted

Re: Proccesing Fixed width file in hive with special charachters

New Contributor

Hi Shivaji,

 

You can try something like below steps-
1. create a temp table with single column of datatype as binary
2. load the file into this temp table
3. create your master table same a your temp table but depending on the number of coulmn you need using datatype as binary for all columns
4. using the 'substring' function insert data from temp table into your master table

 

CREATE TABLE test_special_char_tmp(col1 binary)
LOCATION '/tmp/test_special_char_tmp';

 

LOAD DATA INPATH '/tmp/test_special_char.txt' INTO TABLE test_special_char_tmp;

 

CREATE TABLE test_special_char_master(col1 binary, col2 binary, col3 binary, col4 binary)
LOCATION '/tmp/test_special_char_master';

 

insert into test_special_char_master
select substring(a.col1, 1, 23), substring(a.col1, 24, 26), substring(a.col1, 50, 61), substring(a.col1, 111, 2) from test_special_char_tmp a;

 

Just a workaround. Let me know if you have found any better solution.

 

Good luck,
Venkat M S