Reply
Highlighted
New Contributor
Posts: 2
Registered: ‎05-18-2017

External Hive + Avro table can't handle new line characters

I created an external table in hive with Avro as serde format but I found that if new line characters exist in the data hive returns incorrect results when executing any query that returns that information.

 

The specification of the table is:

 

CREATE TABLE IF NOT EXISTS  `employment`(
  `fid` int COMMENT '', 
  `name` string COMMENT ''
PARTITIONED BY ( 
  `year` string, 
  `month` string, 
  `day` string)
CLUSTERED BY ( 
  fid) 
INTO 100 BUCKETS
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
  'hdfs:///workarea/productos/pf/db/origenes/employment';

The data is loaded through sqoop and in my test case the name column contains new line chars so when I execute the query:

SELECT  fid, name 
FROM employment WHERE year='2017' 
AND month='03' 
AND day='31' 
AND fid=123;

 I get these results:

fid name

  fid name

1123employment 1
2123employment
3NULLNULL

 

 

Instead of

  fid name

1123employment 1
2123employment 2

 

 

Assuming that the second row contains break line chars between the two words.

 

The same query on impala works fine.

 

And the solutions I've found related to this problem advice you to use binary formats such as SequenceFile or Avro, but that's what i'm already using.

 

And the other solution is to use some replacement function like  

regexp_replace(name,"\n","")

But that solution works on the specific case of a known column with the "problem", and maybe other columns or tables can have the same "problem" in the future.

 

So I think I'd be easy to recreate the table with the appropiate configuration than changing queries on several processes that use the emerging problematic table.

 

Is there any other option?

 

My CDH version is 5.8.0

 

Thanks.

Announcements