New Contributor
Posts: 2
Registered: ‎08-20-2017

Impala and MultiDelimitSerDe

Hi,

 

    I've recently run into an issue where we need to use a multi-character field delimiter.

    In Hive, using the org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe SerDe works great.

 

    Data Sample:

mandt,description,systemid
090,no comma 01,10
090,this is a, test,10
090,we can see~1,d,10
090,comma,commacomma,,10
090,no comma 02,10

 

  Table created:

  

CREATE EXTERNAL TABLE `amt_multi`(
  `mandt` varchar(3) COMMENT 'from deserializer', 
  `description` varchar(200) COMMENT 'from deserializer', 
  `systemid` int COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
WITH SERDEPROPERTIES ( 
  'field.delim'='<|>', 
  'line.delim'='\n') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hdfsha1/DEV/Raw_STAGING/Stg_GIS/multi'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false', 
  'numFiles'='0', 
  'numRows'='-1', 
  'rawDataSize'='-1', 
  'skip.header.line.count'='1', 
  'totalSize'='0', 
  'transient_lastDdlTime'='1503183208')

 

    But when querying this same table from Impala, Impala throws an error:

    

  • AnalysisException: Failed to load metadata for table: 'amt_multi' CAUSED BY: TableLoadingException: Failed to load metadata for table: amt_multi CAUSED BY: InvalidStorageDescriptorException: Invalid delimiter: '<|>'. Delimiter must be specified as a single character or as a decimal value in the range [-128:127]
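
For comparison, the constraint in that message can be sketched with a definition that uses a single-character delimiter. This is only an illustration, not a confirmed fix: the table name amt_single, the '|' delimiter, and the location are hypothetical, and it assumes the data files could be rewritten with that delimiter.

```sql
-- Hypothetical table: same columns as amt_multi, but the data files
-- are assumed to use a single '|' character between fields.
CREATE EXTERNAL TABLE amt_single (
  mandt       VARCHAR(3),
  description VARCHAR(200),
  systemid    INT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
-- Hypothetical path, not the location of the real table.
LOCATION 'hdfs://hdfsha1/DEV/Raw_STAGING/Stg_GIS/single'
TBLPROPERTIES ('skip.header.line.count'='1');
```

A definition like this stays within the "single character" rule quoted in the error, whereas 'field.delim'='<|>' does not.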

 

 So my question is: can Impala support a multi-character delimiter for text data? And if so, how does one do this?

 

Thanks
