Impala and MultiDelimitSerDe

Hi,

 

    I recently ran into an issue where we need to use a multi-character field delimiter.

    In Hive, using the org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe SerDe works great.

 

    Data Sample:

mandt,description,systemid
090,no comma 01,10
090,this is a, test,10
090,we can see~1,d,10
090,comma,commacomma,,10
090,no comma 02,10
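
To illustrate why a single-character delimiter does not work for this data, here is a minimal Python sketch (purely illustrative; `EXPECTED_COLUMNS` and `field_count` are names made up for this example). It splits the sample rows on a comma and lists the rows that do not come out as exactly three fields because the description itself contains commas.

```python
# Sample rows copied from the data sample above (header line omitted).
rows = [
    "090,no comma 01,10",
    "090,this is a, test,10",
    "090,we can see~1,d,10",
    "090,comma,commacomma,,10",
    "090,no comma 02,10",
]

EXPECTED_COLUMNS = 3  # mandt, description, systemid


def field_count(line, delim=","):
    """Number of fields produced by a naive split on a single-character delimiter."""
    return len(line.split(delim))


# Rows whose description contains the delimiter split into too many fields.
ambiguous = [r for r in rows if field_count(r) != EXPECTED_COLUMNS]

for r in ambiguous:
    print(field_count(r), "fields:", r)
```

Three of the five rows are ambiguous under a plain comma split, which is why a multi-character delimiter is needed here.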

 

  Table created:

  

CREATE EXTERNAL TABLE `amt_multi`(
  `mandt` varchar(3) COMMENT 'from deserializer', 
  `description` varchar(200) COMMENT 'from deserializer', 
  `systemid` int COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
WITH SERDEPROPERTIES ( 
  'field.delim'='<|>', 
  'line.delim'='/n') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hdfsha1/DEV/Raw_STAGING/Stg_GIS/multi'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false', 
  'numFiles'='0', 
  'numRows'='-1', 
  'rawDataSize'='-1', 
  'skip.header.line.count'='1', 
  'totalSize'='0', 
  'transient_lastDdlTime'='1503183208')

 

    However, querying this same table from Impala fails with the following error:

    

  • AnalysisException: Failed to load metadata for table: 'amt_multi' CAUSED BY: TableLoadingException: Failed to load metadata for table: amt_multi CAUSED BY: InvalidStorageDescriptorException: Invalid delimiter: '<|>'. Delimiter must be specified as a single character or as a decimal value in the range [-128:127]
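
As I read the message, the rule Impala applies can be sketched as below. This is only my restatement of the error text in Python, not Impala's actual code, and `is_valid_impala_delimiter` is a hypothetical name.

```python
def is_valid_impala_delimiter(delim: str) -> bool:
    """Restates the rule quoted in the error message (an assumption, not Impala source).

    A delimiter is accepted if it is a single character, or if it is a
    decimal value in the range [-128, 127].
    """
    if len(delim) == 1:
        return True
    try:
        return -128 <= int(delim) <= 127
    except ValueError:
        # Not a single character and not a decimal number, e.g. '<|>'.
        return False


for candidate in [",", "<|>", "-128", "127", "128"]:
    print(repr(candidate), is_valid_impala_delimiter(candidate))
```

Under this reading, '<|>' is rejected because it is neither one character nor a decimal value in range.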

 

 So my question is: can Impala support a multi-character delimiter for text data? If so, how does one do this?

 

Thanks