Support Questions

Find answers, ask questions, and share your expertise

Impala and MultiDelimitSerDe

avatar
New Contributor

Hi,

 

    I've recently tried ran into an issue where we need to use multi delimited delimiter.

    In hive using the org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe serde works great.

 

    Data Sample:

mandt,description,systemid
090,no comma 01,10
090,this is a, test,10
090,we can see~1,d,10
090,comma,commacomma,,10
090,no comma 02,10

 

  Table created :

  

CREATE EXTERNAL TABLE `amt_multi`(
  `mandt` varchar(3) COMMENT 'from deserializer', 
  `description` varchar(200) COMMENT 'from deserializer', 
  `systemid` int COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
WITH SERDEPROPERTIES ( 
  'field.delim'='<|>', 
  'line.delim'='/n') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hdfsha1/DEV/Raw_STAGING/Stg_GIS/multi'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false', 
  'numFiles'='0', 
  'numRows'='-1', 
  'rawDataSize'='-1', 
  'skip.header.line.count'='1', 
  'totalSize'='0', 
  'transient_lastDdlTime'='1503183208')

 

    but when quering this same table from Impala, impala throws an error : 

    

  • AnalysisException: Failed to load metadata for table: 'amt_multi' CAUSED BY: TableLoadingException: Failed to load metadata for table: amt_multi CAUSED BY: InvalidStorageDescriptorException: Invalid delimiter: '<|>'. Delimiter must be specified as a single character or as a decimal value in the range [-128:127]

 

 So my question is can impala support multi character delimiter for text type data ? And if so how 

 does one do this.

 

Thanks

1 REPLY 1

avatar
Master Collaborator

Hi,

I have tested the same in my lab cluster, seems to be multi character delimiters will not works in impala.

 

Below is my testing when i tried with ',|'

 

Failing:

Query: Create table test ( name string, place string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',|'

STORED AS TEXTFILE

TBLPROPERTIES('serialization.null.format'='')

ERROR: AnalysisException: ESCAPED BY values and LINE/FIELD terminators must be specified as a single character or as a decimal value in the range [-128:127]: ,|

 

Created:

[host-1abc.cloudera.com:21000] default> Create table test ( name string, place string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'

                                                  > STORED AS TEXTFILE

                                                  > TBLPROPERTIES('serialization.null.format'='');

Query: Create table test ( name string, place string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'

STORED AS TEXTFILE

TBLPROPERTIES('serialization.null.format'='')

+-------------------------+

| summary                 |

+-------------------------+

| Table has been created. |

+-------------------------+

 

Created:

[host-1abc.cloudera.com:21000] default> Create table test1 ( name string, place string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

                                                  > STORED AS TEXTFILE

                                                  > TBLPROPERTIES('serialization.null.format'='');

Query: Create table test1 ( name string, place string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

STORED AS TEXTFILE

TBLPROPERTIES('serialization.null.format'='')

+-------------------------+

| summary                 |

+-------------------------+

| Table has been created. |

+-------------------------+