Created on 08-20-2017 08:28 AM - edited 09-16-2022 05:07 AM
Hi,
I've recently tried ran into an issue where we need to use multi delimited delimiter.
In hive using the org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe serde works great.
Data Sample:
mandt,description,systemid 090,no comma 01,10 090,this is a, test,10 090,we can see~1,d,10 090,comma,commacomma,,10 090,no comma 02,10
Table created :
CREATE EXTERNAL TABLE `amt_multi`( `mandt` varchar(3) COMMENT 'from deserializer', `description` varchar(200) COMMENT 'from deserializer', `systemid` int COMMENT 'from deserializer') ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ( 'field.delim'='<|>', 'line.delim'='/n') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://hdfsha1/DEV/Raw_STAGING/Stg_GIS/multi' TBLPROPERTIES ( 'COLUMN_STATS_ACCURATE'='false', 'numFiles'='0', 'numRows'='-1', 'rawDataSize'='-1', 'skip.header.line.count'='1', 'totalSize'='0', 'transient_lastDdlTime'='1503183208')
but when quering this same table from Impala, impala throws an error :
So my question is can impala support multi character delimiter for text type data ? And if so how
does one do this.
Thanks
Created 07-28-2021 07:31 PM
Hi,
I have tested the same in my lab cluster, seems to be multi character delimiters will not works in impala.
Below is my testing when i tried with ',|'
Failing:
Query: Create table test ( name string, place string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',|'
STORED AS TEXTFILE
TBLPROPERTIES('serialization.null.format'='')
ERROR: AnalysisException: ESCAPED BY values and LINE/FIELD terminators must be specified as a single character or as a decimal value in the range [-128:127]: ,|
Created:
[host-1abc.cloudera.com:21000] default> Create table test ( name string, place string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
> STORED AS TEXTFILE
> TBLPROPERTIES('serialization.null.format'='');
Query: Create table test ( name string, place string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
TBLPROPERTIES('serialization.null.format'='')
+-------------------------+
| summary |
+-------------------------+
| Table has been created. |
+-------------------------+
Created:
[host-1abc.cloudera.com:21000] default> Create table test1 ( name string, place string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES('serialization.null.format'='');
Query: Create table test1 ( name string, place string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES('serialization.null.format'='')
+-------------------------+
| summary |
+-------------------------+
| Table has been created. |
+-------------------------+