Created 08-12-2021 05:39 AM
I have a csv data which I have to load in impala/hive. does anyone knows how to remove the double quotation mark in the output? Here is my sample create table scripts.
CREATE EXTERNAL TABLE schema.table ( id int, name STRING, desc STRING, desc1 STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' WITH SERDEPROPERTIES ('field.delim'='|', 'serialization.format'='|') STORED AS TEXTFILE LOCATION '/user/folder/schema/table'
Created 08-12-2021 05:21 PM
Hi,
Can you provide the sample output for the same where you are seeing the double quotation mark?
Regards,
Chethan YM
Created 08-12-2021 05:44 PM
Hello,
This is the sample output:
Created 08-12-2021 05:57 PM
Hi @lalala ,
This behavior was caused by the csv module when impala is using it to export the data.
# csv.writer expects a file handle to the input.
# cStringIO is used as the temporary buffer.
temp_buffer = StringIO()
writer = csv.writer(temp_buffer, delimiter=self.field_delim,
lineterminator='\n', quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)
Seems to be we cannot change this since it is needs to modified at code level.
[1].
Created 08-12-2021 09:30 PM
1. in Hive we can use OpenCSVSerde to remove the double quotation marks, however this is not compatible with Impala. is there any equivalent OpenCSVSerde in Impala?
2.ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\' WITH SERDEPROPERTIES ( "separatorChar" = "|" , "quoteChar" = "\"" ) is not also working.
Created on 12-21-2021 07:57 AM - edited 12-21-2021 08:31 AM
Hello @lalala @ChethanYM did you find any solution to remove double-quote in impala output on a csv external file table.
Input external .gz file has row:
true,false,"US","Los Angeles","California","Cloudflare"
Output of select * from impala external table defined on .gz file above shows:
true false "US" "Los Angeles" "California" "Cloudflare"
FYI in AWS Athena doc it says:
https://docs.aws.amazon.com/athena/latest/ug/lazy-simple-serde.html
LazySimpleSerDe for CSV, TSV, and Custom-Delimited Files
Use this SerDe if your data does not have values enclosed in quotes.
How to remove the double-quotes.
Created on 12-21-2021 09:51 AM - edited 12-21-2021 09:52 AM
UPDATE:
One possible workaround to suppress these quotes from displaying in select * is to create a view like below in Impala: