How to remove double quotes from a CSV file?

New Contributor

I am loading a CSV file into a Hive ORC table using a DataFrame temporary table. After loading, the data in the Hive table still contains the double quotes.

Input file

"Arpit","Jain",123

"Qwee","ffhh",5778

How can I remove these double quotes, which come from the CSV format, at the time of inserting into the Hive table?


I am loading a CSV file into an ORC Hive table using a DataFrame temporary table.

But in the Hive table the values are loaded with the double quotes.

How can I remove the double quotes?

Input CSV file in HDFS

"Arpit","Jain",1234,"India"

"ABC","abcd",7657,"India"


5 REPLIES


Super Guru

@Arpit Jain

When you create table as select (CTAS) into an ORC table, don't forget to cast each column to the proper data type to match your target table. Some fields may be converted implicitly; others will not.
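
A minimal sketch of that advice in HiveQL, assuming the quoted CSV is first exposed through a staging table that uses `org.apache.hadoop.hive.serde2.OpenCSVSerde`. That SerDe strips the quote character but reads every column as STRING, which is why the numeric column is cast explicitly. The table names, column names, and HDFS path are hypothetical, chosen to match the first sample input in the question.

```sql
-- Hypothetical staging table over the raw CSV files.
-- OpenCSVSerde removes the surrounding double quotes,
-- but exposes every column as STRING regardless of the declared type.
CREATE EXTERNAL TABLE staging_people (
  first_name STRING,
  last_name  STRING,
  id         STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"',
  'escapeChar'    = '\\'
)
STORED AS TEXTFILE
LOCATION '/tmp/people_csv';  -- hypothetical HDFS directory holding the CSV

-- Create the ORC table from the staging table (CTAS),
-- casting the numeric column to the type the target should have.
CREATE TABLE people_orc STORED AS ORC AS
SELECT
  first_name,
  last_name,
  CAST(id AS INT) AS id
FROM staging_people;
```

With input such as `"Arpit","Jain",123`, the staging table should return the names without quotes, and the cast gives the ORC table an INT column instead of a STRING one.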

Contributor

That doesn't work here. The full script is below:

CREATE TABLE sr.sr2013 ( 
creation_date STRING,   
status STRING,   
first_3_chars_of_postal_code STRING,   
intersection_street_1 STRING,   
intersection_street_2 STRING,   
ward STRING,   
service_request_type STRING,   
division STRING,   
section STRING ) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
WITH SERDEPROPERTIES (
'colelction.delim'='\u0002', 
'mapkey.delim'='\u0003', 
'serialization.format'=',', 
'field.delim'=',', 
'skip.header.line.count'='1',
'quoteChar'= "\"") ;

Contributor

Impala rejected the change to:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

Expert Contributor
Impala doesn't support ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde', even in newer versions such as v3.4.0. Is there any other option to remove the double quotes in the output from Impala when the input CSV file has quotes?