
How to escape double quotes in a CSV file

New Contributor

I have a text file like the one below:

1,"TEST"Data","SAMPLE DATA"

and the table structure is like this:

CREATE TABLE test1 (
  id string,
  col1 string,
  col2 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'mylocation/test1';

When I put the file into the corresponding HDFS location, the 2nd and 3rd columns are populated as NULL. That is because of the double quote embedded in the value (TEST"Data).

One way would be to update the data file with the escape character "\", but we are not allowed to modify the incoming data. How can I load the data properly and escape these embedded double quotes?
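
For illustration only (since we cannot change the feed), an escaped version of the same row would look like:

1,"TEST\"Data","SAMPLE DATA"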

Appreciate the help !!

1 REPLY

Master Mentor

You can pass an ESCAPED BY clause.

Enable escaping for the delimiter characters by using the ESCAPED BY clause (such as ESCAPED BY '\'). Escaping is needed if you want to work with data that can contain these delimiter characters.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
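
A minimal sketch of that clause, assuming the same three-column layout as the question (the table name and STORED AS TEXTFILE here are placeholders, not from the original post):

-- Delimited table where a backslash before a delimiter marks an escape
CREATE TABLE test1_escaped (
  id   string,
  col1 string,
  col2 string
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  ESCAPED BY '\\'
STORED AS TEXTFILE;

Since the question's table uses OpenCSVSerde rather than the default delimited SerDe, the equivalent setting there is the escapeChar serde property, as in the documentation example below: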

CREATE TABLE my_table(a string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "'",
   "escapeChar"    = "\\"
)  
STORED AS TEXTFILE;
The default properties for this SerDe are for a comma-separated (CSV) file:

DEFAULT_ESCAPE_CHARACTER \
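
Adapted to the table from the question, a sketch might look like this; the separator, quote, and escape characters below are assumptions based on the sample row, and the location is the placeholder path from the question:

-- Assumed properties: comma separator, double-quote quoting, backslash escape
CREATE TABLE test1 (
  id   string,
  col1 string,
  col2 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "\"",
   "escapeChar"    = "\\"
)
STORED AS TEXTFILE
LOCATION 'mylocation/test1';

Whether the embedded quote in TEST"Data then parses cleanly still depends on how the incoming file is written, since escapeChar only helps when the data actually contains the escape character.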