Trouble loading CSV data with embedded double quotes and embedded commas into HIVE Table

New Contributor

I need to load CSV data into a Hive table, but I am facing issues with embedded double quotes in some column values, as well as embedded commas in others. Wherever an embedded double quote or comma occurs, the data from that point on does not load properly and the columns are filled with NULLs. I have tried the OpenCSV SerDe options below, among others, but no luck!

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\t",
"quoteChar" = "'",
"escapeChar" = "\\"
)

The CSV data looks like this:

id,name,description,role
"1","peter","He is Data enginer"", "Senior Engineer"
"2","Anee","Hadoop Engineer"","Lead"
"3","James","Data, Architect"","Sr Architect"

 

The Hive table should look like this after loading:
+-----+--------+----------------------+------------------+
| Id  | Name   | Description          | Role             |
+-----+--------+----------------------+------------------+
| 1   | peter  | He is Data Engineer  | Senior Engineer  |
| 2   | anee   | Hadoop Engineer      | Lead             |
| 3   | james  | Data, Architect      | Sr Architect     |
+-----+--------+----------------------+------------------+

 

1 ACCEPTED SOLUTION

Guru

@cortland 

 

I am able to achieve this; please find my test case below.

abc.txt
====

"1","peter","He is Data enginer", "Senior Engineer"
"2","Anee","Hadoop Engineer","Lead"
"3","James","Data, Architect","Sr Architect"


hdfs dfs -put abc.txt /user/hive/


create external table test_csv (num string, name string, work string, designation string)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
"separatorChar" = ',',
"quoteChar" = '"'
)
stored as textfile;
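(A caveat worth noting: OpenCSVSerde exposes every column as STRING regardless of the declared type. If typed columns are needed, one common pattern is a casting view; a sketch, with an illustrative view name:)

create view test_csv_typed as
select cast(num as int) as num, name, work, designation
from test_csv;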

 

LOAD DATA INPATH '/user/hive/abc.txt' INTO TABLE test_csv;

 

select * from test_csv;
+---------------+----------------+---------------------+-----------------------+
| test_csv.num  | test_csv.name  | test_csv.work       | test_csv.designation  |
+---------------+----------------+---------------------+-----------------------+
| 1             | peter          | He is Data enginer  | Senior Engineer       |
| 2             | Anee           | Hadoop Engineer     | Lead                  |
| 3             | James          | Data, Architect     | Sr Architect          |
| 1             | peter          | He is Data enginer  | Senior Engineer       |
| 2             | Anee           | Hadoop Engineer     | Lead                  |
| 3             | James          | Data, Architect     | Sr Architect          |
+---------------+----------------+---------------------+-----------------------+
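(Side note: the duplicate rows above suggest the same file was loaded twice; LOAD DATA ... INTO TABLE appends to the existing data. To replace the contents instead, the OVERWRITE form can be used:)

LOAD DATA INPATH '/user/hive/abc.txt' OVERWRITE INTO TABLE test_csv;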

 

Please accept this as the solution if it answers your question and the test case works in your scenario.


3 REPLIES


New Contributor

Thanks for your response, Ashish.

If you look at my test data, it has two consecutive double quotes (an extra double quote) at the end of some column values. That is where the problems arise.

 

"1","peter","He is Data enginer"", "Senior Engineer"
"2","Anee","Hadoop Engineer"","Lead"
"3","James","Data, Architect"","Sr Architect"

Guru

Yes, that was indeed a problem 🙂 I was about to comment on that.
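(One possible workaround, as a sketch: since the stray second quote makes the CSV malformed, pre-cleaning the file before loading may be the simplest route. This assumes every "" in the file is an unintended extra quote, as in the sample above, rather than a legitimately escaped quote:)

sed 's/""/"/g' abc.txt > abc_clean.txt
hdfs dfs -put abc_clean.txt /user/hive/

After that, the test_csv table definition above should parse the cleaned file correctly.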