- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Trouble loading CSV data with embedded double quotes and embedded commas into HIVE Table
- Labels:
-
Apache Hive
Created on ‎09-30-2021 07:33 AM - edited ‎09-30-2021 07:35 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I need to load the CSV data into hive table but i am facing issues with embedded double quotes in few column values as well embedded commas in other columns .
Because of this, wherever embedded double quotes and embedded commas are occured , the data from there not loading properly and filled with nulls.
I have tried using below openCSV serde options and all. But no luck!
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\t",
"quoteChar" = "'",
"escapeChar" = "\\"
)
theCSV data looks like this
id,name,description,role
"1","peter","He is Data enginer"", "Senior Engineer"
"2","Anee","Hadoop Engineer"","Lead"
"3","James","Data, Architect"","Sr Architect"
hive data should look like this after loading
Id | Name | Description | Role |
1 | peter | He is Data Engineer | Senior Engineer |
2 | anee | Hadoop Engineer | Lead |
3 | james | Data, Architect | Sr Architect |
Created on ‎09-30-2021 10:30 AM - edited ‎09-30-2021 10:31 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am able to acheive,please find my testcase
abc.txt
====
"1","peter","He is Data enginer", "Senior Engineer"
"2","Anee","Hadoop Engineer","Lead"
"3","James","Data, Architect","Sr Architect"
hdfs dfs -put abc.txt /user/hive/
create external table test_csv(num string ,name string ,work string ,designation string )
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties
(
"separatorChar" = ','
,"quoteChar" = '"'
)
STORED AS TEXTFILE;
LOAD DATA INPATH '/user/hive/abc.txt' INTO TABLE test_csv;
select * from test_csv
+---------------+----------------+---------------------+-----------------------+
| test_csv.num | test_csv.name | test_csv.work | test_csv.designation |
+---------------+----------------+---------------------+-----------------------+
| 1 | peter | He is Data enginer | Senior Engineer |
| 2 | Anee | Hadoop Engineer | Lead |
| 3 | James | Data, Architect | Sr Architect |
| 1 | peter | He is Data enginer | Senior Engineer |
| 2 | Anee | Hadoop Engineer | Lead |
| 3 | James | Data, Architect | Sr Architect |
+---------------+----------------+---------------------+-----------------------+
Please accept it as solution,if your queries are answered and the testcase works in your scenario.
Created on ‎09-30-2021 10:30 AM - edited ‎09-30-2021 10:31 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am able to acheive,please find my testcase
abc.txt
====
"1","peter","He is Data enginer", "Senior Engineer"
"2","Anee","Hadoop Engineer","Lead"
"3","James","Data, Architect","Sr Architect"
hdfs dfs -put abc.txt /user/hive/
create external table test_csv(num string ,name string ,work string ,designation string )
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties
(
"separatorChar" = ','
,"quoteChar" = '"'
)
STORED AS TEXTFILE;
LOAD DATA INPATH '/user/hive/abc.txt' INTO TABLE test_csv;
select * from test_csv
+---------------+----------------+---------------------+-----------------------+
| test_csv.num | test_csv.name | test_csv.work | test_csv.designation |
+---------------+----------------+---------------------+-----------------------+
| 1 | peter | He is Data enginer | Senior Engineer |
| 2 | Anee | Hadoop Engineer | Lead |
| 3 | James | Data, Architect | Sr Architect |
| 1 | peter | He is Data enginer | Senior Engineer |
| 2 | Anee | Hadoop Engineer | Lead |
| 3 | James | Data, Architect | Sr Architect |
+---------------+----------------+---------------------+-----------------------+
Please accept it as solution,if your queries are answered and the testcase works in your scenario.
Created ‎09-30-2021 10:39 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your response Ashish,
If you see my test data , its having two double quotes (extra double quote ) in column values . thats where its creating problems.
"1","peter","He is Data enginer"", "Senior Engineer"
"2","Anee","Hadoop Engineer"","Lead"
"3","James","Data, Architect"","Sr Architect"
Created ‎09-30-2021 10:48 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
yes that was indeed a problem 🙂
I was about to comment on that
