- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
comma in between data of csv mapped to external table in hive
- Labels:
-
Apache Hive
-
Apache Spark
Created ‎02-18-2018 02:11 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I am getting a huge csv ingested in to nifi to process to a location.
The location is an external table location, from there data is processed in to orc tables.
I am getting comma(,) in between data of csv, can you please help me to handle it.
ex:
file: (here below are 5 fields "brown,fox jumps" and "the, lazy" are single fields)
a, quick, brown,fox jumps, over, the, lazy
When an external table is placed on the file only "a, quick, brown, fox, jumps" are shown
Thank you
Created ‎02-18-2018 02:50 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Use csv serde to escape quote characters in csv file,
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar" = "\"")
Example:-
input data:-
a, quick,"brown,fox jumps",over,"the, lazy"
Crete table statement:-
create table hcc(field1 string, field2 string, field3 string, field4 string, field5 string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar" = "\"");
Select the data in table
hive> select * from hcc; +---------+---------+------------------+---------+------------+--+ | field1 | field2 | field3 | field4 | field5 | +---------+---------+------------------+---------+------------+--+ | a | quick | brown,fox jumps | over | the, lazy | +---------+---------+------------------+---------+------------+--+ 1 row selected (0.055 seconds)
So in our create table statement we have mentioned quote character as " and seperator as ,. When we query table hive considers all the data enclosing quotes as one filed.
Created ‎02-18-2018 02:50 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Use csv serde to escape quote characters in csv file,
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar" = "\"")
Example:-
input data:-
a, quick,"brown,fox jumps",over,"the, lazy"
Crete table statement:-
create table hcc(field1 string, field2 string, field3 string, field4 string, field5 string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar" = "\"");
Select the data in table
hive> select * from hcc; +---------+---------+------------------+---------+------------+--+ | field1 | field2 | field3 | field4 | field5 | +---------+---------+------------------+---------+------------+--+ | a | quick | brown,fox jumps | over | the, lazy | +---------+---------+------------------+---------+------------+--+ 1 row selected (0.055 seconds)
So in our create table statement we have mentioned quote character as " and seperator as ,. When we query table hive considers all the data enclosing quotes as one filed.
Created ‎02-19-2018 09:28 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
There is no quote characters in csv.
input is:
a, quick, brown,fox jumps, over, the, lazy
not
a, quick,"brown,fox jumps",over,"the, lazy"
Created ‎02-21-2018 12:04 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You need to use Regex serde while creating hive table and matching regex to capture the fields that you need to have them in same group.
Some references how to create regex serde tables
https://stackoverflow.com/questions/31008371/hive-using-regexserde-to-define-input-format
https://stackoverflow.com/questions/9102184/regex-for-access-log-in-hive-serde
Created ‎03-14-2018 02:43 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
