Created 02-18-2018 02:11 PM
Hi,
I am getting a huge csv ingested in to nifi to process to a location.
The location is an external table location, from there data is processed in to orc tables.
I am getting comma(,) in between data of csv, can you please help me to handle it.
ex:
file: (here below are 5 fields "brown,fox jumps" and "the, lazy" are single fields)
a, quick, brown,fox jumps, over, the, lazy
When an external table is placed on the file only "a, quick, brown, fox, jumps" are shown
Thank you
Created 02-18-2018 02:50 PM
Use csv serde to escape quote characters in csv file,
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar" = "\"")
Example:-
input data:-
a, quick,"brown,fox jumps",over,"the, lazy"
Crete table statement:-
create table hcc(field1 string, field2 string, field3 string, field4 string, field5 string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar" = "\"");
Select the data in table
hive> select * from hcc; +---------+---------+------------------+---------+------------+--+ | field1 | field2 | field3 | field4 | field5 | +---------+---------+------------------+---------+------------+--+ | a | quick | brown,fox jumps | over | the, lazy | +---------+---------+------------------+---------+------------+--+ 1 row selected (0.055 seconds)
So in our create table statement we have mentioned quote character as " and seperator as ,. When we query table hive considers all the data enclosing quotes as one filed.
Created 02-18-2018 02:50 PM
Use csv serde to escape quote characters in csv file,
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar" = "\"")
Example:-
input data:-
a, quick,"brown,fox jumps",over,"the, lazy"
Crete table statement:-
create table hcc(field1 string, field2 string, field3 string, field4 string, field5 string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar" = "\"");
Select the data in table
hive> select * from hcc; +---------+---------+------------------+---------+------------+--+ | field1 | field2 | field3 | field4 | field5 | +---------+---------+------------------+---------+------------+--+ | a | quick | brown,fox jumps | over | the, lazy | +---------+---------+------------------+---------+------------+--+ 1 row selected (0.055 seconds)
So in our create table statement we have mentioned quote character as " and seperator as ,. When we query table hive considers all the data enclosing quotes as one filed.
Created 02-19-2018 09:28 PM
There is no quote characters in csv.
input is:
a, quick, brown,fox jumps, over, the, lazy
not
a, quick,"brown,fox jumps",over,"the, lazy"
Created 02-21-2018 12:04 AM
You need to use Regex serde while creating hive table and matching regex to capture the fields that you need to have them in same group.
Some references how to create regex serde tables
https://stackoverflow.com/questions/31008371/hive-using-regexserde-to-define-input-format
https://stackoverflow.com/questions/9102184/regex-for-access-log-in-hive-serde
Created 03-14-2018 02:43 PM