Support Questions

Find answers, ask questions, and share your expertise

comma in between data of csv mapped to external table in hive

avatar
Expert Contributor

Hi,

I am getting a huge csv ingested in to nifi to process to a location.

The location is an external table location, from there data is processed in to orc tables.

I am getting comma(,) in between data of csv, can you please help me to handle it.

ex:

file: (here below are 5 fields "brown,fox jumps" and "the, lazy" are single fields)

a, quick, brown,fox jumps, over, the, lazy


When an external table is placed on the file only "a, quick, brown, fox, jumps" are shown

Thank you

1 ACCEPTED SOLUTION

avatar
Master Guru
@Mark

Use csv serde to escape quote characters in csv file,

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "\"")

Example:-

input data:-

a, quick,"brown,fox jumps",over,"the, lazy"

Crete table statement:-

create table hcc(field1 string,
field2 string,
field3 string,
field4 string,
field5 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "\"");

Select the data in table

hive> select * from hcc;
+---------+---------+------------------+---------+------------+--+
| field1  | field2  |      field3      | field4  |   field5   |
+---------+---------+------------------+---------+------------+--+
| a       |  quick  | brown,fox jumps  | over    | the, lazy  |
+---------+---------+------------------+---------+------------+--+
1 row selected (0.055 seconds)

So in our create table statement we have mentioned quote character as " and seperator as ,. When we query table hive considers all the data enclosing quotes as one filed.

View solution in original post

4 REPLIES 4

avatar
Master Guru
@Mark

Use csv serde to escape quote characters in csv file,

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "\"")

Example:-

input data:-

a, quick,"brown,fox jumps",over,"the, lazy"

Crete table statement:-

create table hcc(field1 string,
field2 string,
field3 string,
field4 string,
field5 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "\"");

Select the data in table

hive> select * from hcc;
+---------+---------+------------------+---------+------------+--+
| field1  | field2  |      field3      | field4  |   field5   |
+---------+---------+------------------+---------+------------+--+
| a       |  quick  | brown,fox jumps  | over    | the, lazy  |
+---------+---------+------------------+---------+------------+--+
1 row selected (0.055 seconds)

So in our create table statement we have mentioned quote character as " and seperator as ,. When we query table hive considers all the data enclosing quotes as one filed.

avatar
Expert Contributor

@Shu

There is no quote characters in csv.

input is:

a, quick, brown,fox jumps, over, the, lazy

not

a, quick,"brown,fox jumps",over,"the, lazy"

avatar
Master Guru

@Mark

You need to use Regex serde while creating hive table and matching regex to capture the fields that you need to have them in same group.

Some references how to create regex serde tables

https://community.hortonworks.com/articles/58591/using-regular-expressions-to-extract-fields-for-hi....

https://stackoverflow.com/questions/31008371/hive-using-regexserde-to-define-input-format

https://stackoverflow.com/questions/9102184/regex-for-access-log-in-hive-serde

avatar
Expert Contributor
@Shu

Thanks, that works.