Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive- escaping field delimiter in column value

Solved Go to solution
Highlighted

Hive- escaping field delimiter in column value

Contributor

My dataset set is as below:

700,Angus (1995),Comedy

702,Faces (1968),Drama

703,Boys (1996),Drama

704,"Quest, The (1996)",Action|Adventure

705,Cosi (1996),Comedy

747,"Stupids, The (1996)",Comedy

Create table movies(
movieid int,
title String,
genre string)
row format delimited
fields terminated by ','
;

Select title , genre from movies;

Since the rows have comma separated values, the records like 704,"Quest, The (1996)",Action|Adventure returns Quest as title, instead of Quest,The (1996).

And Genre value is shown as The(1996) instead of Action|Adventure.

How to load such data correctly by escaping the delimiter in the value ?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Hive- escaping field delimiter in column value

Hi @Ramya Jayathirtha

Adding to @Sonu Sahi's reply, the CSVSerde is available in Hive 0.14 and greater. The following example creates a TSV (Tab-separated) file.

<code>CREATE TABLE my_table(a string, b string, ...)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'WITH SERDEPROPERTIES ( "separatorChar" = "\t", "quoteChar" = "'", "escapeChar" = "\\" ) STORED AS TEXTFILE; Default properties for SerDe is Comma-Separated (CSV) file
DEFAULT_ESCAPE_CHARACTER \
DEFAULT_QUOTE_CHARACTER "
DEFAULT_SEPARATOR ,

This SerDe works for most CSV data, but does not handle embedded newlines. To use the SerDe, specify the fully qualified class name org.apache.hadoop.hive.serde2.OpenCSVSerde.

If you want to use the TextFile format, then use 'ESCAPED BY' in the DDL.

"Enable escaping for the delimiter characters by using the 'ESCAPED BY' clause (such as ESCAPED BY '\')
Escaping is needed if you want to work with data that can contain these delimiter characters.

A custom NULL format can also be specified using the 'NULL DEFINED AS' clause (default is '\N').
"

6 REPLIES 6

Re: Hive- escaping field delimiter in column value

Guru

Hi @Ramya Jayathirtha

Try the following:

create table movies(

movieid int,

title String,

genre string)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

This should resolve your issue as the OpenCSVSerde's default properties should work for your use case.

Re: Hive- escaping field delimiter in column value

Contributor

Thank You @Sonu Sahi for the solution.

Re: Hive- escaping field delimiter in column value

Hi @Ramya Jayathirtha

Adding to @Sonu Sahi's reply, the CSVSerde is available in Hive 0.14 and greater. The following example creates a TSV (Tab-separated) file.

<code>CREATE TABLE my_table(a string, b string, ...)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'WITH SERDEPROPERTIES ( "separatorChar" = "\t", "quoteChar" = "'", "escapeChar" = "\\" ) STORED AS TEXTFILE; Default properties for SerDe is Comma-Separated (CSV) file
DEFAULT_ESCAPE_CHARACTER \
DEFAULT_QUOTE_CHARACTER "
DEFAULT_SEPARATOR ,

This SerDe works for most CSV data, but does not handle embedded newlines. To use the SerDe, specify the fully qualified class name org.apache.hadoop.hive.serde2.OpenCSVSerde.

If you want to use the TextFile format, then use 'ESCAPED BY' in the DDL.

"Enable escaping for the delimiter characters by using the 'ESCAPED BY' clause (such as ESCAPED BY '\')
Escaping is needed if you want to work with data that can contain these delimiter characters.

A custom NULL format can also be specified using the 'NULL DEFINED AS' clause (default is '\N').
"

Re: Hive- escaping field delimiter in column value

Contributor

Thank you @Sindhu for the elaborate explanation and the solution..!!

Re: Hive- escaping field delimiter in column value

New Contributor

I have the same situation in log file. My data set looks like this

2017-11-29 16:19:39,217 DEBUG Conn -> 8=FIXT.1.1|9=76|35=A

2017-11-29 16:19:43,618 DEBUG Conn -> 8=FIXT.1.1|9=52|35=0|49=1151|56=BSEEQ|34=2

How to handle this Regex serde. my create table query is:

CREATE TABLE capital_log1 ( date STRING, filed2 STRING, filed3 STRING, filed4 STRING,filed7 map<bigint,string>) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "([^ ]*) ([^,]*)([^ ]*) (.*\->) (.*\|)", "output.format.string" = "%1$s %2$s %3$s %4$s %5$s" ) STORED AS TEXTFILE; It throws me an exception.

Re: Hive- escaping field delimiter in column value

New Contributor

Hello Sir,


How do we store if via ETL layer '\n' is manipulated as '\\n'. org.apache.hadoop.hive.serde2.OpenCSVSerde does not work as data gets misaligned. Is there any way to load it without preprocessing?