Created 08-24-2017 09:42 PM
My dataset set is as below:
700,Angus (1995),Comedy
702,Faces (1968),Drama
703,Boys (1996),Drama
704,"Quest, The (1996)",Action|Adventure
705,Cosi (1996),Comedy
747,"Stupids, The (1996)",Comedy
Create table movies( movieid int, title String, genre string) row format delimited fields terminated by ',' ; Select title , genre from movies;
Since the rows have comma separated values, the records like 704,"Quest, The (1996)",Action|Adventure returns Quest as title, instead of Quest,The (1996).
And Genre value is shown as The(1996) instead of Action|Adventure.
How to load such data correctly by escaping the delimiter in the value ?
Created 08-25-2017 07:33 AM
Adding to @Sonu Sahi's reply, the CSVSerde is available in Hive 0.14 and greater. The following example creates a TSV (Tab-separated) file.
<code>CREATE
TABLE
my_table(a string, b string, ...)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH
SERDEPROPERTIES (
"separatorChar"
=
"\t"
,
"quoteChar"
=
"'"
,
"escapeChar"
=
"\\"
)
STORED
AS
TEXTFILE;
Default properties for SerDe is Comma-Separated (CSV) fileDEFAULT_ESCAPE_CHARACTER \
DEFAULT_QUOTE_CHARACTER "
DEFAULT_SEPARATOR ,
This SerDe works for most CSV data, but does not handle embedded newlines. To use the SerDe, specify the fully qualified class name org.apache.hadoop.hive.serde2.OpenCSVSerde.
If you want to use the TextFile format, then use 'ESCAPED BY' in the DDL.
"Enable escaping for the delimiter characters by using the 'ESCAPED BY' clause (such as ESCAPED BY '\')
Escaping is needed if you want to work with data that can contain these delimiter characters.
A custom NULL format can also be specified using the 'NULL DEFINED AS' clause (default is '\N')."
Created 04-19-2020 04:44 AM
Hi Manoj,
We are using textfiles with separationchar as '|', but the problem we have embedded new lines in columns which is resulting data empty in hive as its considering as new line.
Rest data is migrating perfectly fine.
Could you please suggest us how to avoid new line characters in between column-data.
Thanks&Regards
Sreeja