Support Questions

Find answers, ask questions, and share your expertise

Facing issue while using serde to load Hive tables - SeperatorChar

New Contributor

I am facing problem while loading the data in Hive table using serde, details below:

Sample Record in File:

"age|""job""|""marital""|""education""|""default""

"58|""management""|""married""|""tertiary""|""no""

Code used:

CREATE EXTERNAL TABLE IF NOT EXISTS test

(

age INT,

job STRING,

marital STRING,

education STRING,

credit_default STRING

) COMMENT 'CAMPAIGN'

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

WITH SERDEPROPERTIES ( "separatorChar" = "|", "quoteChar" = "\"")

LOCATION '/user/priyanka' tblproperties("skip.header.line.count"="1");

Output from above code

58|"management"|"married"|"tertiary"|"no" NULL NULL NULL NULL

Issue: Its not reading the Pipe Delimiter and loading the whole record in the first field.

However, the quoteChar has worked correctly by removing the extra quotes.

6 REPLIES 6

Hey @Priyanka

You can try to use RegexSerde instead of OpenCSVSerde.
Here's an example

CREATE EXTERNAL TABLE test 
(
age INT,
job STRING,
marital STRING,
education STRING,
credit_default STRING
) COMMENT 'CAMPAIGN'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES ('input.regex' = '^\"(\\d+)\\W+(\\w+)\\W+(\\w+)\\W+(\\w+)\\W+(\\w+).*')
LOCATION '/user/hive/warehouse/test/';

Hope this helps! 🙂

Hi @Priyanka. There are a few issues with your file that are causing CSVSerde to not work correctly.

The first issue is that the delimiter for age is in the wrong position. It needs to be moved one position to the right. That will solve the problem where the entire row was being loaded into the first field. Here is the changed file (with the first delimiter moved):

"age"|"job""|""marital""|""education""|""default""
"58"|"management""|""married""|""tertiary""|""no""

Now the second issue is that starting with the "job" column there are extra quotes - the delimiters are quoted as well as the data. This will cause quotes to appear in your output. The delimiters should not be quoted.

If you change your file is changed to look like this, it will no longer load any quotes into the output:

"age"|"job"|"marital"|"education"|"default"
"58"|"management"|"married"|"tertiary"|"no"

I hope this helps.

New Contributor

Thank you this helps!

Hey @Priyanka

You can try to use RegexSerde instead of OpenCSVSerde.
Here's an example

CREATE EXTERNAL TABLE test 
(
age INT,
job STRING,
marital STRING,
education STRING,
credit_default STRING
) COMMENT 'CAMPAIGN'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES ('input.regex' = '^\"(\\d+)\\W+(\\w+)\\W+(\\w+)\\W+(\\w+)\\W+(\\w+).*')
LOCATION '/user/hive/warehouse/test/';

Hope this helps! 🙂

New Contributor

I will try this approach, thank you 🙂

New Contributor

Change the extension of the file from text to.The code then save it on https://paperleaf.ca/coursework/, hope so this will work and solve the error that you are facing with education file code.