Created 06-02-2018 05:52 PM
I am facing problem while loading the data in Hive table using serde, details below:
Sample Record in File:
"age|""job""|""marital""|""education""|""default""
"58|""management""|""married""|""tertiary""|""no""
Code used:
CREATE EXTERNAL TABLE IF NOT EXISTS test
(
age INT,
job STRING,
marital STRING,
education STRING,
credit_default STRING
) COMMENT 'CAMPAIGN'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ( "separatorChar" = "|", "quoteChar" = "\"")
LOCATION '/user/priyanka' tblproperties("skip.header.line.count"="1");
Output from above code
58|"management"|"married"|"tertiary"|"no" NULL NULL NULL NULL
Issue: Its not reading the Pipe Delimiter and loading the whole record in the first field.
However, the quoteChar has worked correctly by removing the extra quotes.
Created 06-05-2018 12:05 AM
Hey @Priyanka
You can try to use RegexSerde instead of OpenCSVSerde.
Here's an example
CREATE EXTERNAL TABLE test ( age INT, job STRING, marital STRING, education STRING, credit_default STRING ) COMMENT 'CAMPAIGN' ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ('input.regex' = '^\"(\\d+)\\W+(\\w+)\\W+(\\w+)\\W+(\\w+)\\W+(\\w+).*') LOCATION '/user/hive/warehouse/test/';
Hope this helps! 🙂
Created 06-06-2018 10:48 AM
Hi @Priyanka. There are a few issues with your file that are causing CSVSerde to not work correctly.
The first issue is that the delimiter for age is in the wrong position. It needs to be moved one position to the right. That will solve the problem where the entire row was being loaded into the first field. Here is the changed file (with the first delimiter moved):
"age"|"job""|""marital""|""education""|""default"" "58"|"management""|""married""|""tertiary""|""no""
Now the second issue is that starting with the "job" column there are extra quotes - the delimiters are quoted as well as the data. This will cause quotes to appear in your output. The delimiters should not be quoted.
If you change your file is changed to look like this, it will no longer load any quotes into the output:
"age"|"job"|"marital"|"education"|"default" "58"|"management"|"married"|"tertiary"|"no"
I hope this helps.
Created 06-06-2018 05:02 PM
Thank you this helps!
Created 06-06-2018 01:00 PM
Hey @Priyanka
You can try to use RegexSerde instead of OpenCSVSerde.
Here's an example
CREATE EXTERNAL TABLE test ( age INT, job STRING, marital STRING, education STRING, credit_default STRING ) COMMENT 'CAMPAIGN' ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ('input.regex' = '^\"(\\d+)\\W+(\\w+)\\W+(\\w+)\\W+(\\w+)\\W+(\\w+).*') LOCATION '/user/hive/warehouse/test/';
Hope this helps! 🙂
Created 06-06-2018 05:03 PM
I will try this approach, thank you 🙂
Created 03-28-2019 04:23 PM
Change the extension of the file from text to.The code then save it on https://paperleaf.ca/coursework/, hope so this will work and solve the error that you are facing with education file code.