Empty lines in JSON files causing duplicate records. ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'

I'm trying to create an external Hive table with JsonSerDe as follows:

CREATE EXTERNAL TABLE test1 (`user` STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/ec2-user/test_data/';

The problem is that my input file has blank lines between the records:

{"user":"chill1"}

{"user":"chill2"}

{"user":"chill3"}

{"user":"chill4"}

{"user":"chill5"}

If I run a count on the Hive table, I get 9 rows. If I remove the blank lines from the source text file, the count is 5 (the 4 extra rows match the 4 blank lines). Can anybody help with this?
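One way to read the 9-versus-5 counts: the table is read one record per line, so each of the 4 blank lines is counted as a row alongside the 5 JSON records. The Python sketch below only models that line-per-record behaviour; it is not Hive's code, and representing a blank line as an all-NULL row (None here) is an assumption about what the SerDe returns for an empty line.

```python
import json

# Sample input from the question, blank separator lines included.
SAMPLE = """{"user":"chill1"}

{"user":"chill2"}

{"user":"chill3"}

{"user":"chill4"}

{"user":"chill5"}
"""


def rows_one_per_line(text, skip_blank):
    """Model a line-oriented reader: every line of the file becomes one record.

    A blank line becomes a row whose only column is None (standing in for NULL),
    unless skip_blank is True, in which case it is dropped.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            if not skip_blank:
                rows.append({"user": None})
            continue
        record = json.loads(line)
        # A missing key also maps to None, as a JSON SerDe maps it to NULL.
        rows.append({"user": record.get("user")})
    return rows


print(len(rows_one_per_line(SAMPLE, skip_blank=False)))  # 9
print(len(rows_one_per_line(SAMPLE, skip_blank=True)))   # 5
```
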

Re: Empty lines in JSON files causing duplicate records. ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'

Issue resolved.

Re: Empty lines in JSON files causing duplicate records. ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'

@Cruz DSouza can you provide your solution so that it's a complete answer?

Re: Empty lines in JSON files causing duplicate records. ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'

Sure. Instead of using

ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"

I used

ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
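If switching SerDes is not an option, the observation in the question (removing the blank lines brings the count down to 5) suggests stripping blank lines before the file is placed in the table's LOCATION. A minimal Python sketch, assuming the data is newline-delimited JSON small enough to process on one machine; the function name strip_blank_lines and the file names are illustrative, not part of any Hive tooling:

```python
import os
import tempfile


def strip_blank_lines(src_path, dst_path):
    """Copy a newline-delimited JSON file, dropping empty or whitespace-only lines.

    Returns the number of lines (records) written.
    """
    kept = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for line in src:
            if line.strip():
                # Normalise the ending so every record ends with exactly one "\n".
                dst.write(line.rstrip("\r\n") + "\n")
                kept += 1
    return kept


if __name__ == "__main__":
    # Demo on a temporary file shaped like the question's sample data.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.json")
        dst = os.path.join(tmp, "clean.json")
        with open(src, "w", encoding="utf-8") as f:
            f.write('{"user":"chill1"}\n\n{"user":"chill2"}\n\n{"user":"chill3"}\n')
        print(strip_blank_lines(src, dst))  # 3
```

The cleaned file would then replace the original under /user/ec2-user/test_data/ (for example with hdfs dfs -put).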

Re: Empty lines in JSON files causing duplicate records. ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'

Could you please explain this? I am getting null values when using the SerDe org.openx.data.jsonserde.JsonSerDe. Do any additional SerDe properties need to be set?

Re: Empty lines in JSON files causing duplicate records. ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'

I am getting null as the output for all columns when I use org.openx.data.jsonserde.JsonSerDe. Could you please elaborate on whether any other properties have to be set?
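The thread never identifies what caused the NULLs. One thing worth ruling out, whichever SerDe is used: a JSON SerDe returns NULL for a column whose key is absent from a record, and a record spread over several lines cannot be parsed line by line. The hypothetical helper diagnose below checks each non-blank line against the table's column list; matching keys to column names case-insensitively is an assumption (Hive stores column names in lower case, but SerDes differ in how they treat key case).

```python
import json


def diagnose(lines, columns):
    """Check newline-delimited JSON lines against a table's column names.

    Returns a list of (line_number, problem) tuples; blank lines are skipped.
    Assumption: keys are matched to columns case-insensitively.
    """
    wanted = [c.lower() for c in columns]
    problems = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Typical of pretty-printed JSON where one object spans several lines.
            problems.append((n, "not a complete JSON value on one line"))
            continue
        if not isinstance(record, dict):
            problems.append((n, "top-level value is not a JSON object"))
            continue
        keys = {k.lower() for k in record}
        missing = [c for c in wanted if c not in keys]
        if missing:
            problems.append((n, "missing keys: " + ", ".join(missing)))
    return problems


sample = ['{"user":"chill1"}', '', '{"name":"chill2"}', '{"user":']
print(diagnose(sample, ["user"]))
# [(3, 'missing keys: user'), (4, 'not a complete JSON value on one line')]
```

An empty result means every line parses and carries every column's key, so the NULLs would have to come from somewhere else (for example the table definition or the SerDe setup).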