Support Questions
Find answers, ask questions, and share your expertise

Does anyone has Hive 3.1.0 and he is able to parse JSON with hivexmlserde1.0.5.3?

New Contributor

We use Hive 3.1.0 and we're not sure if hivexmlserde works with current Hive version
https://mvnrepository.com/artifact/com.ibm.spss.hive.serde2.xml/hivexmlserde/1.0.5.3
Is anybody using hivexmlserde with Hive 3.1.0 successfully? So I will know if it's even possible

3 REPLIES 3

New Contributor

I'm sorry, XML, not JSON

Expert Contributor

The serde class is no more supported in HDP 3.0. It's replaced with 'AbstractSerDe'.

Cloudera Employee

@BohdanSemchuk  You can use the attached modified jar which helps overcome the issue that you have come across. Download the jar and copy it to the hdfs location hdfs://user/hive/ and provide hive:hadoop permission with 777 permission.

 

Download Link:

https://drive.google.com/file/d/1o-DtrY4bNLaUkDJfOJAjchNtVcL4P64-/view?usp=sharing

 

Steps to add the jar:

1. Login to the beeline-shell

2. execute the below command, 

add jar hdfs://com:8020/user/hive/hivexmlserde-1.0.5.3_updated.jar;

3. Create the table as in below,

Sample table:

==========

CREATE EXTERNAL TABLE `ssp`(
`name` array<string> COMMENT 'from deserializer',
`value` array<string> COMMENT 'from deserializer',
`numvalue` array<double> COMMENT 'from deserializer',
`id` array<int> COMMENT 'from deserializer',
`start_date` array<string> COMMENT 'from deserializer',
`end_date` array<string> COMMENT 'from deserializer')
ROW FORMAT SERDE
'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
'column.xpath.end_date'='/row/end/text()',
'column.xpath.id'='/row/id/text()',
'column.xpath.name'='/row/column/name/text()',
'column.xpath.numvalue'='/row/column/number/text()',
'column.xpath.start_date'='/row/start/text()',
'column.xpath.value'='/row/column/string/text()')
STORED AS INPUTFORMAT
'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://com:8020/user/hive'
TBLPROPERTIES (
'transient_lastDdlTime'='1566907280',
'xmlinput.end'='</row>',
'xmlinput.start'='<row');

 

4. Running the `show tables` populates this table without any issue.

 

Try implementing the above recommendation and help us know if you were able to alleviate the issue.