Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Does anyone has Hive 3.1.0 and he is able to parse JSON with hivexmlserde1.0.5.3?

avatar

We use Hive 3.1.0 and we're not sure if hivexmlserde works with current Hive version
https://mvnrepository.com/artifact/com.ibm.spss.hive.serde2.xml/hivexmlserde/1.0.5.3
Is anybody using hivexmlserde with Hive 3.1.0 successfully? So I will know if it's even possible

7 REPLIES 7

avatar

I'm sorry, XML, not JSON

avatar
Guru

The serde class is no more supported in HDP 3.0. It's replaced with 'AbstractSerDe'.

avatar
Explorer

Hi @asish , do you know how to resolve this then?

avatar
Guru

@Ashish03  Cloudera does not ship XML serde, you need to build and use.

avatar
New Contributor

I would be grateful if you provide link for build or source code and command to build.
I don't know in which file I have to mention 'AbstractSerde' or whatever changes.

avatar
Cloudera Employee

@BohdanSemchuk  You can use the attached modified jar which helps overcome the issue that you have come across. Download the jar and copy it to the hdfs location hdfs://user/hive/ and provide hive:hadoop permission with 777 permission.

 

Download Link:

https://drive.google.com/file/d/1o-DtrY4bNLaUkDJfOJAjchNtVcL4P64-/view?usp=sharing

 

Steps to add the jar:

1. Login to the beeline-shell

2. execute the below command, 

add jar hdfs://com:8020/user/hive/hivexmlserde-1.0.5.3_updated.jar;

3. Create the table as in below,

Sample table:

==========

CREATE EXTERNAL TABLE `ssp`(
`name` array<string> COMMENT 'from deserializer',
`value` array<string> COMMENT 'from deserializer',
`numvalue` array<double> COMMENT 'from deserializer',
`id` array<int> COMMENT 'from deserializer',
`start_date` array<string> COMMENT 'from deserializer',
`end_date` array<string> COMMENT 'from deserializer')
ROW FORMAT SERDE
'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
'column.xpath.end_date'='/row/end/text()',
'column.xpath.id'='/row/id/text()',
'column.xpath.name'='/row/column/name/text()',
'column.xpath.numvalue'='/row/column/number/text()',
'column.xpath.start_date'='/row/start/text()',
'column.xpath.value'='/row/column/string/text()')
STORED AS INPUTFORMAT
'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://com:8020/user/hive'
TBLPROPERTIES (
'transient_lastDdlTime'='1566907280',
'xmlinput.end'='</row>',
'xmlinput.start'='<row');

 

4. Running the `show tables` populates this table without any issue.

 

Try implementing the above recommendation and help us know if you were able to alleviate the issue.

avatar
New Contributor

Hi There,

I checked with your jar and partially working, but also got below error, better if you can help to provide location / link of source code and piece of code which need to change along with file name.

Spoiler
Error: java.lang.NoClassDefFoundError: com/ibm/spss/hive/serde2/xml/XmlInputFormat$XmlRecordReader