Created on 08-12-2020 07:01 AM - last edited on 08-12-2020 09:46 AM by VidyaSargur
We use Hive 3.1.0 and we're not sure if hivexmlserde works with current Hive version
https://mvnrepository.com/artifact/com.ibm.spss.hive.serde2.xml/hivexmlserde/1.0.5.3
Is anybody using hivexmlserde with Hive 3.1.0 successfully? So I will know if it's even possible
Created 08-13-2020 03:35 AM
I'm sorry, XML, not JSON
Created 08-13-2020 09:45 AM
The serde class is no more supported in HDP 3.0. It's replaced with 'AbstractSerDe'.
Created 07-26-2023 06:36 AM
Hi @asish , do you know how to resolve this then?
Created 07-26-2023 06:46 AM
@Ashish03 Cloudera does not ship XML serde, you need to build and use.
Created 07-26-2023 11:03 AM
I would be grateful if you provide link for build or source code and command to build.
I don't know in which file I have to mention 'AbstractSerde' or whatever changes.
Created on 08-21-2020 02:27 PM - edited 08-21-2020 02:27 PM
@BohdanSemchuk You can use the attached modified jar which helps overcome the issue that you have come across. Download the jar and copy it to the hdfs location hdfs://user/hive/ and provide hive:hadoop permission with 777 permission.
Download Link:
https://drive.google.com/file/d/1o-DtrY4bNLaUkDJfOJAjchNtVcL4P64-/view?usp=sharing
Steps to add the jar:
1. Login to the beeline-shell
2. execute the below command,
add jar hdfs://com:8020/user/hive/hivexmlserde-1.0.5.3_updated.jar;
3. Create the table as in below,
Sample table:
==========
CREATE EXTERNAL TABLE `ssp`(
`name` array<string> COMMENT 'from deserializer',
`value` array<string> COMMENT 'from deserializer',
`numvalue` array<double> COMMENT 'from deserializer',
`id` array<int> COMMENT 'from deserializer',
`start_date` array<string> COMMENT 'from deserializer',
`end_date` array<string> COMMENT 'from deserializer')
ROW FORMAT SERDE
'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
'column.xpath.end_date'='/row/end/text()',
'column.xpath.id'='/row/id/text()',
'column.xpath.name'='/row/column/name/text()',
'column.xpath.numvalue'='/row/column/number/text()',
'column.xpath.start_date'='/row/start/text()',
'column.xpath.value'='/row/column/string/text()')
STORED AS INPUTFORMAT
'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://com:8020/user/hive'
TBLPROPERTIES (
'transient_lastDdlTime'='1566907280',
'xmlinput.end'='</row>',
'xmlinput.start'='<row');
4. Running the `show tables` populates this table without any issue.
Try implementing the above recommendation and help us know if you were able to alleviate the issue.
Created 07-26-2023 11:20 AM
Hi There,
I checked with your jar and partially working, but also got below error, better if you can help to provide location / link of source code and piece of code which need to change along with file name.