Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How to handle nested XML file in hive and How to remove

How to handle nested XML file in hive and How to remove

New Contributor

when i run query select * from group_xml i am getting output like <string>AB</string> <string>ACESS1ACESS2</string> <string>KISHOREKUMARVENKATBALAJI</string> . how to remove <string></string> and i want output like

A ACESS1 KISHORE

A ACESS1 KUMAR

B ACESS2 VENKAT

B ACESS2 BALAJI

Please see my below code.

CREATE TABLE group_xml(GROUPNAME string,GROUPDESCRIPTION string,USERNAME string ) ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' WITH SERDEPROPERTIES ( "column.xpath.GROUPNAME"="/MEDMLDATA/RIGHTSGROUP/@GROUPNAME", "column.xpath.GROUPDESCRIPTION"="/MEDMLDATA/RIGHTSGROUP/@GROUPDESCRIPTION", "column.xpath.USERNAME"="/MEDMLDATA/RIGHTSGROUP/USERREF/@USERNAME" ) STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' TBLPROPERTIES ("xmlinput.start"="<MEDMLDATA>","xmlinput.end"="</MEDMLDATA>");

XML FILE

<?xml version="1.0" encoding="UTF-8"?> <MEDMLDATA> <RIGHTSGROUP GROUPNAME="A" GROUPDESCRIPTION="ACESS1"> <USERREF USERNAME="KISHORE" /> <USERREF USERNAME="KUMAR" /> <RIGHTREF RIGHT="CREATE" /> <RIGHTREF RIGHT="MODIFY" /> </RIGHTSGROUP> <RIGHTSGROUP GROUPNAME="B" GROUPDESCRIPTION="ACESS2"> <USERREF USERNAME="VENKAT" /> <USERREF USERNAME="BALAJI" /> <RIGHTREF RIGHT="DELETE" /> <RIGHTREF RIGHT="ADD" /> </RIGHTSGROUP> <SITEGROUP SITENAME="Test1"> <USERREF USERNAME="KK01" /> <USERREF USERNAME="KK02" /> <USERREF USERNAME="KK03" /> </SITEGROUP> </MEDMLDATA>


1 REPLY 1
Highlighted

Re: How to handle nested XML file in hive and How to remove

With the help of xpath functions in hive.You can process the xml files

Don't have an account?
Coming from Hortonworks? Activate your account here