Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How to create hive ddl on top xml using the xml serde

avatar
Contributor

Hi I have an xml with below format,Can anyone help me how to create a hive DDL on top of this xml.

<code><root><root1><id>4545482361</id>`enter code here`
<joiningdate>1/3/2010</joiningdate><Segments><Segmentxse:type="manager"><cityworked>Hyd</cityworked><reports>john</reports><salary>150000</salary><datestarted>1/3/2012</datestarted></Segment><Segmentxse:type="manager"><cityworked>Hyd</cityworked><reports>mike</reports><salary>225000</salary><datestarted>1/9/2014</datestarted></Segment><Segmentxse:type="VP"><cityworked>mumbai</cityworked><datestarted>1/9/2014</datestarted><subemployees><Fname>ram</Fname><Lname>Achanta</Lname><Desgination>Director of IT</Desgination></subemployees></Segment><Segmentxse:type="SVP"><Staus>currentposition</status><numberofemployees>10</numberofemployees></Segment></Segments></root1></root>
2 REPLIES 2

avatar
Expert Contributor
@srini

You can import xml directly using com.ibm.spss.hive.serde2.xml.XmlSerDe which is detailed here:

https://community.hortonworks.com/articles/972/hive-and-xml-pasring.html

Other option is to load entire record in a string and then access it using xpath UDF :

1. create table employee( employee_info string) ;

2. load data local inpath '/home/hduser/sample.xml' into table employee;

3. create view employee_xml_view as SELECT xpath_int(employee_info ,'code/root/root1/id'),xpath_string(employee_info ,'code/root/root1/joiningdate'),xpath_string(employee_info ,'code/root/root1/joiningdate').................. from employee;

4. Select * from employee_xml_view;

avatar
Contributor

Hi Trivedi ,Thanks for your reply.Can you please answer this in details because i was able to create ddl on other part of xml which consists of array and structs,but if you see the xml i attached segments is a tag which contains segment within it and xse:type is manager,vp,svp. Can you please help in creating the exact schema for this