Support Questions

Find answers, ask questions, and share your expertise

how to export xml data saved in a hive table's string field to other hive table created as SERDE ,how to export xml data saved in a ms-sql field (as single string) into hive table created with SERDE

avatar
New Contributor

1) below hive table "books_xml" contains only one field named "xmldata" in which I have saved xml data as a single string.

hive> select xmldata from books_xml;
OK
<?xml version="1.0" encoding="UTF-8"?><catalog><book>  <id>11</id> 
<genre>Computer</genre>  <price>44</price></book><book>  <id>44</id> 
<genre>Fantasy</genre>  <price>5</price></book></catalog>
Time taken: 0.175 seconds, Fetched: 1 row(s)
2) I want to export above xml data (which is in field) into below hive table created with SERDE properties?

CREATE TABLE books_serde (
   id STRING, 
   genre STRING, 
   price DOUBLE
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.id"="/book/id/text()",
"column.xpath.genre"="/book/genre/text()",
"column.xpath.price"="/book/price/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<book>",
"xmlinput.end"="</book>"
);
,
1) below hive table "books_xml" contains only one field named "xmldata String" in which I have saved xml data as a single string.
hive> select xmldata from books_xml;
OK
<?xml version="1.0" encoding="UTF-8"?><catalog><book>  <id>11</id> 
<genre>Computer</genre>  <price>44</price></book><book>  <id>44</id> 
<genre>Fantasy</genre>  <price>5</price></book></catalog>
Time taken: 0.175 seconds, Fetched: 1 row(s)
2) I want to export above xml data (which is in field) into below hive table created with SERDE properties?
CREATE TABLE books_serde (
   id STRING, 
   genre STRING, 
   price DOUBLE
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.id"="/book/id/text()",
"column.xpath.genre"="/book/genre/text()",
"column.xpath.price"="/book/price/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<book>",
"xmlinput.end"="</book>"
);
1 ACCEPTED SOLUTION

avatar
Expert Contributor

@Riaz Lala

Okay, I have got two options for you that you can try:

Option#11) Using xpath builtin function:

You might have to do some data transformations to your input file to break down the book nodes to individual records. My input data looks like below:

<catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog> 

<catalog><book><id>45</id><genre>Fantasy</genre><price>5</price></book></catalog>

hive> select * from books_xml; 

OK 

<catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog> 

<catalog><book><id>45</id><genre>Fantasy</genre><price>5</price></book></catalog>
Time taken: 0.483 seconds, Fetched: 2 row(s)

Then you can use xpath function like below to access individual columns:

hive> select xpath_int(xmldata, '/catalog/book/id'), xpath_string(xmldata, '/catalog/book/genre') as genre, xpath_int(xmldata, '/catalog/book/price') from books_xml;
OK
11	Computer	44
45	Fantasy		5
Time taken: 0.508 seconds, Fetched: 2 row(s)

You can do insert overwrite to load into any other external table you want with regular column names

Option#2) Use of external UDF function that parses the xml data into an array and use explode function to select the indivudual values and load them to any target table you have.

Let me know if this helps

View solution in original post

6 REPLIES 6

avatar
Expert Contributor

@Riaz Lala

Have you looked at the post

Let me know if this is helpful

avatar
New Contributor

Thanks Hari, yes I have seen that post which is bit different in scenario. Post explains loading xml file "sample.xml" into hive serde table. In my case I have xml stored in a hive table field 'xmldata' and i want to use some sort of 'insert into table_serde select xmlfield from XmlTable'. I mean xml data moving from hive table to hive table.

Other way for me is that I can add location parameter in serde table pointing to location where single field hive table has stored its data but this has draw back if I add other fields in single field hive table it will invalidate serde table.

Thanks,

Riaz Lala

avatar
Expert Contributor

@Riaz Lala

Okay, I have got two options for you that you can try:

Option#11) Using xpath builtin function:

You might have to do some data transformations to your input file to break down the book nodes to individual records. My input data looks like below:

<catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog> 

<catalog><book><id>45</id><genre>Fantasy</genre><price>5</price></book></catalog>

hive> select * from books_xml; 

OK 

<catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog> 

<catalog><book><id>45</id><genre>Fantasy</genre><price>5</price></book></catalog>
Time taken: 0.483 seconds, Fetched: 2 row(s)

Then you can use xpath function like below to access individual columns:

hive> select xpath_int(xmldata, '/catalog/book/id'), xpath_string(xmldata, '/catalog/book/genre') as genre, xpath_int(xmldata, '/catalog/book/price') from books_xml;
OK
11	Computer	44
45	Fantasy		5
Time taken: 0.508 seconds, Fetched: 2 row(s)

You can do insert overwrite to load into any other external table you want with regular column names

Option#2) Use of external UDF function that parses the xml data into an array and use explode function to select the indivudual values and load them to any target table you have.

Let me know if this helps

avatar
New Contributor

Thank you very much Hari. Option#1 is perfect in my case.

avatar
Expert Contributor

@Riaz Lala

Glad that it was helpful. Can you upvote and accept the answer please?

Thanks

avatar
New Contributor

@Riaz Lala

Have you find solution to your post? If yes, Can you please pose your approach please. I am struggling with the same situation. Help needed. Thanks!