Created 10-22-2016 06:32 PM
1) below hive table "books_xml" contains only one field named "xmldata" in which I have saved xml data as a single string.
hive> select xmldata from books_xml;
OK
<?xml version="1.0" encoding="UTF-8"?><catalog><book> <id>11</id>
<genre>Computer</genre> <price>44</price></book><book> <id>44</id>
<genre>Fantasy</genre> <price>5</price></book></catalog>
Time taken: 0.175 seconds, Fetched: 1 row(s)
2) I want to export above xml data (which is in field) into below hive table created with SERDE properties?
CREATE TABLE books_serde (
id STRING,
genre STRING,
price DOUBLE
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.id"="/book/id/text()",
"column.xpath.genre"="/book/genre/text()",
"column.xpath.price"="/book/price/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<book>",
"xmlinput.end"="</book>"
);
,1) below hive table "books_xml" contains only one field named "xmldata String" in which I have saved xml data as a single string.
hive> select xmldata from books_xml;
OK
<?xml version="1.0" encoding="UTF-8"?><catalog><book> <id>11</id>
<genre>Computer</genre> <price>44</price></book><book> <id>44</id>
<genre>Fantasy</genre> <price>5</price></book></catalog>
Time taken: 0.175 seconds, Fetched: 1 row(s)
2) I want to export above xml data (which is in field) into below hive table created with SERDE properties?
CREATE TABLE books_serde (
id STRING,
genre STRING,
price DOUBLE
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.id"="/book/id/text()",
"column.xpath.genre"="/book/genre/text()",
"column.xpath.price"="/book/price/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<book>",
"xmlinput.end"="</book>"
);
Created 10-22-2016 09:34 PM
Okay, I have got two options for you that you can try:
Option#11) Using xpath builtin function:
You might have to do some data transformations to your input file to break down the book nodes to individual records. My input data looks like below:
<catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog> <catalog><book><id>45</id><genre>Fantasy</genre><price>5</price></book></catalog>
hive> select * from books_xml; OK <catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog> <catalog><book><id>45</id><genre>Fantasy</genre><price>5</price></book></catalog> Time taken: 0.483 seconds, Fetched: 2 row(s)
Then you can use xpath function like below to access individual columns:
hive> select xpath_int(xmldata, '/catalog/book/id'), xpath_string(xmldata, '/catalog/book/genre') as genre, xpath_int(xmldata, '/catalog/book/price') from books_xml; OK 11 Computer 44 45 Fantasy 5 Time taken: 0.508 seconds, Fetched: 2 row(s)
You can do insert overwrite to load into any other external table you want with regular column names
Option#2) Use of external UDF function that parses the xml data into an array and use explode function to select the indivudual values and load them to any target table you have.
Let me know if this helps
Created 10-22-2016 08:06 PM
Created 10-22-2016 08:54 PM
Thanks Hari, yes I have seen that post which is bit different in scenario. Post explains loading xml file "sample.xml" into hive serde table. In my case I have xml stored in a hive table field 'xmldata' and i want to use some sort of 'insert into table_serde select xmlfield from XmlTable'. I mean xml data moving from hive table to hive table.
Other way for me is that I can add location parameter in serde table pointing to location where single field hive table has stored its data but this has draw back if I add other fields in single field hive table it will invalidate serde table.
Thanks,
Riaz Lala
Created 10-22-2016 09:34 PM
Okay, I have got two options for you that you can try:
Option#11) Using xpath builtin function:
You might have to do some data transformations to your input file to break down the book nodes to individual records. My input data looks like below:
<catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog> <catalog><book><id>45</id><genre>Fantasy</genre><price>5</price></book></catalog>
hive> select * from books_xml; OK <catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog> <catalog><book><id>45</id><genre>Fantasy</genre><price>5</price></book></catalog> Time taken: 0.483 seconds, Fetched: 2 row(s)
Then you can use xpath function like below to access individual columns:
hive> select xpath_int(xmldata, '/catalog/book/id'), xpath_string(xmldata, '/catalog/book/genre') as genre, xpath_int(xmldata, '/catalog/book/price') from books_xml; OK 11 Computer 44 45 Fantasy 5 Time taken: 0.508 seconds, Fetched: 2 row(s)
You can do insert overwrite to load into any other external table you want with regular column names
Option#2) Use of external UDF function that parses the xml data into an array and use explode function to select the indivudual values and load them to any target table you have.
Let me know if this helps
Created 10-23-2016 01:37 AM
Thank you very much Hari. Option#1 is perfect in my case.
Created 10-23-2016 01:42 AM
Created 11-29-2018 02:06 AM
@Riaz Lala
Have you find solution to your post? If yes, Can you please pose your approach please. I am struggling with the same situation. Help needed. Thanks!