Created 06-21-2016 03:35 PM
Hi All,
Tried out a sample xml parsing using the Serde. but it returns a null value.
hive> DROP TABLE BOOKDATA; OK Time taken: 0.486 seconds hive> > CREATE EXTERNAL TABLE BOOKDATA( > TITLE VARCHAR(40), > PRICE INT > )ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' > WITH SERDEPROPERTIES ( > "column.xpath.TITLE"="/CATALOG/BOOK/TITLE/", > "column.xpath.PRICE"="/CATALOG/BOOK/PRICE/") > STORED AS > INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' > LOCATION '/sourcedata' > TBLPROPERTIES ( > "xmlinput.start"="<CATALOG", > "xmlinput.end"= "</CATALOG>" > ); OK Time taken: 0.285 seconds
hive> select * from BOOKDATA; OK NULL NULL Time taken: 0.184 seconds, Fetched: 1 row(s) hive>
~]$ hadoop fs -cat /sourcedata/bookdata.xml <CATALOG> <BOOK> <TITLE>Hadoop Defnitive Guide</TITLE> <AUTHOR>Tom White</AUTHOR> <COUNTRY>US</COUNTRY> <COMPANY>CLOUDERA</COMPANY> <PRICE>24.90</PRICE> <YEAR>2012</YEAR> </BOOK> <BOOK> <TITLE>Programming Pig</TITLE> <AUTHOR>Alan Gates</AUTHOR> <COUNTRY>USA</COUNTRY> <COMPANY>Horton Works</COMPANY> <PRICE>30.90</PRICE> <YEAR>2013</YEAR> </BOOK> </CATALOG>
Created 06-26-2016 04:31 AM
Hi @elan chelian. It works with following changes (details here😞
Declarations:
DROP TABLE IF EXISTS BOOKDATA; CREATE EXTERNAL TABLE BOOKDATA (TITLE STRING, PRICE FLOAT) ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' WITH SERDEPROPERTIES ( "column.xpath.TITLE"="/BOOK/TITLE/text()", "column.xpath.PRICE"="/BOOK/PRICE/text()") STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION '/user/it1/hive/xml' TBLPROPERTIES ("xmlinput.start"="<BOOK","xmlinput.end"= "</BOOK>");
Test:
hive> select * from BOOKDATA; OK Hadoop Defnitive Guide 24.9 Programming Pig 30.9 Time taken: 0.081 seconds, Fetched: 2 row(s)
Created 06-26-2016 04:31 AM
Hi @elan chelian. It works with following changes (details here😞
Declarations:
DROP TABLE IF EXISTS BOOKDATA; CREATE EXTERNAL TABLE BOOKDATA (TITLE STRING, PRICE FLOAT) ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' WITH SERDEPROPERTIES ( "column.xpath.TITLE"="/BOOK/TITLE/text()", "column.xpath.PRICE"="/BOOK/PRICE/text()") STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION '/user/it1/hive/xml' TBLPROPERTIES ("xmlinput.start"="<BOOK","xmlinput.end"= "</BOOK>");
Test:
hive> select * from BOOKDATA; OK Hadoop Defnitive Guide 24.9 Programming Pig 30.9 Time taken: 0.081 seconds, Fetched: 2 row(s)
Created 04-19-2020 04:41 PM
Hello Sir
I got the output as below but I am not getting any data do you know why?
hive> select * from BOOKDATA; OK Hadoop Defnitive Guide 24.9 Programming Pig 30.9 Time taken: 0.081 seconds, Fetched: 2 row(s)
Created 06-28-2016 02:55 PM
@Predrag Minovic Thanks for the answer. impressive. I was assuming the root node (catalog) needs to be mentioned in the xmlinput.start and xmlinput.end, so that all the nodes in between the root nodes can be queried using the xpath. Thanks for the clarification.