Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive XML Parising - Null value returned

avatar
Contributor

Hi All,

Tried out a sample xml parsing using the Serde. but it returns a null value.

hive> DROP TABLE BOOKDATA;
OK
Time taken: 0.486 seconds
hive>
  > CREATE EXTERNAL TABLE BOOKDATA(
  > TITLE VARCHAR(40),
  > PRICE  INT
  > )ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
  > WITH SERDEPROPERTIES (
  > "column.xpath.TITLE"="/CATALOG/BOOK/TITLE/",
  > "column.xpath.PRICE"="/CATALOG/BOOK/PRICE/")
  > STORED AS
  > INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
  > LOCATION '/sourcedata'
  > TBLPROPERTIES (
  > "xmlinput.start"="<CATALOG",
  > "xmlinput.end"= "</CATALOG>"
  > );
OK
Time taken: 0.285 seconds
hive> select * from BOOKDATA;
OK
NULL  NULL
Time taken: 0.184 seconds, Fetched: 1 row(s)
hive>
~]$ hadoop fs -cat /sourcedata/bookdata.xml
<CATALOG>
<BOOK>
<TITLE>Hadoop Defnitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BOOK>
<BOOK>
<TITLE>Programming Pig</TITLE>
<AUTHOR>Alan Gates</AUTHOR>
<COUNTRY>USA</COUNTRY>
<COMPANY>Horton Works</COMPANY>
<PRICE>30.90</PRICE>
<YEAR>2013</YEAR>
</BOOK>
</CATALOG>
1 ACCEPTED SOLUTION

avatar
Master Guru
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
3 REPLIES 3

avatar
Master Guru
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar
New Contributor

Hello Sir

I got the output as below but I am not getting any data do you know why?

 

hive> select * from BOOKDATA;
OK
Hadoop Defnitive Guide	24.9
Programming Pig	30.9
Time taken: 0.081 seconds, Fetched: 2 row(s)

avatar
Contributor

@Predrag Minovic Thanks for the answer. impressive. I was assuming the root node (catalog) needs to be mentioned in the xmlinput.start and xmlinput.end, so that all the nodes in between the root nodes can be queried using the xpath. Thanks for the clarification.