Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Getting multiple records while loading nested XML data into HIVE

avatar
Explorer

Hi All,

I am trying to load nested XML file into HIVE table, but i see multiple lines getting generated for the values in Array.

XML File :

<?xml version="1.0" encoding="UTF-8"?> <FormServerResponse> <Status> <StatusCode>0</StatusCode> <StatusDesc> Success</StatusDesc> <ConfidenceIndex> D</ConfidenceIndex> <Messages> <Message severity="Success"> <StatusCode> 0</StatusCode> <PageNumber> 1</PageNumber> <FileName> Filxxxxe.pdf</FileName> <LayoutFileId> xxxx</LayoutFileId> <LayoutPageId> xxxxx</LayoutPageId> <LayoutIdentiferText/> <Text> xx xxxxxx xx</Text> </Message> </Messages> </Status> </FormServerResponse>

Command I used to load data into HIVE:

create table ViewTable( StatusCode string,StatusDesc string,ConfidenceIndex string,Messages struct<Message: array<struct< severity:string,statuscode:string,pagenumber:string,filename:string,layoutfileid:string, layoutpageid:string, layoutidentifertext:string,text:string>>> ) row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe' with serdeproperties ( "column.xpath.StatusCode" = "/Status/StatusCode/text()", "column.xpath.StatusDesc" = "/Status/StatusDesc/text()","column.xpath.ConfidenceIndex" = "/Status/ConfidenceIndex/text()", "column.xpath.Messages" = "/Status/Messages/Message" ) stored as inputformat 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' tblproperties ( "xmlinput.start" = "<Status>" ,"xmlinput.end" = "</Status>" );

Result :

select * from viewtable; OK 0 Success D {"message":[{"severity":null,"statuscode":"0","pagenumber":null,"filename":null,"layoutfileid":null,"layoutpageid":null,"layoutidentifertext":null,"text":null},{"severity":null,"statuscode":null,"pagenumber":"1","filename":null,"layoutfileid":null,"layoutpageid":null,"layoutidentifertext":null,"text":null},{"severity":null,"statuscode":null,"pagenumber":null,"filename":"Filxxxxe.pdf","layoutfileid":null,"layoutpageid":null,"layoutidentifertext":null,"text":null},{"severity":null,"statuscode":null,"pagenumber":null,"filename":null,"layoutfileid":"xxxx","layoutpageid":null,"layoutidentifertext":null,"text":null},{"severity":null,"statuscode":null,"pagenumber":null,"filename":null,"layoutfileid":null,"layoutpageid":"xxxxx","layoutidentifertext":null,"text":null},{"severity":null,"statuscode":null,"pagenumber":null,"filename":null,"layoutfileid":null,"layoutpageid":null,"layoutidentifertext":null,"text":null},{"severity":null,"statuscode":null,"pagenumber":null,"filename":null,"layoutfileid":null,"layoutpageid":null,"layoutidentifertext":null,"text":"xx xxxxxx xx"}]} Time taken: 0.423 seconds, Fetched: 1 row(s)

Result expected:

0 Success D {"message":[{"severity":success,"statuscode":"0","pagenumber":1,"filename":Filxxxe.pdf,"layoutfileid":xxxx,"layoutpageid":xxxxx,"layoutidentifertext":null,"text":xx xxxxx xx}]}

Could someone help me where I am making mistake?????

1 ACCEPTED SOLUTION

avatar
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
4 REPLIES 4

avatar
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar
Explorer

Thanks @Vinicius Higa Murakami. I was able to make correction and was able to achieve it.

There is one more help I need . I am trying to load the complex structure of XML> i am able to load all the fields but there is one section which I am trying in many way to load the data but upon the loading of data, i still see the values to be NULL>

Attached the files.

a) XPATH what i am writing b) the XML file I am loading

xpath.txt

nestedxml.xml

Output:

0 Success A [{"message":{"severity":null,"statuscode":"0","pagenumber":"1","filename":"File.pdf","layoutfileid":"ACORD_125_2016_03","layoutpageid":"page1_COMMERCIAL_INSURANCE_APPLICATION","layoutidentifertext":null,"text":"Page Successfully Processed"}},{"message":{"severity":null,"statuscode":"0","pagenumber":"2","filename":"File.pdf","layoutfileid":"ACORD_125_2016_03","layoutpageid":"page2_CONTACT_INFORMATION","layoutidentifertext":null,"text":"Page Successfully Processed"}},{"message":{"severity":null,"statuscode":"0","pagenumber":"3","filename":"File.pdf","layoutfileid":"ACORD_125_2016_03","layoutpageid":"page3_GENERAL_INFORMATION","layoutidentifertext":null,"text":"Page Successfully Processed"}},{"message":{"severity":null,"statuscode":"0","pagenumber":"4","filename":"File.pdf","layoutfileid":"ACORD_125_2016_03","layoutpageid":"page4_PRIOR_CARRIER_INFORMATION","layoutidentifertext":null,"text":"Page Successfully Processed"}}] {"transactionid":"80c32800-65cb-11e8-8f5a-005056a3433861521442784784","productversion":"2.06.3.00064","requestdate":"2018-06-01","processingtime":"4 sec","filesize":"464888 bytes","extractiontechnique":"Text"} {"lobcd":"CPKGE","filesreceivedcnt":"1","pagesreceivedcnt":"4","pagesprocessedcnt":"4","formsprocessedcnt":"1","forms":[{"form":{"formname":"ACORD_125_2016_03","totalfields":"533","fieldswithdata":"82","ratebearingfields":"12","ratebearingfieldswithdata":"9"}}]} NULL

Could you please help me with this

Thanks,

Harish

avatar

Hi @Anji Raju!
Hmm, guess there's a little almost invisible mistake in your xpath 🙂
Try to change your ReturnPayLoad

 "column.xpath.ReturnPayLoad" = "/FormServerResponse/ReturnPaylLoad/ACORD/SignonRq"

to ReturnPayload

"column.xpath.ReturnPayLoad" = "/FormServerResponse/ReturnPayload/ACORD/SignonRq"

Hope this helps!

avatar
Explorer

Thanks @Vinicius Higa Murakami , I was able to figure out the Typo Mistake. Thanks for the help