Support Questions
Find answers, ask questions, and share your expertise

Parsing xml file with optional column values using hive

Parsing xml file with optional column values using hive

New Contributor

@Vinicius Higa Murakami ,

I have an xml file with the below format.

<form name="text1" type="text2" parent="text3"</form>

<form name="text4" type="text5" parent="text6"</form>

<form name="text7" type="text8" parent="text9"</form>

<filter_name>"bond1"</filter_name>

<expression>"auto1"</expression>

<filter_name>"bond2"</filter_name>

<expression>"auto2"</expression></form>

<form name="text0" type="text2" parent="text3"</form>

<form name="text0" type="text2" parent="text3"</form>

<filter_name>"bond3"</filter_name>

<expression>"auto3"</expression></form>

Description of data : columns name,type,parent are continuously repeating , but filter_name and expression which might come optionally for one time or many times.

Expected results :

text1 text2 text3 null null

text4 text5 text6 null null

text7 text8 text9 bond1 auto1

text7 text8 text9 bond2 auto2

text0 text2 text3 null null

text0 text2 text3 bond3 auto3

for the above xml input file i have create table like below.

CREATE EXTERNAL TABLE MLC_LIMITS_PARSER_FORMULA_XML_STG( name STRING,type STRING,parent STRING, filter_name STRING, expression STRING)

ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' WITH SERDEPROPERTIES ( "column.xpath.name"="/formula/@name",

"column.xpath.type"="/formula/@type",

"column.xpath.parent"="/formula/@parent",

"column.xpath.filter_name"="/formula/filter_name/text()",

"column.xpath.expression_formula"="/formula/expression/text()" )

STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION '/migration/data/xml_test'

TBLPROPERTIES ( "xmlinput.start"="<form>", "xmlinput.end"="</form>" );

But getting results as :

NULL NULL NULL NULL NULL

Please help me to read the above format properly. Not only using hive , any way is fine for me.