@Vinicius Higa Murakami ,
I have an xml file with the below format.
<form name="text1" type="text2" parent="text3"</form>
<form name="text4" type="text5" parent="text6"</form>
<form name="text7" type="text8" parent="text9"</form>
<filter_name>"bond1"</filter_name>
<expression>"auto1"</expression>
<filter_name>"bond2"</filter_name>
<expression>"auto2"</expression></form>
<form name="text0" type="text2" parent="text3"</form>
<form name="text0" type="text2" parent="text3"</form>
<filter_name>"bond3"</filter_name>
<expression>"auto3"</expression></form>
Description of data : columns name,type,parent are continuously repeating , but filter_name and expression which might come optionally for one time or many times.
Expected results :
text1 text2 text3 null null
text4 text5 text6 null null
text7 text8 text9 bond1 auto1
text7 text8 text9 bond2 auto2
text0 text2 text3 null null
text0 text2 text3 bond3 auto3
for the above xml input file i have create table like below.
CREATE EXTERNAL TABLE MLC_LIMITS_PARSER_FORMULA_XML_STG(
name STRING,type STRING,parent STRING,
filter_name STRING,
expression STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.name"="/formula/@name",
"column.xpath.type"="/formula/@type",
"column.xpath.parent"="/formula/@parent",
"column.xpath.filter_name"="/formula/filter_name/text()",
"column.xpath.expression_formula"="/formula/expression/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/migration/data/xml_test'
TBLPROPERTIES (
"xmlinput.start"="<form>",
"xmlinput.end"="</form>"
);
But getting results as :
NULL NULL NULL NULL NULL
Please help me to read the above format properly. Not only using hive , any way is fine for me.