Member since
10-09-2018
1
Post
0
Kudos Received
0
Solutions
10-09-2018
08:48 AM
@Vinicius Higa Murakami , I have an xml file with the below format. <form name="text1" type="text2" parent="text3"</form> <form name="text4" type="text5" parent="text6"</form> <form name="text7" type="text8" parent="text9"</form> <filter_name>"bond1"</filter_name> <expression>"auto1"</expression> <filter_name>"bond2"</filter_name> <expression>"auto2"</expression></form> <form name="text0" type="text2" parent="text3"</form> <form name="text0" type="text2" parent="text3"</form> <filter_name>"bond3"</filter_name> <expression>"auto3"</expression></form> Description of data : columns name,type,parent are continuously repeating , but filter_name and expression which might come optionally for one time or many times. Expected results : text1 text2 text3 null null text4 text5 text6 null null text7 text8 text9 bond1 auto1 text7 text8 text9 bond2 auto2 text0 text2 text3 null null text0 text2 text3 bond3 auto3 for the above xml input file i have create table like below. CREATE EXTERNAL TABLE MLC_LIMITS_PARSER_FORMULA_XML_STG(
name STRING,type STRING,parent STRING,
filter_name STRING,
expression STRING) ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.name"="/formula/@name", "column.xpath.type"="/formula/@type", "column.xpath.parent"="/formula/@parent", "column.xpath.filter_name"="/formula/filter_name/text()", "column.xpath.expression_formula"="/formula/expression/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/migration/data/xml_test' TBLPROPERTIES (
"xmlinput.start"="<form>",
"xmlinput.end"="</form>"
); But getting results as : NULL NULL NULL NULL NULL Please help me to read the above format properly. Not only using hive , any way is fine for me.
... View more
Labels: