
Best way to import nested XML data into a hive table (explode XML arrays)

Hello experts!

 

I would like to create hive tables from nested XML files.

 

For simple XML files (without arrays) I am using Hive-XML-SerDe.

Link: https://github.com/dvasilen/Hive-XML-SerDe

 

Example for arrays in XML:

<Header>
  <Header_name>Header Content</Header_name>
</Header>
<record>
  <result>03.06.2009</result>
  <result>03.06.2010</result>
  <result>03.06.2011</result>
</record>

 

The best solution I know of maps the repeated elements to a single array column:

result array<string>    

["03.06.2009","03.06.2010",...]

 

What I would like to have:

Header_name                      result

Header Content                   03.06.2009

Header Content                   03.06.2010

Header Content                   03.06.2011

 

 

I had a similar problem with JSON. There, arrays can be expanded into rows in SELECT statements with "explode".

Link: https://github.com/rcongiu/Hive-JSON-Serde

...

LATERAL VIEW explode (column_name_with_array) AdTable as column_name_View

 

Maybe the simplest way is to convert the XML files to JSON files and then import the JSON files into Hive.
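
A minimal sketch of that conversion step, in Python with only the standard library. It assumes the fragment above is wrapped in a single root element (the <doc> tag is hypothetical, added only so the XML is well-formed), and the JSON key names header_name and result are likewise my own choice, not something either SerDe prescribes. The first function emits one JSON object per document, keeping the array so it can still be exploded in Hive; the second produces the flattened rows directly.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample: the fragment from this post wrapped in an assumed <doc>
# root element, because a well-formed XML document needs exactly one root.
SAMPLE = """<doc>
<Header>
<Header_name>Header Content</Header_name>
</Header>
<record> <result>03.06.2009</result> <result>03.06.2010</result> <result>03.06.2011</result> </record>
</doc>"""


def xml_to_json_line(xml_text):
    """Return one JSON object (as a string) with the header name and the result array."""
    root = ET.fromstring(xml_text)
    header_name = root.findtext("Header/Header_name")
    results = [r.text for r in root.findall("record/result")]
    # Key names are assumptions chosen for this sketch.
    return json.dumps({"header_name": header_name, "result": results})


def xml_to_rows(xml_text):
    """Return the already flattened (header_name, result) pairs, one per <result>."""
    root = ET.fromstring(xml_text)
    header_name = root.findtext("Header/Header_name")
    return [(header_name, r.text) for r in root.findall("record/result")]


if __name__ == "__main__":
    # One JSON line per document, suitable for a JSON-backed table.
    print(xml_to_json_line(SAMPLE))
    # The same data flattened without Hive.
    for row in xml_to_rows(SAMPLE):
        print(row)
```

Writing one JSON object per line keeps the array intact, so the LATERAL VIEW explode approach shown above for JSON should apply unchanged to a table defined over that output.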

 

- Is it possible to explode XML arrays in the same way as JSON arrays?

Best regards

Stefan