Best way to import nested XML data into a hive table (explode XML arrays)

Hello experts!

 

I would like to create hive tables from nested XML files.

 

For simple XML files (without arrays) I am using Hive-XML-SerDe.

Link: https://github.com/dvasilen/Hive-XML-SerDe

 

Example for arrays in XML:

<Header>
  <Header_name>Header Content</Header_name>
</Header>
<record>
  <result>03.06.2009</result>
  <result>03.06.2010</result>
  <result>03.06.2011</result>
</record>

 

The best solution I know of for arrays is to map them to an array column:

result array<string>    

["03.06.2009","03.06.2010",...]

 

What I would like to have:

Header_name       result
Header Content    03.06.2009
Header Content    03.06.2010
Header Content    03.06.2011

 

 

I had a similar problem with JSON. There, arrays can be expanded in SELECT statements with "explode".

Link: https://github.com/rcongiu/Hive-JSON-Serde

...

LATERAL VIEW explode (column_name_with_array) AdTable as column_name_View

 

Maybe the simplest way is to convert the XML files to JSON and then import the JSON files into Hive.
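
A minimal sketch of that XML-to-JSON conversion, using only the Python standard library. Everything in it is an assumption for illustration: the sample is wrapped in a hypothetical <doc> root element (the fragment above has two top-level elements, so it would not parse on its own), the function name xml_to_json_line is made up, and the lowercase keys header_name and result are just one possible choice of column names. It writes one JSON object per line, so the result column stays an array that can then be opened with LATERAL VIEW explode as in the JSON case.

```python
import json
import xml.etree.ElementTree as ET

# Sample from the question, wrapped in a hypothetical <doc> root so that it
# parses as well-formed XML (the original fragment has no single root).
SAMPLE = """
<doc>
  <Header>
    <Header_name>Header Content</Header_name>
  </Header>
  <record>
    <result>03.06.2009</result>
    <result>03.06.2010</result>
    <result>03.06.2011</result>
  </record>
</doc>
"""


def xml_to_json_line(xml_text):
    """Convert one XML document into a single-line JSON object.

    The scalar header becomes a string field and the repeated <result>
    elements become a JSON array.
    """
    root = ET.fromstring(xml_text)
    doc = {
        "header_name": root.findtext("Header/Header_name"),
        "result": [r.text for r in root.findall("record/result")],
    }
    # json.dumps emits no newlines by default: one object per line.
    return json.dumps(doc)


if __name__ == "__main__":
    print(xml_to_json_line(SAMPLE))
```

For real files the paths "Header/Header_name" and "record/result" would have to be adapted to the actual document structure, and large inputs would probably need a streaming parser instead of ET.fromstring.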

 

- Is it possible to explode XML arrays directly, in the same way as JSON arrays?


Best regards

Stefan
