
Best way to import nested XML data into a hive table (explode XML arrays)


Hello experts!

 

I would like to create Hive tables from nested XML files.

For simple XML files (without arrays) I am using Hive-XML-SerDe.

Link: https://github.com/dvasilen/Hive-XML-SerDe

 

Example for arrays in XML:

<Header>
<Header_name>Header Content</Header_name>
</Header>
<record>
<result>03.06.2009</result>
<result>03.06.2010</result>
<result>03.06.2011</result>
</record>

The best solution I know of for arrays is to map them to an array column:

result array<string>    

["03.06.2009","03.06.2010",...]

 

What I would like to have:

Header_name      result
Header Content   03.06.2009
Header Content   03.06.2010
Header Content   03.06.2011
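To make the target shape concrete, here is a minimal Python sketch (outside Hive) that produces exactly these rows from the sample above. It is only an illustration, not a Hive solution: the function name `explode_results` is hypothetical, and because the sample has two top-level elements, the sketch assumes it can be wrapped in a synthetic `<doc>` root to make it parseable.

```python
import xml.etree.ElementTree as ET

# Sample XML from this post (two top-level elements, no single root).
SAMPLE = """<Header>
<Header_name>Header Content</Header_name>
</Header>
<record> <result>03.06.2009</result> <result>03.06.2010</result> <result>03.06.2011</result> </record>"""


def explode_results(xml_fragment):
    """Return one (header_name, result) tuple per <result> element."""
    # Assumption: wrap the fragment in a synthetic root so it is well-formed.
    root = ET.fromstring("<doc>" + xml_fragment + "</doc>")
    header_name = root.findtext("Header/Header_name")
    # Repeat the header value for every array element, like explode would.
    return [(header_name, r.text) for r in root.findall("record/result")]


rows = explode_results(SAMPLE)
for header_name, result in rows:
    print(header_name + "\t" + result)
```

Each array element becomes its own row, with the header value repeated next to it.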

 

 

I had a similar problem with JSON. There, arrays can be expanded in SELECT statements with "explode".

Link: https://github.com/rcongiu/Hive-JSON-Serde

...

LATERAL VIEW explode (column_name_with_array) AdTable as column_name_View

 

Maybe the simplest way is to convert the XML files to JSON and then import the JSON files into Hive.
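A minimal sketch of that conversion in Python, using only the standard library. It assumes the structure of the sample in this post; the function name `xml_to_json_line`, the JSON key names, and the synthetic `<doc>` wrapper (needed because the sample has no single root element) are my own choices for illustration, not part of either SerDe.

```python
import json
import xml.etree.ElementTree as ET

# Sample XML from this post (two top-level elements, no single root).
SAMPLE = """<Header>
<Header_name>Header Content</Header_name>
</Header>
<record> <result>03.06.2009</result> <result>03.06.2010</result> <result>03.06.2011</result> </record>"""


def xml_to_json_line(xml_fragment):
    """Convert one XML fragment into a single-line JSON object."""
    # Assumption: wrap the fragment in a synthetic root so it is well-formed.
    root = ET.fromstring("<doc>" + xml_fragment + "</doc>")
    record = {
        "header_name": root.findtext("Header/Header_name"),
        # Repeated <result> elements become a JSON array.
        "result": [r.text for r in root.findall("record/result")],
    }
    return json.dumps(record)


json_line = xml_to_json_line(SAMPLE)
print(json_line)
```

Writing one such JSON object per line should give a file that a JSON SerDe table with a `result array<string>` column can read, after which the LATERAL VIEW explode approach applies.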

 

Is it possible to explode XML arrays in the same way as JSON arrays?

Best regards

Stefan