
XML handling suggestions



We are ingesting a very complex XML file (deeply nested, unordered elements, etc.). We have HDP and HDF for ingestion and are considering a few options:

1. XML SerDe on the file (not the most intuitive for really complex structures).
2. Split the XML into child splits and re-merge them into 1:M Hive tables (a little better than option 1, but it still gets a little crazy).
3. Convert XML to JSON with XSLT and use the Hive JSON SerDe. I found the JSON SerDe a little more flexible, and it handled the deeply nested, unordered entities okay.
4. Convert XML directly to Avro with Spark.
5. Read the XML and pull out only the relevant entities/attributes.

Any recommendations or other approaches people have successfully used?
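
To make option 5 concrete, here is a rough sketch of what I have in mind, using only the Python standard library. The element names (`orders`, `order`, `customer`, `total`) and the helper `extract_records` are made up for illustration; our real schema is different and much deeper. It streams the file, keeps only the listed fields regardless of order or nesting depth, and writes one JSON object per line, which a Hive JSON SerDe could then read.

```python
import io
import json
import xml.etree.ElementTree as ET


def extract_records(source, record_tag, fields):
    """Yield one flat dict per <record_tag> element, keeping only `fields`.

    `source` is a filename or a binary file object. Element names are
    hypothetical placeholders, not taken from any real schema.
    """
    # iterparse streams the document, so the whole tree is never held in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        row = {"id": elem.get("id")}
        for name in fields:
            # ".//name" matches the first descendant at any depth, so the
            # order and nesting of child elements do not matter.
            row[name] = elem.findtext(f".//{name}")
        yield row
        # Drop the processed subtree to keep memory usage flat.
        elem.clear()


# Hypothetical sample: the two records list their children in different order.
SAMPLE = b"""<orders>
  <order id="1"><meta><customer>Ann</customer></meta><total>9.50</total><note>x</note></order>
  <order id="2"><total>3.00</total><meta><customer>Bob</customer></meta></order>
</orders>"""

rows = list(extract_records(io.BytesIO(SAMPLE), "order", ["customer", "total"]))

# One JSON document per line, the layout a Hive JSON SerDe typically expects.
json_lines = "\n".join(json.dumps(row) for row in rows)
print(json_lines)
```

All values come out as strings here; typing them and handling repeated child elements (the 1:M case from option 2) would need extra logic on top of this.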



Will you be using Hive to consume the data?
