
XML handling suggestions





We are ingesting a very complex XML file (super nested, unordered elements, etc.). We have HDP and HDF for ingestion and are considering a few options:

1. XML SerDe on the file (not the most intuitive for really complex structures)
2. Split the XML into child splits and re-merge them into 1:M Hive tables (a little better than option 1, but it still gets a little crazy)
3. Convert the XML to JSON with XSLT and use the Hive JSON SerDe. I found the JSON SerDe a little more flexible and was able to deal with the deeply nested, unordered entities okay.
4. Convert the XML directly to Avro with Spark
5. Read the XML and pull out only the relevant entities/attributes
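
To make option 5 a bit more concrete, here is a minimal sketch in plain Python (standard library only, not tied to HDP/HDF) of streaming through a file and keeping only the fields of interest. Everything named here is an assumption made up for illustration: the `extract_records` helper, the `order`/`customer`/`total` element names, and the small `path/@attribute` convention used to pick attributes. Lookups are by path rather than by position, so the order of child elements does not matter.

```python
import io
import xml.etree.ElementTree as ET


def extract_records(source, record_tag, wanted):
    """Stream `source` and yield one flat dict per <record_tag> element.

    `wanted` maps an output column name to a path relative to the record
    element. A path ending in "/@name" reads the attribute `name`
    (use "./@name" for an attribute on the record element itself);
    any other path reads the element's text.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        row = {}
        for column, path in wanted.items():
            node_path, _, attr = path.partition("/@")
            node = elem if node_path in ("", ".") else elem.find(node_path)
            if node is None:
                row[column] = None  # field missing in this record
            elif attr:
                row[column] = node.get(attr)
            else:
                row[column] = (node.text or "").strip() or None
        yield row
        elem.clear()  # drop the processed subtree to keep memory flat


# Hypothetical sample: note the children of the second order are reordered.
SAMPLE = b"""<orders>
  <order id="1">
    <customer><name>Ann</name></customer>
    <total currency="USD">10.50</total>
  </order>
  <order id="2">
    <total currency="EUR">7.00</total>
    <customer><name>Bob</name></customer>
  </order>
</orders>"""

WANTED = {
    "order_id": "./@id",
    "customer": "customer/name",
    "currency": "total/@currency",
    "total": "total",
}

rows = list(extract_records(io.BytesIO(SAMPLE), "order", WANTED))
for row in rows:
    print(row)
```

Each yielded row is a flat dict, so it could be written out as one JSON object per line and read through a JSON SerDe, or loaded into a flat table directly. This sketch assumes record elements are not nested inside one another and that the file has no XML namespaces; both would need extra handling.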

Any recommendations or other approaches people have successfully used?


Re: XML handling suggestions


Will you be using Hive to consume the data?
