XMLReader: creating records nested deeper than the 2nd level

New Contributor

I have an existing XML document where I want to create records starting at level N, with N > 2. It appears the XMLRecordReader cannot bring that level into the record-oriented data for processing; it relies on the records starting at level 2. Is that correct? If so, would it be a good enhancement to expose a property on the XMLReader where we can specify the root element and let the reader start there?

Full disclosure: I've tried SplitXML to get my records to the 2nd level, but that causes an OutOfMemoryError (OOM).

I've started testing an enhancement that lets you define the root element of the XML for the XMLRecordReader, telling the reader where to start processing. Before going further, I wanted to check whether this is the right approach. If it is, I will gladly create the JIRA and do the work.
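To make the "root element" idea concrete, here is a minimal sketch using the JDK's StAX API (`javax.xml.stream`). It is not NiFi's XMLRecordReader and not the enhancement described above; the class `NestedRecordExtractor` and its `rootName` / `recordName` parameters are hypothetical names chosen for illustration. The reader streams through the document, ignores everything until it enters an element named `rootName` (at any depth), and hands each direct `recordName` child to a callback as a flat map of field name to text. Assuming the callback does not retain the records, only one record is held in memory at a time, which is the property that matters for a 500 MB input.

```java
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: streams records that live below a configurable root element. */
public class NestedRecordExtractor {

    /**
     * Emits one flat record (child element name -> trimmed text) for every
     * recordName element that is a direct child of a rootName element,
     * wherever that root sits in the document. Attributes are ignored, and
     * text of deeper descendants is concatenated into the enclosing field.
     */
    public static void extract(Reader xml, String rootName, String recordName,
                               Consumer<Map<String, String>> sink) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs, so untrusted input cannot trigger entity expansion.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = null;
        try {
            reader = factory.createXMLStreamReader(xml);
            int depth = 0;          // depth of the element we are currently inside
            int rootDepth = -1;     // depth of the matched root, -1 while outside it
            Map<String, String> current = null;
            String field = null;
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    String name = reader.getLocalName();
                    if (rootDepth < 0 && name.equals(rootName)) {
                        rootDepth = depth;                   // entered the configured root
                    } else if (rootDepth > 0 && depth == rootDepth + 1 && name.equals(recordName)) {
                        current = new LinkedHashMap<>();     // start of one record
                    } else if (current != null && depth == rootDepth + 2) {
                        field = name;                        // start of one field
                        text.setLength(0);
                    }
                } else if (event == XMLStreamConstants.CHARACTERS && field != null) {
                    text.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (field != null && depth == rootDepth + 2) {
                        current.put(field, text.toString().trim());
                        field = null;
                    } else if (current != null && depth == rootDepth + 1) {
                        sink.accept(current);                // hand the record off
                        current = null;
                    } else if (depth == rootDepth) {
                        rootDepth = -1;                      // left the root element
                    }
                    depth--;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        } finally {
            if (reader != null) {
                try { reader.close(); } catch (XMLStreamException ignored) { }
            }
        }
    }
}
```

A call such as `NestedRecordExtractor.extract(reader, "items", "item", System.out::println)` would print each record as it is parsed. In a real pipeline the callback would write each record out immediately instead of collecting them.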

Re: XMLReader: creating records nested deeper than the 2nd level

Expert Contributor
@Tim Onyschak

Have you tried EvaluateXPath? If the part of your data that forms the records starts in the middle of the XML, and the rest of the document isn't needed, use EvaluateXPath to trim the content down to the relevant piece. After that, you can use record-based processors with a matching schema.
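As a plain-JDK illustration of the same trimming idea (not NiFi's EvaluateXPath code), the sketch below selects the subtree at an XPath expression and serializes only that part, so a downstream reader would see the nested array as its top-level content. The class `XPathTrim` and the path `/report/body/items` are hypothetical. Note that this particular sketch evaluates the expression against an `InputSource`, which parses the entire document into a DOM first, so its memory use grows with the whole input rather than with the selected piece.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: keep only the subtree selected by an XPath expression. */
public class XPathTrim {

    /** Returns the serialized node matched by expression, or null if nothing matches. */
    public static String trim(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Evaluating against an InputSource builds a DOM of the whole document first.
            Node node = (Node) xpath.evaluate(expression,
                    new InputSource(new StringReader(xml)), XPathConstants.NODE);
            if (node == null) {
                return null;
            }
            // Identity transform: serialize just the selected element and its children.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (XPathExpressionException | TransformerException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }
}
```

For example, `XPathTrim.trim(xml, "/report/body/items")` returns just the `<items>` element, dropping the surrounding report header.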

Re: XMLReader: creating records nested deeper than the 2nd level

New Contributor

@Ed Berezitsky

I forgot to mention that I also tried EvaluateXPath and ForkRecord, but all of them gave me an OOM. We essentially have a large report with a large array of objects about 3 or 4 levels down. My stress test has around 500 MB of data, and I only got it to work when I stripped the outer levels so that the array sat at the second level. I also enhanced the XMLRecordReader to take a root element to start at, and that works for me.