I have an existing XML document in which I want to create records starting at level N, with N > 2. It appears the XMLRecordReader is not able to bring that level into the record-oriented data for processing; it relies on the records starting at level 2. Is that correct? If so, would it be a good enhancement to expose an attribute on the XMLReader where we can specify the root element and allow the reader to start there?
Full disclosure: I've tried SplitXML to get my records to the second level, but that causes an OOM.
I've started testing an enhancement that passes the root element of the XML to the XMLRecordReader, telling the reader where to start processing. But I wanted to check whether this is the right approach. If it is, I will gladly create the Jira and do the work.
Have you tried EvaluateXPath? If your record data starts in the middle of the XML and the rest of the document isn't needed, use EvaluateXPath to trim the content down to the relevant piece. After that you can use record-based processors with a matching schema.
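To make the trimming idea concrete, here is a minimal sketch of selecting a nested node set with an XPath expression. It uses the JDK's `javax.xml.xpath` API directly, not NiFi's EvaluateXPath processor, and the class name, method name, and element names are hypothetical. Note that this JDK API parses the whole input into an in-memory tree before evaluating, so memory use grows with document size.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTrim {

    // Evaluates the expression against the XML text and returns the text
    // content of each matching node. The input is parsed fully into memory.
    static String[] select(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        String[] result = new String[nodes.getLength()];
        for (int i = 0; i < nodes.getLength(); i++) {
            result[i] = nodes.item(i).getTextContent();
        }
        return result;
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Hypothetical report with the records three levels down.
        String xml = "<report><body><items><item>a</item><item>b</item></items></body></report>";
        for (String s : select(xml, "/report/body/items/item")) {
            System.out.println(s);
        }
    }
}
```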
I forgot to mention that I also tried EvaluateXPath and ForkRecord, but both gave me an OOM. We essentially have a large report with a large array of objects about 3 or 4 levels down. My stress test has around 500 MB of data, and I only got it to work when I stripped the outer levels so that the records sit at the second level. I also enhanced the XMLRecordReader to take a root element to start at, and that works for me.
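For anyone following along, the "start at a given root" idea can be sketched with plain StAX, which reads the document as a stream and never holds the whole tree in memory. This is only an illustration under assumptions, not the actual XMLRecordReader change: the class and method names are hypothetical, and each record is assumed to be a text-only element for brevity.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class NestedRecordScan {

    // Skips forward to the first <rootName> element, then passes the text of
    // every <recordName> element inside it to the sink, one at a time.
    // Stops as soon as the root element closes.
    static void forEachRecord(Reader in, String rootName, String recordName,
                              Consumer<String> sink) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        boolean insideRoot = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (!insideRoot && name.equals(rootName)) {
                        insideRoot = true;
                    } else if (insideRoot && name.equals(recordName)) {
                        // Assumes text-only records; leaves the reader on the end tag.
                        sink.accept(reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && insideRoot && reader.getLocalName().equals(rootName)) {
                    break;
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical report with the record array three levels down.
        String xml = "<report><header><title>t</title></header>"
                + "<body><items><item>a</item><item>b</item></items></body></report>";
        forEachRecord(new StringReader(xml), "items", "item", System.out::println);
    }
}
```

Because each record is handed to the sink as soon as it is read, memory use depends on the size of a single record and not on the size of the report.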