ConvertExcelToCSVProcessor - File too Large

Hi,

 

I have a workflow that picks up an Excel file containing 3 sheets and runs it through a ConvertExcelToCSVProcessor, but the processor fails with the error below:

Failed to process incoming Excel document. Tried to allocate an array of length 328,219,733, but the maximum length for this record type is 100,000,000. If the file is not corrupt or large, please open an issue on bugzilla to request increasing the maximum allowable size for this record type. As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride(): org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 328,219,733, but the maximum length for this record type is 100,000,000. If the file is not corrupt or large, please open an issue on bugzilla to request increasing the maximum allowable size for this record type. As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()

 

Has anyone else run into this error and found a way around it? I don't see where I could set a new value for IOUtils.setByteArrayMaxOverride(). The other option I am considering is a Python script to perform the conversion, but that would add a great deal of complexity to my flow.

Thanks for any help!

1 ACCEPTED SOLUTION

@TRSS_Cloudera 

The issue you have described matches this known issue reported in Apache NiFi:
https://issues.apache.org/jira/browse/NIFI-10792

The discussion in the comments of that Jira points to a couple of workarounds, along with the drawbacks of each.

From that discussion, the best approach appears to be the development of a new "Excel Record Reader" controller service that the existing ConvertRecord processor could use together with a CSVRecordSetWriter.
This is outlined in the following Jira:
https://issues.apache.org/jira/browse/NIFI-11167

If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.

Thank you,

Matt
