Member since: 09-29-2021
Posts: 4
Kudos Received: 0
Solutions: 0
02-28-2023
07:39 AM
Hi, I have a workflow that picks up an Excel file containing 3 sheets and runs it through a ConvertExcelToCSVProcessor, but it is failing with the error below:

Failed to process incoming Excel document. Tried to allocate an array of length 328,219,733, but the maximum length for this record type is 100,000,000. If the file is not corrupt or large, please open an issue on bugzilla to request increasing the maximum allowable size for this record type. As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride(): org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 328,219,733, but the maximum length for this record type is 100,000,000. If the file is not corrupt or large, please open an issue on bugzilla to request increasing the maximum allowable size for this record type. As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()

Has anyone else run into this error and been able to get around it? I'm not seeing where I could set a new value for IOUtils.setByteArrayMaxOverride(). The other option I am considering is a Python script to perform this task (sketched below), but that would add a great deal more complexity to my flow. Thanks for any help!
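To frame that alternative, here is a minimal sketch of what the Python fallback might look like, assuming pandas (with openpyxl) is available; the input path and output naming are hypothetical:

```python
# Hypothetical sketch: convert every sheet of an Excel workbook to its
# own CSV file, mirroring what ConvertExcelToCSVProcessor would produce.
# Assumes pandas (with openpyxl) is installed; paths are placeholders.
import pandas as pd

input_path = "workbook.xlsx"  # placeholder for the incoming file

# sheet_name=None reads all sheets into a {sheet_name: DataFrame} dict
sheets = pd.read_excel(input_path, sheet_name=None)

for name, frame in sheets.items():
    frame.to_csv(f"{name}.csv", index=False)  # one CSV per sheet
```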
Labels:
- Apache NiFi
05-25-2022
07:59 AM
Working on a wait-notify setup, but I am running into issues. The flow constantly logs an "Address already in use" error, even though there is just the one set of DistributedMapCacheServer and DistributedMapCacheClient controller services. Oddly, this doesn't cause an immediate problem: when the services are first created, the wait-notify works fine, but on any subsequent run with the same input file (same Release Signal Identifier) the files do not wait at all and move through immediately.
Labels:
- Apache NiFi
10-19-2021
07:56 AM
I am trying to use a ListSFTP processor to delete files from an SFTP site that are older than the most recent "History" file. To do this, I need to continually (the process would run daily or weekly) query all the files on the SFTP server to see whether a new "History" file was added, and if so remove all files from the site that predate this newest "History" file. The problem is that ListSFTP only shows newly added files, so I am unable to do this. I can run Clear State on the processor to make it work, but I would like to avoid this being a manual step. So how can I have ListSFTP show all the files every single time it runs?

Note: I've seen some posts about automatically clearing the state using a curl request, but this only works on a stopped processor, which defeats the purpose of this being an automated process (https://community.cloudera.com/t5/Support-Questions/Nifi-clear-state/td-p/242347).
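For what it's worth, the stop/clear/start sequence those posts describe can itself be scripted against NiFi's REST API. A minimal sketch, assuming an unsecured NiFi instance at a placeholder host and a placeholder processor ID:

```python
# Hypothetical sketch: stop the ListSFTP processor, clear its state,
# and start it again via the NiFi REST API. Host, port, and processor
# ID are placeholders; a secured cluster would also need auth headers.
import requests

NIFI_API = "http://localhost:8080/nifi-api"  # placeholder NiFi host
PROC_ID = "<listsftp-processor-id>"          # placeholder processor ID

def set_run_status(state):
    # run-status updates must carry the processor's current revision
    revision = requests.get(f"{NIFI_API}/processors/{PROC_ID}").json()["revision"]
    resp = requests.put(
        f"{NIFI_API}/processors/{PROC_ID}/run-status",
        json={"revision": revision, "state": state},
    )
    resp.raise_for_status()

set_run_status("STOPPED")  # state can only be cleared while stopped
requests.post(f"{NIFI_API}/processors/{PROC_ID}/state/clear-requests").raise_for_status()
set_run_status("RUNNING")
```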
Labels:
- Apache NiFi
09-29-2021
09:32 AM
I am working on a 3-node NiFi cluster. The flow is kicked off by a GenerateFlowFile processor run on the primary node, performs some NiFi processing, and then writes the files to the server, where I run a Python script on them via ExecuteStreamCommand. The problem I'm running into is that I can't figure out a way to ensure the processors picking up the first output run on the same node as the processors that produced that output. What is the best way to handle producing files that can be accessed by all nodes? Is there a way to specify the node a processor will run on? (Using "run on primary" is not working, as the primary node changes over the course of the process.)
Labels:
- Apache NiFi