08-03-2021 10:02 AM
Hi Green, `../content_repository` is the actual configured value. The NiFi home directory is /opt/nifi/nifi-current. I'm not sure why our devops/infrastructure team set it up that way, but it usually works fine. I can reproduce the problem on one node fairly easily, but reproducing it across several nodes (the "whack-a-mole" scenario) has been harder. Do you have any suggestions on what other file system observations to make on a single-node failure? I've confirmed I can read, write, and delete files in the content_repository directory (which is mounted on a separate volume). It's not obvious to me what else to look for.
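For reference, this is the sort of single-node check I've been running. It's only a sketch; the repository path is ours, and the choice of checks (writability, claim subdirectory count, mount, free inodes) is my own guess at what's relevant, not an official NiFi diagnostic:

import os
import subprocess

REPO = "/opt/nifi/content_repository"  # our configured repository, resolved

# 1. Basic read/write/delete probe.
probe = os.path.join(REPO, "probe.tmp")
with open(probe, "w") as f:
    f.write("ok")
with open(probe) as f:
    assert f.read() == "ok"
os.remove(probe)

# 2. A healthy node has many numbered claim subdirectories; an empty
#    repository directory is the failure signature I keep seeing.
subdirs = [d for d in os.listdir(REPO)
           if os.path.isdir(os.path.join(REPO, d))]
print("claim subdirectories:", len(subdirs))

# 3. Confirm the directory is still on the expected separate volume
#    (a remount could silently swap in an empty directory).
print(subprocess.run(["df", REPO], capture_output=True, text=True).stdout)

# 4. With ~10^8 small files, free inodes matter as much as free bytes.
st = os.statvfs(REPO)
print("free inodes:", st.f_favail, "free bytes:", st.f_bavail * st.f_frsize)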
08-02-2021 10:15 AM
I'll need to recreate a multi-node error condition to answer your first part. I did manage zero errors with 15M files over 72 hours with FetchS3Object's ConcurrentTasks=5. I tried values of 500 and 10 today; in both cases the problem appeared within a few hours, though only one node was impacted each time. Here are the content repository/claim properties from nifi.properties:

# Content Repository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=1 MB
nifi.content.claim.max.flow.files=100
nifi.content.repository.directory.default=../content_repository
nifi.content.repository.archive.max.retention.period=3 days
nifi.content.repository.archive.max.usage.percentage=85%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false
nifi.content.viewer.url=/nifi-content-viewer/
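As a sanity check on that relative path, here is a quick sketch of how `../content_repository` resolves against our NiFi home. The home path is from my earlier reply, and my assumption is that NiFi resolves relative repository paths against its working directory (the home directory here); the resolution itself is plain path arithmetic, not NiFi code:

import os

NIFI_HOME = "/opt/nifi/nifi-current"  # our install
REL = "../content_repository"         # value in nifi.properties

# Resolving the relative value against the NiFi home yields exactly
# the path that appears in the error message.
print(os.path.normpath(os.path.join(NIFI_HOME, REL)))
# -> /opt/nifi/content_repository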
07-30-2021 08:43 PM
I set up a process group to pull ~10^8 S3 files, consolidate them, and save them to HDFS. My FetchS3Object processor fails after 2-17 hours with the error below. When this happens, a ConsumeKafka_2_6 processor in a separate group fails with the same error message. Other processors have the message flicker in and out, but they typically self-resolve.

16:29:54 UTC ERROR
FetchS3Object[id=50cb6c89-c4a3-3df6-807b-7f92555fd572] FetchS3Object[id=50cb6c89-c4a3-3df6-807b-7f92555fd572] failed to process session due to Failed to import data from com.amazonaws.services.s3.model.S3ObjectInputStream@1a9a94ff for StandardFlowFileRecord[uuid=bd5ccda4-d432-4086-8f4f-bb763afd108b,claim=,offset=0,name=Apr-2020/20200420-17H53M38S.json,size=0] due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to java.io.FileNotFoundException: /opt/nifi/content_repository/993/1627576194712-2847713 (No such file or directory); Processor Administratively Yielded for 1 sec: org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from com.amazonaws.services.s3.model.S3ObjectInputStream@1a9a94ff for StandardFlowFileRecord[uuid=bd5ccda4-d432-4086-8f4f-bb763afd108b,claim=,offset=0,name=Apr-2020/20200420-17H53M38S.json,size=0] due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to java.io.FileNotFoundException: /opt/nifi/content_repository/993/1627576194712-2847713 (No such file or directory)

The only way I've found to fix it is to reboot the NiFi node mentioned in the error message. Sometimes that resolves it immediately; other times the error shifts to a different node and I have to play whack-a-mole until it resolves.

When I connect to a problem node, its file system works fine (plenty of space, can read, write, etc.), but the directory from the error (/opt/nifi/content_repository) is empty. On healthy nodes, that directory is full of subdirectories. On a problem node, the log files show the same error message as above, with no extra detail.

When I lower the FetchS3Object Concurrent Tasks count (e.g., from 100 to 5), it runs longer before erroring, but it hits the same error eventually. Any help would be much appreciated. The closest match I found in existing posts pointed to too many open files, but that wasn't part of the error messages I've been getting.
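In the meantime I've been watching for the failure with a small polling script on each node. This is just my own sketch (the path is ours, the interval is arbitrary, and the empty-directory test is based on the failure signature described above), not anything from NiFi itself:

import os
import sys
import time

REPO = "/opt/nifi/content_repository"
INTERVAL_SECONDS = 60  # arbitrary polling interval

while True:
    try:
        subdirs = [d for d in os.listdir(REPO)
                   if os.path.isdir(os.path.join(REPO, d))]
    except FileNotFoundError:
        subdirs = None  # directory itself is gone, e.g. unmounted
    if not subdirs:
        # Empty or missing repository matches the broken-node state.
        print(f"ALERT: {REPO} is empty or missing", file=sys.stderr)
    time.sleep(INTERVAL_SECONDS)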
Labels: Apache NiFi