Support Questions

Josiah_Johnston · ‎07-30-2021

I set up a process group to pull ~10^8 S3 files, consolidate them and save to HDFS. My FetchS3Object processor fails after 2-17 hours with the following error. When this happens, a ConsumeKafka_2_6 processor in a separate group fails with the same error message. Other processors have this message flicker in and out, but can typically self-resolve.

16:29:54 UTC ERROR
FetchS3Object[id=50cb6c89-c4a3-3df6-807b-7f92555fd572] FetchS3Object[id=50cb6c89-c4a3-3df6-807b-7f92555fd572] failed to process session due to Failed to import data from com.amazonaws.services.s3.model.S3ObjectInputStream@1a9a94ff for StandardFlowFileRecord[uuid=bd5ccda4-d432-4086-8f4f-bb763afd108b,claim=,offset=0,name=Apr-2020/20200420-17H53M38S.json,size=0] due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to java.io.FileNotFoundException: /opt/nifi/content_repository/993/1627576194712-2847713 (No such file or directory); Processor Administratively Yielded for 1 sec: org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from com.amazonaws.services.s3.model.S3ObjectInputStream@1a9a94ff for StandardFlowFileRecord[uuid=bd5ccda4-d432-4086-8f4f-bb763afd108b,claim=,offset=0,name=Apr-2020/20200420-17H53M38S.json,size=0] due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to java.io.FileNotFoundException: /opt/nifi/content_repository/993/1627576194712-2847713 (No such file or directory)

[Screenshot attachment: Untitled 5.png]

The only way I've found to fix it is to reboot the NiFi node mentioned in the error message. Sometimes that resolves it immediately; other times the error message shifts to a different node and I have to play whack-a-mole before it resolves. When I connect to a problem node, its file system works fine (plenty of space, can read, write, etc), but that directory (/opt/nifi/content_repository) is empty. On healthy nodes, that directory is full of subdirectories.

On a problem node, the log files showed the same error message as above, with no extra details. When I lower FetchS3Object's ConcurrentTasks count (e.g., from 100 to 5), it runs longer before failing, but it eventually hits the same error.
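One way to look for extra detail is to print the lines that come just before the first occurrence of the error, in case something relevant is logged right before it starts. This is only a rough sketch: the function name is made up for illustration, and the log path in the usage comment is an assumption about where the application log lives on the node.

```python
from collections import deque
from typing import Iterable, List


def lines_before_first_match(lines: Iterable[str], needle: str, context: int = 20) -> List[str]:
    """Return up to `context` lines preceding the first line that contains
    `needle`, followed by the matching line. Returns [] if nothing matches."""
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    for line in lines:
        if needle in line:
            return list(recent) + [line]
        recent.append(line)
    return []


# Possible usage (the log path is an assumption, adjust to your install):
# with open("/opt/nifi/nifi-current/logs/nifi-app.log", errors="replace") as f:
#     for line in lines_before_first_match(f, "Unable to create ContentClaim"):
#         print(line, end="")
```

Reading the file line by line keeps memory use flat even for large logs, since only the rolling window is held at any time.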

Any help would be much appreciated. The closest error message I found in existing posts pointed to too many open files, but that wasn't part of the error messages I've been getting.
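To rule the open-files theory in or out, the number of file descriptors held by the NiFi JVM could be compared against its limit. The sketch below is Linux-only and assumes the procfs layout (`/proc/<pid>/fd` and `/proc/<pid>/limits`) and enough permissions to read those entries for the NiFi process. The function names and the `proc_root` parameter are illustrative, not part of any NiFi tooling.

```python
import os
from typing import Optional


def count_open_fds(pid: int, proc_root: str = "/proc") -> int:
    """Count the entries under <proc_root>/<pid>/fd (one per open descriptor)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def max_open_files(pid: int, proc_root: str = "/proc") -> Optional[int]:
    """Return the soft 'Max open files' limit from <proc_root>/<pid>/limits,
    or None if the limit is unlimited or the line is not found."""
    with open(os.path.join(proc_root, str(pid), "limits")) as f:
        for line in f:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # columns: Max open files <soft> <hard> <units>
                return None if soft == "unlimited" else int(soft)
    return None


# Possible usage, where nifi_pid is the PID of the NiFi JVM on the node:
# print(count_open_fds(nifi_pid), "of", max_open_files(nifi_pid))
```

If the count stays well below the limit while the error is occurring, the open-files explanation becomes less likely.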

Green_ · ‎08-01-2021

@Josiah_Johnston

You mentioned that a problematic node's content repository is empty when you check. What about the flowfile repository? If you reboot a node and the problem shifts to a different one, do the repositories turn out empty on the new node as well (even if there were files flowing in that node previously)?

How have you configured your content repository/content claim properties in the nifi.properties file?

Josiah_Johnston · ‎08-02-2021

I'll need to recreate a multi-node error condition to answer your first part. I did manage zero errors with 15M files over 72 hours with FetchS3Object's ConcurrentTasks=5.

I tried values of 500 & 10 today. In both cases, the problem appeared within a few hours, though only one node was impacted each time.

nifi.properties content repository/claim properties

# Content Repository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=1 MB
nifi.content.claim.max.flow.files=100
nifi.content.repository.directory.default=../content_repository
nifi.content.repository.archive.max.retention.period=3 days
nifi.content.repository.archive.max.usage.percentage=85%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false
nifi.content.viewer.url=/nifi-content-viewer/

Green_ · ‎08-02-2021

Your archive properties are definitely generous, but I don't believe they're related to the problem. One thing that stands out to me is that your content repository directory is `../content_repository` and not `./content_repository`. Could that have been just a mistake in the reply, or is that the actual configured value? I'd be suspicious of that because the errors you get state that the missing path is `/opt/nifi/content_repository/...`, whereas the configuration you posted would imply the repository sits at the same directory level as the NiFi home directory.

If this isn't the case and you just copied the configuration incorrectly, I'm afraid I don't have too many other ideas. Seeing as you can reproduce the error fairly easily, my recommendation would be to do so and to monitor the file system more closely, as I mentioned in my previous reply: check the status of the flowfile and content repositories and track this 'whack-a-mole' phenomenon across nodes. If you come up with new findings, you could post them here.

Josiah_Johnston · ‎08-03-2021

Hi Green,

`../content_repository` is the actual configured value. The NiFi home directory is `/opt/nifi/nifi-current`.

Not sure why our devops/infrastructure team set it up that way, but it usually works fine.
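For what it's worth, the configured value and the path in the error message line up. Here is a minimal sketch of the resolution, assuming the relative path is resolved against the NiFi home directory. `resolve_repo_dir` is just an illustrative helper, and the resolution is purely lexical, so it does not account for symlinks.

```python
import posixpath


def resolve_repo_dir(nifi_home: str, configured: str) -> str:
    """Join a configured repository directory onto the NiFi home directory
    and collapse any '.' or '..' segments (lexically, without touching disk)."""
    return posixpath.normpath(posixpath.join(nifi_home, configured))


print(resolve_repo_dir("/opt/nifi/nifi-current", "../content_repository"))
# prints: /opt/nifi/content_repository
print(resolve_repo_dir("/opt/nifi/nifi-current", "./content_repository"))
# prints: /opt/nifi/nifi-current/content_repository
```

The first result matches the `/opt/nifi/content_repository/...` prefix in the FileNotFoundException.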

I can reproduce the problem on one node fairly easily, but reproducing it across several nodes ("whack-a-mole") has been less trivial. Do you have any suggestions on what other file system observations to make on a single-node failure? I've confirmed I can read, write and delete files on the content_repository directory (which is mounted on a separate volume). It's not obvious to me what else to look for.

Green_ · ‎08-03-2021

@Josiah_Johnston
Based off your last comment, my new hunch would be that perhaps there is something going on with the volume you use for the content repository. Still, it's hard to say without more testing.

Here are a couple of tests/checks I would run if this happened in one of our nifi clusters (both the problem as you describe it and what I could spot from the screenshot you sent):

- While the content repo is empty, are any other flowfiles being processed on the node? It would make no sense for any flow or ingestion to work if the content repository is completely empty. Perhaps the content claims are being written elsewhere, or perhaps they are deleted immediately after being created in the content repo.
- What happens to the flowfiles that were already in the flow when the problem started? If all their content was deleted, they shouldn't be able to proceed in the flow even after a restart. What happens if you try to view their content in the UI?
- Perhaps some helpful logs are written to the NiFi app logs when the problem starts (just before the errors about missing content claims start flooding in).
- What would happen if you created a file in the content repo and then reproduced the problem? Would only the NiFi-generated content repo files and directories be deleted, or would your own file be deleted as well?
- Is there a way to tell which node the issue will happen on? Is it perhaps happening on the same node (and the same storage volume) repeatedly?
- What is the flowfile repository's status while the issue is happening? Does it still have all its regular files even after the content repo was emptied?
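Several of these checks could be automated with a small watcher left running on a node while the problem is being reproduced. The following is a rough sketch, not a tested tool: the flowfile repository path and the sentinel file name in the usage comment are assumptions, and the sentinel file has to be created by hand before the watcher starts.

```python
import os
import time
from typing import Dict, Optional


def snapshot(path: str) -> Optional[int]:
    """Return the number of entries directly under `path`, or None if it is missing."""
    try:
        return len(os.listdir(path))
    except FileNotFoundError:
        return None


def watch(paths: Dict[str, str], sentinel: str,
          interval_s: float = 10.0, rounds: int = 0) -> None:
    """Print a UTC timestamp, the entry count of each directory, and whether
    the hand-made sentinel file still exists. rounds=0 runs until interrupted."""
    done = 0
    while rounds == 0 or done < rounds:
        stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        counts = {name: snapshot(p) for name, p in paths.items()}
        print(stamp, counts, "sentinel_present=%s" % os.path.exists(sentinel), flush=True)
        done += 1
        if rounds == 0 or done < rounds:
            time.sleep(interval_s)


# Possible usage (the flowfile path and sentinel name are assumptions):
# watch({"content": "/opt/nifi/content_repository",
#        "flowfile": "/opt/nifi/flowfile_repository"},
#       sentinel="/opt/nifi/content_repository/sentinel.txt")
```

Comparing the timestamp at which the content count drops against the first error in the app log would show whether the directory is emptied before or after the processors start failing, and the sentinel flag would show whether non-NiFi files are removed too.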

If you google something along the lines of 'nifi content repository empty / deleting', no relevant results come up. My team and I have never experienced anything like this either, which is why I suspect it is not a NiFi-related issue but rather something to do with your infrastructure or something else on your end.

Cloudera Community

Processor fails with FlowFileAccessException: Unable to create ContentClaim due to java.io.FileNotFoundException
