Member since 02-15-2017 | 12 Posts | 0 Kudos Received | 0 Solutions
03-14-2018 09:44 AM
Thanks, I think you are right: we need to focus on strategies for reducing the size of these large flowfiles rather than trying to transmit them site-to-site as they are.
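One size-reduction strategy is to split each large file into fixed-size chunks before transfer and reassemble them on the receiving side. The Python sketch below illustrates the idea only; it is not NiFi code. The attribute names are borrowed from the `fragment.*` convention used by NiFi's split processors, and whether a given NiFi processor would accept fragments produced this way has not been verified here. The function names are hypothetical.

```python
import io
import uuid


def split_into_fragments(stream, chunk_size):
    """Split a binary stream into (attributes, bytes) fragments.

    Hypothetical helper. It holds all chunks in memory for simplicity; a real
    implementation for >2GB files would write each chunk out as it is read.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        chunks.append(data)
    group_id = str(uuid.uuid4())
    count = len(chunks)
    # 1-based index, mirroring the "fragment.*" naming convention (assumption).
    return [
        ({"fragment.identifier": group_id,
          "fragment.index": str(i + 1),
          "fragment.count": str(count)}, chunk)
        for i, chunk in enumerate(chunks)
    ]


def reassemble(fragments):
    """Rebuild the original bytes, checking that the group is complete."""
    if not fragments:
        return b""
    ids = {attrs["fragment.identifier"] for attrs, _ in fragments}
    if len(ids) != 1:
        raise ValueError("fragments belong to different groups")
    ordered = sorted(fragments, key=lambda f: int(f[0]["fragment.index"]))
    expected = int(ordered[0][0]["fragment.count"])
    if len(ordered) != expected:
        raise ValueError(f"expected {expected} fragments, got {len(ordered)}")
    return b"".join(chunk for _, chunk in ordered)


if __name__ == "__main__":
    payload = b"x" * 2500
    parts = split_into_fragments(io.BytesIO(payload), 1000)
    print([len(chunk) for _, chunk in parts])
    print(reassemble(parts) == payload)
```

Sorting by index on reassembly means fragments can arrive in any order. The explicit count check turns a truncated transfer into a clear error rather than silently producing a short file.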
03-13-2018 02:08 PM
Hi, we have a situation where we occasionally have to send large flowfiles (>2GB) via site-to-site transmission to a NiFi canvas on another NiFi installation. Unfortunately, the transmission fails for these large files. We receive messages such as "Awaiting transferDataLatch has been timeout" and "flow-files has reached to its end, but produced : read : wrote byte sizes ( -xxxxxxxxxx : -xxxxxxxxxx : yyyyyyyyyy) were not equal. Something went wrong.", and on the remote machine: "EofException: Early EOF".
It seems to me that the connection is being truncated or dropped before the transmission can complete. We have tried raising the Communication Timeout on the local Remote Process Group from 30 sec to 120 sec, but it still fails. I don't know whether it would eventually succeed if we kept raising this, but I'd like to know: are there specific configuration changes to NiFi and/or the underlying machines that would enable these transmissions to succeed? Many thanks, Richard
Labels: Apache NiFi
03-08-2018 11:52 AM
Hi, we have a problem with the volume of logs building up in /var/log/hadoop/hdfs/audit/solr/spool, so as a short-term measure I need to remove some of the files from there. I understand that this data is spooled to disk before being indexed by Solr, so I don't want to move or remove any files that have not yet been processed or are still active. I can see from lsof that only the latest log is being held open by the NameNode. Does this mean it's safe to remove all the log files apart from this last one? Thanks, Richard
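Before removing anything, it can help to list exactly which spool files would be affected. The Python sketch below is a dry run: it lists regular files in the spool directory except the most recently modified one, oldest first, and deletes nothing. It uses modification time as a proxy for "the latest log", which is not the same as the lsof check, and it does not establish whether Ranger has already shipped a file's contents to Solr. The function name is hypothetical.

```python
from pathlib import Path


def older_spool_files(spool_dir):
    """Return regular files in spool_dir except the newest by mtime, oldest first.

    Lists only; nothing is moved or deleted. "Newest by mtime" is an assumed
    proxy for the file the NameNode currently holds open.
    """
    files = [p for p in Path(spool_dir).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime)
    return files[:-1]  # an empty directory yields an empty list


if __name__ == "__main__":
    spool = "/var/log/hadoop/hdfs/audit/solr/spool"
    for path in older_spool_files(spool):
        print(path, path.stat().st_size)
```

Comparing this list against the lsof output and the Solr index before deleting anything keeps the check independent of this script's mtime heuristic.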
Labels: Apache Hadoop, Apache Ranger, Apache Solr
02-20-2018 08:37 PM
Hi @Matt Clarke, thanks for taking the time to answer my question and to confirm that the size of the queued content doesn't include any archived content. I've taken a closer look at the status history of the connection (queue) over the past 24 hours, and I can see that the nature of the incoming data varies with the time of day. Earlier in the day a large number of small flow files pass into the queue, but they are processed rapidly. As the day goes on, we start to see much larger flow files. I think this explains why the number of flow files decreases while the size of the data increases. I hadn't expected this until I looked at the status history closely. Thanks again for helping me get to the bottom of this! Richard
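The arithmetic behind this explanation can be checked with the figures from the original question: about 4 million flow files and 70 GB at the start of the day, and over 100 GB later with fewer flow files. The Python sketch below takes 1 GB as 10^9 bytes for simplicity (an assumption; NiFi's UI may use binary units, which would scale both averages by the same factor). The function name is hypothetical.

```python
GB = 10**9  # decimal gigabyte, assumed for simplicity


def average_flowfile_bytes(total_bytes, flowfile_count):
    """Average queued content per flow file, in whole bytes."""
    return total_bytes // flowfile_count


# Start of the day: ~70 GB spread over ~4 million flow files.
start_avg = average_flowfile_bytes(70 * GB, 4_000_000)

# Later: more than 100 GB with *fewer* than 4 million flow files, so the
# true average must be above this floor.
later_floor = average_flowfile_bytes(100 * GB, 4_000_000)

if __name__ == "__main__":
    print(start_avg, later_floor)
```

The morning average works out to about 17.5 KB per flow file, while the later average must exceed 25 KB. That is consistent with a shift towards larger flow files rather than with any hidden archived content.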
02-20-2018 01:31 PM
Hi, I have observed a confusing pattern over the last few days in the amount of data queued on a connection. Every day, at the start of the day, we have about 4 million flow files in a particular queue, with the total size shown on the queue of approx. 70 GB. As the day goes on, the number of flow files in the queue falls as the queue is processed, but the total size of the queue rises to over 100 GB (though it eventually starts dropping again). This is not what I expected, and I can't find anything about it in the user guide. My working theory is that the total might also include archived content (we have retention set to 12 hours) and/or content claims, but I can't find confirmation of this. Can anyone shed some light on this behaviour? Thanks, Richard
Labels: Apache NiFi
07-19-2017 11:07 AM
I've heard from other sources that the provenance directory should also be cleaned out. Do you think that's necessary?
07-19-2017 09:56 AM
Hi @Wynner, thanks for the answers. It would be really good if there were some kind of tool that could at least dump the state of the repositories so that we can understand more about what's going on. Something for the future, perhaps. I'm not sure the issue is with the MergeContent processor exactly. I certainly don't understand why the repository has entered a state where FlowFiles either cannot be found or appear to be stale. The problem is that even with the processor running, the queue does not get processed, because the FlowFiles for the given IDs can't be found or are stale. Our MergeContent settings are:
- Min/max group size: 64MB / 256MB
- Min # of entries: 1
- Max # of entries: 10000
- Max Bin Age: 1 min
- Max # of Bins: 100
- Delimiter strategy: Text
- Attribute strategy: Keep All Unique Attributes
- Run Schedule: 0 sec
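With these settings, one way to reason about when a bin should be merged is the deliberately simplified Python model below. It is an assumption, not NiFi's implementation. It treats a bin as ready when it is full (max size or max entries), when both minimums are met, or when it reaches Max Bin Age. NiFi's real binning also depends on factors this ignores, such as the Max # of Bins limit and when the processor runs out of queued input. It treats MB as 2^20 bytes. The class and function names are hypothetical.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumed binary megabyte


@dataclass(frozen=True)
class BinSettings:
    """The MergeContent values quoted above (hypothetical container)."""
    min_size: int = 64 * MB
    max_size: int = 256 * MB
    min_entries: int = 1
    max_entries: int = 10000
    max_age_s: float = 60.0


def bin_ready(size, entries, age_s, s=BinSettings()):
    """Simplified model: True if a bin of this size/count/age would be merged."""
    full = size >= s.max_size or entries >= s.max_entries
    minimums_met = size >= s.min_size and entries >= s.min_entries
    timed_out = age_s >= s.max_age_s
    return full or minimums_met or timed_out


if __name__ == "__main__":
    # Ten 1 KB files: not ready at 30 s, ready once the 1-minute bin age passes.
    print(bin_ready(10 * 1024, 10, 30), bin_ready(10 * 1024, 10, 60))
```

Under this model, small FlowFiles would be bundled at least once a minute. A queue that never drains even though the processor is running therefore points away from the bin settings and towards the repository errors described in this thread.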
07-18-2017 04:18 PM
Hi, we are experiencing a problem with an apparently large queue that can't be processed through a MergeContent processor. Our problem looks very similar to the ones described on the forum here: https://community.hortonworks.com/questions/91051/nifi-error-not-the-most-recent-version-of-this-flo.html and also here: https://issues.apache.org/jira/browse/NIFI-3329
Our logs are full of:
FlowFileHandlingException: StandardFlowFileRecord[uuid=<UUID> ... ] is not the most recent version of this FlowFile within this session
and
FlowFileHandlingException: StandardFlowFileRecord[uuid=<UUID> ... ]... is not known in this session
As has been suggested elsewhere, it seems to me that there may be some kind of corruption in our repositories: the messages appear to indicate inconsistencies in the repositories or database. I was wondering how we could repair this situation. I can't find any tools that would allow me to analyse or verify the flowfile or provenance repositories. Furthermore, if we can't repair this situation, how would we go about safely cleaning out the repositories? So my questions are:
1) Are there any tools for analysing, verifying, or repairing NiFi repositories on disk?
2) What would be the process for clearing out the repositories on disk if we wanted to start with a clean slate?
Thanks, Richard
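For question 2, the usual clean-slate approach is to stop NiFi and then move aside or delete the repository directories configured in nifi.properties. This discards all queued data and provenance history. A first step is to know exactly which directories are involved. The Python sketch below reads nifi.properties text and reports the configured repository paths without touching any files. It assumes the property names found in a stock nifi.properties (content and provenance may list several suffixed directories), and it assumes relative paths resolve against the NiFi home directory. The function name is hypothetical.

```python
from pathlib import Path

# Repository property names as they appear in a stock nifi.properties.
# Keys ending in "." are prefixes, because content and provenance
# repositories can define several directories (e.g. "...directory.default").
REPO_KEYS = (
    "nifi.flowfile.repository.directory",
    "nifi.content.repository.directory.",
    "nifi.provenance.repository.directory.",
    "nifi.database.directory",
)


def repository_dirs(properties_text, nifi_home="."):
    """Map repository property names to paths; reads text only, touches no files."""
    dirs = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith(REPO_KEYS) and value:
            # Assumption: relative paths are relative to the NiFi home directory.
            dirs[key] = Path(nifi_home) / value
    return dirs


if __name__ == "__main__":
    home = "/opt/nifi"  # hypothetical install location
    text = Path(home, "conf", "nifi.properties").read_text()
    for key, path in repository_dirs(text, home).items():
        print(f"{key}: {path}")
```

Keeping the deletion step manual (and done only while NiFi is stopped) avoids accidentally removing the wrong directory, for example when a repository has been relocated to a shared volume.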
Labels: Apache NiFi
07-18-2017 10:33 AM
We have a similar problem: a MergeContent processor producing 'is not the most recent version of this FlowFile within this session' and 'is not known in this session' errors. We also have the 'phantom' queue that you describe: a large queue that the processor does not process. But when we restart NiFi, the queue drops to zero.
05-31-2017 02:30 PM
Thank you. I've tested this on our Dev system and it looks like exactly what we need. Regards, Richard