07-13-2017
02:10 PM
NiFi stores FlowFile attributes in the FlowFile repository and FlowFile content in the Content repository. NiFi knows which queue each FlowFile was in when it is shut down. This allows NiFi to reload those FlowFiles back into the same queues and pick up where the dataflow left off after a restart.
07-13-2017
01:57 PM
2 Kudos
@siva karna I am not following the statement "so there is an abstraction for the first process group flow file it will stop so we will loss the data". Why would stopping a dataflow cause data loss? NiFi only reads new NARs/JARs added to a NiFi lib directory on startup; there is no option to dynamically add classes at runtime. Thanks, Matt
07-13-2017
12:42 PM
1 Kudo
@Akash S The ListHDFS processor records state so that only new files are listed. The processor also has a configuration option for recursing subdirectories. You could set the directory to just /MajorData/Location/ and let it list all files from the subdirectories; as new subdirectories are created, the files within them will get listed.

If that does not work for you, the NiFi Expression Language (EL) statement you are looking for would look something like this for the directory:

/MajorData/Location/${now():format('yyyy/MM/dd')}

The above would cause NiFi to look only in the target directory for files until the day changed. I am not sure at what rate files are written into these target directories, but be mindful that if a file is added between runs of the ListHDFS processor and the day changes between those runs, that file will not get listed using the above EL statement.

Thanks, Matt
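For illustration, here is a minimal Java sketch (not NiFi code, just an equivalent) of the path that EL statement resolves to on a given day:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class ElDatePathDemo {
    public static void main(String[] args) {
        // Mirrors what ${now():format('yyyy/MM/dd')} evaluates to in NiFi EL
        String datePart = new SimpleDateFormat("yyyy/MM/dd").format(new Date());
        System.out.println("/MajorData/Location/" + datePart);
        // e.g. /MajorData/Location/2017/07/13
    }
}
```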
07-12-2017
07:17 PM
2 Kudos
@M R I find the following very useful when trying to build Java regular expressions: http://myregexp.com

The Java regular expression:

^(.*?)%%(.*?)%%(.*?)%%(.*?)%%(.*?),(.*?)%%(.*?)$

It has 7 capture groups. When you add a new property to the ExtractText processor with a property name of "string" and use the above Java regex, the capture groups will be written to FlowFile attributes string.1 through string.7. Of course, if you are only looking for two capture groups, you could use the following regex instead:

^(.*?)%%.*?%%(.*?)%%.*?%%.*?,.*?%%.*?$

Thanks, Matt
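If it helps, here is a small runnable Java sketch of how those capture groups pull a string apart (the input line is a hypothetical example in the %%-delimited format from the question):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {
    public static void main(String[] args) {
        // Hypothetical sample input in the %%-delimited format
        String input = "a%%b%%c%%d%%e,f%%g";
        Pattern p = Pattern.compile("^(.*?)%%(.*?)%%(.*?)%%(.*?)%%(.*?),(.*?)%%(.*?)$");
        Matcher m = p.matcher(input);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                // ExtractText would expose these as attributes string.1 ... string.7
                System.out.println("string." + i + " = " + m.group(i));
            }
        }
    }
}
```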
07-11-2017
03:36 PM
@Eric Lloyd I considered that as well at first, but went the other route since I could be sure my byte sequence would be unique no matter what the stack trace looked like. Since you are looking for a line return followed by 20, you may have an issue with the very first line in your file. I would test that to confirm. Matt
07-11-2017
03:17 PM
@Eric Lloyd Must be a by-product of the SplitContent operation. It is reading the last line return before it sees the next byte sequence. If the blank line becomes an issue, you can also remove blank lines using a ReplaceText processor configured to replace any line that consists of just a line return with nothing (see the sketch below). Thanks, Matt
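A minimal Java sketch of that replacement; the regex here is my own illustrative choice, and the exact ReplaceText settings may differ:

```java
public class BlankLineFilterDemo {
    public static void main(String[] args) {
        String content = "line one\n\nline two\n";
        // Drop any line that is nothing but a line return, the same idea as a
        // ReplaceText search value of ^\n with an empty replacement value
        String cleaned = content.replaceAll("(?m)^\\n", "");
        System.out.print(cleaned); // "line one\nline two\n"
    }
}
```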
07-11-2017
02:31 PM
@Eric Lloyd Another option (not as nice as the GrokReader) is to use the SplitContent processor instead of the SplitText processor. Here I use the ReplaceText processor to match the date string format every log line starts with and prepend to it a unique string that I can use later to split the content (a rough sketch of that replacement follows the example output below). I then use the SplitContent processor to split based on that unique string. This means that any stack trace that follows a log line will be captured with the preceding log entry. After that you can do what you want with the resulting splits; I chose to filter out the splits for ERROR or WARN log lines and auto-terminate everything else. Here is an example output of one of my log lines with a stack trace:

2017-07-11 10:21:38,087 ERROR [Timer-Driven Process Thread-2] o.a.n.p.attributes.UpdateAttribute
java.lang.StringIndexOutOfBoundsException: String index out of range: 40
at java.lang.String.substring(String.java:1963) ~[na:1.8.0_77]
at org.apache.nifi.attribute.expression.language.evaluation.functions.SubstringEvaluator.evaluate(SubstringEvaluator.java:55) ~[nifi-expression-language-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
at org.apache.nifi.attribute.expression.language.Query.evaluate(Query.java:570) ~[nifi-expression-language-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
at org.apache.nifi.attribute.expression.language.Query.evaluateExpression(Query.java:388) ~[nifi-expression-language-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
at org.apache.nifi.attribute.expression.language.StandardPreparedQuery.evaluateExpressions(StandardPreparedQuery.java:48) ~[nifi-expression-language-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
at org.apache.nifi.attribute.expression.language.StandardPropertyValue.evaluateAttributeExpressions(StandardPropertyValue.java:152) ~[nifi-expression-language-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
at org.apache.nifi.attribute.expression.language.StandardPropertyValue.evaluateAttributeExpressions(StandardPropertyValue.java:133) ~[nifi-expression-language-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
at org.apache.nifi.processors.attributes.UpdateAttribute.executeActions(UpdateAttribute.java:496) ~[nifi-update-attribute-processor-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
at org.apache.nifi.processors.attributes.UpdateAttribute.onTrigger(UpdateAttribute.java:377) ~[nifi-update-attribute-processor-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) ~[nifi-api-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099) [nifi-framework-core-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132) [nifi-framework-core-1.1.0.2.1.4.0-5.jar:1.1.0.2.1.4.0-5]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_77]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_77]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_77]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]

Thanks, Matt
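As promised, a rough Java sketch of the marker idea (the marker string and timestamp regex are my own illustrative choices, not the exact processor settings):

```java
public class LogSplitMarkerDemo {
    public static void main(String[] args) {
        String log = "2017-07-11 10:21:38,087 ERROR something failed\n"
                + "java.lang.RuntimeException: stack trace line\n"
                + "2017-07-11 10:21:39,001 INFO next entry\n";
        // ReplaceText step: match the timestamp that starts every true log line
        // and prepend a unique marker. SplitContent then splits on "###SPLIT###",
        // so stack trace lines stay attached to the log line that precedes them.
        String marked = log.replaceAll(
                "(?m)^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})",
                "###SPLIT###$1");
        System.out.print(marked);
    }
}
```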
07-11-2017
02:03 PM
@adrian white At 90 MB, I suspect that CSV file has a lot of lines to split. Are you seeing any Out Of Memory errors in your nifi-app.log?

To help reduce heap usage here, you may want to try using two SplitText processors in series: the first splitting every 1,000 - 10,000 lines and the second then splitting those results line by line.

NiFi FlowFile attributes are kept in heap memory. NiFi has a mechanism for swapping FlowFile attributes to disk for queues, but this mechanism does not apply to processors. The SplitText processor holds the FlowFile attributes for every new FlowFile it is creating in heap until all resulting split FlowFiles have been created. When splitting creates a huge number of resulting FlowFiles in a single transaction, you can run out of heap space. So by splitting the job between multiple SplitText processors in series, you reduce the number of FlowFiles being generated per transaction, thus decreasing heap usage.

Thanks, Matt
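For a rough sense of scale (assuming, hypothetically, about 1,000,000 lines in that 90 MB file): a single line-by-line SplitText would hold attributes for roughly 1,000,000 FlowFiles in heap for one transaction, while a first-stage split at 10,000 lines produces only about 100 FlowFiles, and each second-stage split then produces just 10,000 FlowFiles per transaction.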
07-10-2017
06:12 PM
Just to add more detail to the above answer...
- Granting users the ability to run provenance queries does not give them the ability to view details on every piece of data that passes through every processor component on the canvas.
- If you were to monitor the nifi-app.log on each of your nodes, you would likely see that the provenance query returns events even though none are being displayed. This is because NiFi filters the results based on the "data" resource policies granted to that user.
- Only results for components to which the user has been granted access will be displayed. This is where the /data/{resource}/{uuid} policy mentioned above comes into play.
07-07-2017
08:22 PM
What is the output of the following:
netstat -ant | grep LISTEN