Member since
07-13-2017
8
Posts
0
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
3119 | 02-22-2018 05:42 PM |
02-23-2018
01:59 PM
Thank you, Ed, for the insight on the ListSFTP processor and for providing some possible workarounds. I think this will give me enough to work with to come up with a solution.
... View more
02-22-2018
06:46 PM
Bump. Does anyone have any ideas surrounding this? I feel like it's a poor design choice to not allow ListSFTP to have upstream connections. It doesn't make sense because almost every field in ListSFTP allows expression language. Why would NiFi not allow us to pass the server information to ListSFTP when it has the capability of expression language? So it seems the only way to pass server information to this processor is via the variable registry in 'nifi.properties'?
... View more
02-22-2018
05:42 PM
We contacted Hortonworks directly and talked with the Nifi developers who confirmed this is a bug in the ValidateCSV processor. No timeline on a fix.
... View more
02-13-2018
05:07 PM
Hello, I'm working on a flow that currently uses FetchSFTP to download a file from an sFTP server every 15 minutes or so. It works, however, if a file is not there (which is valid in our use-case) then the FetchSFTP throws an error. Because of this error, our log monitors think there is an actual error, but in reality it's not an error, it is just how our flow is set up. So to avoid this error from FetchSFTP, I want to use a ListSFTP to check for a file first. However, ListSFTP does not allow upstream connections so I cannot pass the sFTP server information to it. We load this server information from a GetFile upstream and then use an UpdateAttribute to add the information to the flowfile. I don't understand the reasoning for not allowing upstream connections to the ListSFTP processor. Can someone explain why it's designed like this? Also, do you know of another workaround to stop FetchSFTP from throwing an error when it doesn't find a file (since it's just going to be looking again in 15 minutes or so). Thanks for the help. Also, since a picture is worth a thousand words, here is our current flow with the FetchSFTP getting information from the client config file:
... View more
Labels:
- Labels:
-
Apache NiFi
01-17-2018
03:29 PM
I've continued to research this issue further and have been able to replicate the issue with a simplified flow using just a GenerateFlowfile processor running at 30 second intervals and a ValidateCSV. The test data in GenerateFlowFile: 1000000|Firstname1000000|Lastname1000000|2452954106|3387173669|9625048215|test2@bkfs.com|FstName2-1000000|LstName2-1000000|6483154023|2898543994|8006477595|blah@test.com|123|Fake Street||Jacksonville|Fl|32250|1234
1000001|Firstname1000001|Lastname1000001|5227791843|9503374934|4617490963|test2@bkfs.com|FstName2-1000001|LstName2-1000001|2351899318|2250724929|9755971010|blah@test.com|123|Fake Street||Jacksonville|Fl|32250|1234
1000002|Firstname1000002|Lastname1000002|2726022427|8455734201|5061581356|test2@bkfs.com|FstName2-1000002|LstName2-1000002|4889605315|9765983450|5530103619|blah@test.com|123|Fake Street||Jacksonville|Fl|32250|1234
1000003|Firstname1000003|Lastname1000003|6600426098|5527562109|5457567035|test2@bkfs.com|FstName2-1000003|LstName2-1000003|2219888741|2230922868|6707470404|blah@test.com|123|Fake Street||Jacksonville|Fl|32250|1234
1000004|Firstname1000004|Lastname1000004|6786938248|6411937419|5123777198|test2@bkfs.com|FstName2-1000004|LstName2-1000004|2831114709|4010383704|6841344395|blah@test.com|123|Fake Street||Jacksonville|Fl|32250|1234
1000005|Firstname1000005|Lastname1000005|2906459644|4113670124|9732997854|test2@bkfs.com|FstName2-1000005|LstName2-1000005|6029871693|2601194119|5229252727|blah@test.com|123|Fake Street||Jacksonville|Fl|32250|1234
1000006|Firstname1000006|Lastname1000006|6206996436|8377906697|6096568098|test2@bkfs.com|FstName2-1000006|LstName2-1000006|3776671873|7219153330|6267564904|blah@test.com|123|Fake Street||Jacksonville|Fl|32250|1234
1000007|Firstname1000007|Lastname1000007|6592010372|4396981880|3689391047|test2@bkfs.com|FstName2-1000007|LstName2-1000007|5482870564|9124280167|7764088854|blah@test.com|123|Fake Street||Jacksonville|Fl|32250|1234
1000008|Firstname1000008|Lastname1000008|2439048871|3800831154|6990841221|test2@bkfs.com|FstName2-1000008|LstName2-1000008|4823913636|8864891978|4810747004|blah@test.com|123|Fake Street||Jacksonville|Fl|32250|1234
1000009|Firstname1000009|Lastname1000009|6352662441|6724206719|3289556363|test2@bkfs.com|FstName2-1000009|LstName2-1000009|6531169670|7010161669|8092909403|blah@test.com|123|Fake Street||Jacksonville|Fl|32250|1234
etc.. The validation in ValidateCSV: Unique(), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")) The exception after it runs through twice: 2018-01-17 10:15:32,988 DEBUG [Timer-Driven Process Thread-9] o.a.nifi.processors.standard.ValidateCsv ValidateCsv[id=327d1aac-1000-1160-9184-47b5669853c4] Failed to validate StandardFlowFileRecord[uuid=8a93d2e3-d304-44d6-a1f3-1cd38d16560e,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1516201957046-1, container=default, section=1], offset=101225, length=98225],offset=0,name=4399965514884,size=98225] against schema due to org.supercsv.exception.SuperCsvConstraintViolationException: duplicate value '1000000' encountered
processor=org.supercsv.cellprocessor.constraint.Unique
context={lineNo=1, rowNo=1, columnNo=1, rowSource=[1000000, Firstname1000000, Lastname1000000, 2452954106, 3387173669, 9625048215, test2@bkfs.com, FstName2-1000000, LstName2-1000000, 6483154023, 2898543994, 8006477595, blah@test.com, 123, Fake Street, null, Jacksonville, Fl, 32250, 1234]}; routing to 'invalid': {}
org.supercsv.exception.SuperCsvConstraintViolationException: duplicate value '1000000' encountered
at org.supercsv.cellprocessor.constraint.Unique.execute(Unique.java:76)
at org.supercsv.util.Util.executeCellProcessors(Util.java:93)
at org.supercsv.io.AbstractCsvReader.executeProcessors(AbstractCsvReader.java:203)
at org.apache.nifi.processors.standard.ValidateCsv$NifiCsvListReader.read(ValidateCsv.java:617)
at org.apache.nifi.processors.standard.ValidateCsv$1.process(ValidateCsv.java:479)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2136)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2106)
at org.apache.nifi.processors.standard.ValidateCsv.onTrigger(ValidateCsv.java:446)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1120)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) Shouldn't ValidateCSV only be validating the Unique() attribute on a single flowfile? Instead, it seems like it keeps the first flowfile in memory, then validates the next flowfile against the first and second. Is this a bug?
... View more
01-16-2018
03:52 PM
I'm having an issue with the NiFi ValidateCSV processor claiming a valid file is invalid. This file will pass through the processor correctly as a valid file on the first attempt, but if I leave the flow running and try to put the same file through again immediately after, it claims that the file is invalid. However, if I stop and start the flow between each file validation, the file will always be valid. It seems like the processor is keeping some of the flowfile content in memory when it tries to evaluate the second pass, causing it to fail. The schema validation is only checking the first column for a unique() entry. It's set up like this: "Unique(), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*")), Optional(StrRegEx(".*"))" See attached for the flow and processor in question. Any ideas? Any help is much appreciated.
... View more
Labels:
- Labels:
-
Apache NiFi