Created 07-15-2016 06:07 PM
I have a simple flow of GetFile -> PutHDFS. The flow works when KeepSourceFile=true. I then stop the processors, empty the target directory in HDFS, reconfigure GetFile identically except for KeepSourceFile=false, and start them again. The files are in their local source directory with full 777 permissions but are never read by GetFile. The scheduling strategy for each processor is timer driven at 10 s. This is NiFi installed on the sandbox. Any ideas on why it is not working?
Created 07-15-2016 06:36 PM
Hello
Please take a look in logs/nifi-app.log. There should be errors. It sounds like GetFile might not be able to delete the files (perhaps because of the permissions on the directory itself). If there is nothing interesting in the logs, try updating conf/logback.xml by adding this line alongside the other similar-looking lines:
<logger name="org.apache.nifi.processors.standard.GetFile" level="DEBUG"/>
Thanks Joe
Created 07-15-2016 06:38 PM
@gkeys it may be helpful to enable DEBUG-level logging, configured in $NIFI_HOME/conf/logback.xml.
In that file, the fully-qualified class name for the processor for which logging should be enabled can be specified. For example,
<logger name="org.apache.nifi.processors.standard.GetFile" level="DEBUG"/>
would enable DEBUG level logging for every GetFile in your flow.
Created 07-15-2016 08:00 PM
when config is KeepSourceFile=false I get
2016-07-15 19:49:47,338 INFO [StandardProcessScheduler Thread-6] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GetFile[id=fdb1f403-f4df-446d-bf81-6732f02fc909] to run with 1 threads
2016-07-15 19:49:47,339 DEBUG [Timer-Driven Process Thread-9] o.a.nifi.processors.standard.GetFile GetFile[id=fdb1f403-f4df-446d-bf81-6732f02fc909] has chosen to yield its resources; will not be scheduled to run again for 10 seconds
2016-07-15 19:50:07,341 DEBUG [Timer-Driven Process Thread-6] o.a.nifi.processors.standard.GetFile GetFile[id=fdb1f403-f4df-446d-bf81-6732f02fc909] has chosen to yield its resources; will not be scheduled to run again for 10 seconds
2016-07-15 19:50:17,342 DEBUG [Timer-Driven Process Thread-9] o.a.nifi.processors.standard.GetFile GetFile[id=fdb1f403-f4df-446d-bf81-6732f02fc909] has chosen to yield its resources; will not be scheduled to run again for 10 seconds
when it is true I get
2016-07-15 19:52:44,497 INFO [StandardProcessScheduler Thread-4] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GetFile[id=fdb1f403-f4df-446d-bf81-6732f02fc909] to run with 1 threads
2016-07-15 19:52:44,504 INFO [Timer-Driven Process Thread-6] o.a.nifi.processors.standard.GetFile GetFile[id=fdb1f403-f4df-446d-bf81-6732f02fc909] added StandardFlowFileRecord[uuid=7da0d589-4a97-4d93-9ecb-a5d22c7d520c,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1468610929062-156, container=default, section=156], offset=302824, length=152756],offset=0,name=20160708-233120.tsv,size=152756] to flow
2016-07-15 19:52:44,506 INFO [Timer-Driven Process Thread-6] o.a.nifi.processors.standard.GetFile GetFile[id=fdb1f403-f4df-446d-bf81-6732f02fc909] added StandardFlowFileRecord[uuid=b59f416c-7cc8-4529-adb7-92612f6e5f6e,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1468610929062-156, container=default, section=156], offset=455580, length=150068],offset=0,name=20160709-092708.tsv,size=150068] to flow
2016-07-15 19:52:54,511 INFO [Timer-Driven Process Thread-1] o.a.nifi.processors.standard.GetFile GetFile[id=fdb1f403-f4df-446d-bf81-6732f02fc909] added StandardFlowFileRecord[uuid=186c9e29-b80a-48d2-8747-94a74cfd2aee,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1468610929062-156, container=default, section=156], offset=605648, length=152756],offset=0,name=20160708-233120.tsv,size=152756] to flow
2016-07-15 19:52:54,512 INFO [Timer-Driven Process Thread-1] o.a.nifi.processors.standard.GetFile GetFile[id=fdb1f403-f4df-446d-bf81-6732f02fc909] added StandardFlowFileRecord[uuid=5b6683ec-f7eb-4724-a632-6abf09ae57a9,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1468610929062-156, container=default, section=156], offset=758404, length=150068],offset=0,name=20160709-092708.tsv,size=150068] to flow
2016-07-15 19:53:04,514 INFO [Timer-Driven Process Thread-4] o.a.nifi.processors.standard.GetFile GetFile[id=fdb1f403-f4df-446d-bf81-6732f02fc909] added StandardFlowFileRecord[uuid=8332e96a-c66a-42eb-a2f1-f6edaf88362c,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1468610929062-156, container=default, section=156], offset=908472, length=152756],offset=0,name=20160708-233120.tsv,size=152756] to flow
2016-07-15 19:53:04,515 INFO [Timer-Driven Process Thread-4] o.a.nifi.processors.standard.GetFile GetFile[id=fdb1f403-f4df-446d-bf81-6732f02fc909] added StandardFlowFileRecord[uuid=47a9cdbf-3820-40af-93d0-f12108b37010,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1468612384514-157, container=default, section=157], offset=0, length=150068],offset=0,name=20160709-092708.tsv,size=150068] to flow
The "what" makes sense. Any thoughts on the "why"?
Created 07-16-2016 02:11 PM
Not sure just yet. Will take a look. The only time GetFile would yield, as it does in the log output you show for KeepSourceFile=false, is when it finds nothing in the listing.
Created 07-18-2016 01:38 PM
Just to be clear -- the only change is the keepFile flag. Files are there and GetFile points to them identically in both cases.
Created 07-18-2016 01:38 PM
Also, this is on sandbox
Created 07-16-2016 02:27 PM
When GetFile is told to keep source files where it finds them, it will capture them even if it doesn't have write permission on the directory that contains them. However, when told to remove source files once they are pulled, it requires write permission on the directory it is pulling from, and when listing it will skip any files it doesn't have the required permissions for. We know the files are there, that GetFile isn't pulling them in this case, and that it is yielding, which only happens when the listing attempt returns no valid results. Given all that, I strongly believe the parent directory permissions are not sufficient. Please verify.
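To make the directory-permission point concrete, here is a minimal Python sketch of the check described above. It is not NiFi's code; the function names `can_read_from` and `can_delete_from` are made up for illustration, and it assumes a POSIX-style filesystem with no sticky bit or ACLs in play. Run it as the same OS user that runs NiFi, against the directory GetFile is pointed at.

```python
import os


def can_read_from(directory: str) -> bool:
    """True if the current user can list the directory and reach its entries."""
    return os.access(directory, os.R_OK | os.X_OK)


def can_delete_from(directory: str) -> bool:
    """True if the current user can remove entries from the directory.

    Removing a file modifies the directory itself, so write and search
    (execute) permission on the directory are what matter here, not the
    permission bits on the file being removed.
    """
    return os.access(directory, os.W_OK | os.X_OK)
```

If `can_read_from` returns True but `can_delete_from` returns False for the source directory, that matches the symptom in this thread: the files can be picked up when they are kept, but not when they have to be removed.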
Created 07-18-2016 10:39 PM
What are the permissions on both the file(s) you are trying to pick up with the GetFile processor and the directory the file(s) live in? For example:
-rwxrwxrwx 1 nifi dataflow 24B Jul 18 18:20 testfile
and
drwxr-xr-- 3 root dataflow 102B Jul 18 18:20 testdata
With the example permissions above, I can reproduce exactly what you are seeing. If "Keep Source File" is set to true, NiFi creates a new flowfile with the content of the file. If "Keep Source File" is set to false, GetFile yields because it does not have the permissions needed to delete the file from the directory. This is because the user trying to delete the file(s) needs the write bit on the source directory. In my example NiFi is running as user nifi, so it can read the files in the root-owned testdata directory because the directory's group ownership is dataflow, the same group as my nifi user, and the directory has r-x permissions for that group. If I change the directory's permissions for that group to rwx, then my nifi user will also be able to delete testfile.
Thanks, Matt
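As a rough illustration of the example listing above, the following Python sketch reads the group triplet out of an `ls -l` mode string and checks whether it allows removing entries from a directory. The helper names `permission_triplet` and `can_delete_entries` are hypothetical, and the sketch only looks at the plain `w` and `x` bits; it ignores setuid/setgid/sticky variants and ACLs.

```python
def permission_triplet(mode: str, who: str) -> str:
    """Return the rwx triplet for 'owner', 'group', or 'other'
    from an `ls -l` mode string such as 'drwxr-xr--'."""
    start = {"owner": 1, "group": 4, "other": 7}[who]
    return mode[start:start + 3]


def can_delete_entries(mode: str, who: str) -> bool:
    """True if that class of user has both write and search (execute)
    permission on the directory, which is what removing a file requires."""
    triplet = permission_triplet(mode, who)
    return triplet[1] == "w" and triplet[2] == "x"


# The testdata directory from the example: nifi is in group dataflow.
print(permission_triplet("drwxr-xr--", "group"))   # r-x
print(can_delete_entries("drwxr-xr--", "group"))   # False
# After granting the group write permission on the directory:
print(can_delete_entries("drwxrwxr--", "group"))   # True
```

The 777 permissions on the file itself never enter into it; only the directory's bits for the user running NiFi decide whether the file can be removed.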