
NiFi PutSFTP fails to rename dot file when "Outbound Policy = Batch Output" is set on a child processor group

Rising Star

My data flow starts from a single FlowFile produced by a Sqoop job, which then expands into multiple FlowFiles after executing the GetHDFSFileInfo processor (based on the number of HDFS files). To capture all failure scenarios, I have created a child processor group with the following processors:

Input Port --> GetHDFSFileInfo --> RouteOnAttribute --> UpdateAttribute --> FetchHDFS --> PutSFTP --> ModifyBytes --> Output Port


Main Processor Group
--------------------
RouteOnAttribute --> Above mentioned Child Processor Group --> MergeContent --> downstream processors

The child processor group is configured with "FlowFile Concurrency = Single FlowFile Per Node" and "Outbound Policy = Batch Output" to ensure that all fetched FlowFiles are successfully processed (written to the SFTP server).

My GetHDFSFileInfo processor returns 20 HDFS files, and each execution successfully transfers 18 or 19 of them to my SFTP server. However, during each execution, one or two file transfers fail in the PutSFTP processor with the error message 'Failed to rename dot-file.' An error screenshot is attached below.

[Attachment: Capture.JPG]

I am facing this issue only when the child processor group is configured with "Outbound Policy = Batch Output".
The flow also works if we run it without the child processor group.

Could you please help us fix this issue with the PutSFTP processor?

This is a continuation of the solution provided in the thread https://community.cloudera.com/t5/Support-Questions/How-to-convert-merge-Many-flow-files-to-single-f...

 

1 ACCEPTED SOLUTION

Super Mentor

@s198 

The two most common scenarios for this type of failure are:
1. A file with the same name already exists when PutSFTP tries to rename the dot-file. This is typically resolved by using an UpdateAttribute processor on the failure relationship to modify the filename. Perhaps use the nextInt NiFi Expression Language function to add an incremental number to the filename, or, in your case, modify the timestamp by adding a few milliseconds to it (see the sketch after this list).
2. Some process is consuming the dot (.) file before the PutSFTP processor has renamed it. This requires modifying the downstream process to ignore dot files.
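A minimal sketch of the rename approach in scenario 1, assuming PutSFTP's failure relationship is routed through an UpdateAttribute processor and that appending a suffix to the filename is acceptable (the exact expression is illustrative, not taken from the original flow):

UpdateAttribute (on the PutSFTP failure path):
    filename = ${filename:substringBeforeLast('.')}-${nextInt()}.${filename:substringAfterLast('.')}

For scenario 2, if the downstream consumer is itself a NiFi ListSFTP/GetSFTP processor, a File Filter Regex such as ^[^.].* would make it skip dot-files until PutSFTP has renamed them.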

While it is great that the run duration and run schedule increases appear to resolve this issue, I think you are dealing with a millisecond race condition, and these two options will not always guarantee success here. The best option is to programmatically deal with the failures via a filename attribute modification, or to change how you are uniquely naming your files, if possible.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt




Rising Star
1) Initially, I faced the "NiFi PutSFTP failed to rename dot file" issue only when the child processor group was configured with "Outbound Policy = Batch Output". It worked without the child processor group.
2) I modified the PutSFTP failure retry attempt to 3, and it fixed the issue.
3) Later, I introduced a RouteOnAttribute after the FetchHDFS processor for some internal logic implementation, and the PutSFTP error started again. 
4) This time, I updated the "Run Schedule" of the PutSFTP processor from 0 sec to 3 sec, and it again fixed the issue.
5) I have a requirement to transfer the stats of each file (file name, row count, file size, etc.). So, I introduced one more PutSFTP processor, and the issue popped up again.
6) Finally, I made the following changes to both of my PutSFTP processors:
       a) Added PutSFTP failure retry attempt to 3.
       b) Modified the "Run Schedule" of the first PutSFTP processor to "7 sec".
       c) Modified the "Run Schedule" of the second PutSFTP processor to "10 sec".
 
Now it is working fine. Are we getting this issue because 20 FlowFiles are being processed at the same time? Could you please suggest whether this is the right way to fix the "NiFi PutSFTP failed to rename dot file" issue?

Master Guru

It looks like a timing issue, or multiple components accessing the same SFTP server. SFTP is very slow and not multithreaded, and the timing of access can be tricky.

I recommend setting up only one PutSFTP instance (don't run it on the whole cluster, and don't have more than one processor posting at once).

We need to see logs and know the operating system, Java version, NiFi version, and some error details, as well as the file system type, SFTP server version, and the type and size of the HDFS files.

Community Manager

@TimothySpann @steven-matison @SAMSAL, would you be able to help @s198, please?



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Rising Star

Hi @TimothySpann 

 
Please find below the requested info.

Operating system:
    Linux

Java version:
    NiFi server:
        openjdk version "11.0.22" 2024-01-16 LTS
        OpenJDK Runtime Environment (Red_Hat-11.0.22.0.7-1) (build 11.0.22+7-LTS)
        OpenJDK 64-Bit Server VM (Red_Hat-11.0.22.0.7-1) (build 11.0.22+7-LTS, mixed mode, sharing)

    SFTP server:
        openjdk version "1.8.0_402"
        OpenJDK Runtime Environment (build 1.8.0_402-b06)
        OpenJDK 64-Bit Server VM (build 25.402-b06, mixed mode)

NiFi version:
    Cloudera Flow Management (CFM) 2.1.5.1027
    1.18.0.2.1.5.1027-2 built 02/09/2023 22:16:12 CST
    Tagged nifi-1.18.0-RC4
    Powered by Apache NiFi 1.18.0

File system type:
    HDFS

SFTP server version:
    OpenSSH_7.4p1, OpenSSL 1.0.2k-fips

Type and size of the HDFS files:
    I am trying to transfer files ranging in size from 300 KB to 800 KB. Typically, my HDFS path contains a total of 20 files. In production, the file sizes may vary from 300 MB to 600 MB, and the total file count would still be 20.

I am running on a NiFi cluster.

I am intermittently facing the 'Failed to rename dot-file.' failure for one or two files while performing PutSFTP.
 

The first PutSFTP processor is used to transfer the actual file, and the second one is used to transfer the stats file corresponding to that file (file name, size, row count, etc.).

I can limit the second PutSFTP processor to a single transfer containing the details of all 20 files, i.e., transfer one stats file with the details of all 20 files. Can we store this info line by line and then send it at the end? For example:
FileName~RowCount~FileSize
file1~100~1250
file2~200~3000
A single stats file like the above would also satisfy my requirement, instead of multiple stats files for the second PutSFTP processor.

Could you please give some inputs on this issue?
Thank you

Master Guru

How much RAM is on the machine, and how much is dedicated in the NiFi configuration (or Cloudera Manager)? It should be at least 8 GB of RAM.

Either the network is slow or you don't have enough RAM.

You can build files from the stats very easily.   That is a good strategy.
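For example, one way to build that single stats file is to write each file's stats as one line of FlowFile content and then combine the lines with MergeContent. A minimal sketch, assuming each stats FlowFile carries the standard filename and fileSize attributes plus a hypothetical row.count attribute set earlier in the flow (the property names below are from the stock ReplaceText and MergeContent processors; the values are illustrative):

ReplaceText (turn each stats FlowFile's content into one line):
    Replacement Strategy = Always Replace
    Evaluation Mode      = Entire text
    Replacement Value    = ${filename}~${row.count}~${fileSize}

MergeContent (combine the 20 lines into one file):
    Merge Strategy            = Bin-Packing Algorithm
    Minimum Number of Entries = 20
    Delimiter Strategy        = Text
    Header                    = FileName~RowCount~FileSize
    Demarcator                = (newline)

The merged FlowFile can then go to the second PutSFTP processor as a single transfer.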

Rising Star

Hi @TimothySpann 

If we introduce a "Run Schedule" delay for the PutSFTP processor, will it help fix this issue? I mean changing the "Run Schedule" from 0 seconds to 30 seconds or so?

There is no network delay; as for the RAM size, we are yet to hear back from the platform team.

Master Guru

A delay could help.

Super Mentor

@s198 

Questions for you:
- If you leave the PutSFTP processor stopped, run your dataflow so all FlowFiles queue in front of the PutSFTP processor, and then start the PutSFTP processor, does the issue still happen?
- Does the issue only happen when the flow is in an all-started/running state?

Answers to the above can help determine whether changing the run schedule will help here.

Run Schedule details:
- The run schedule works in conjunction with the Timer Driven scheduling strategy. This setting controls how often a component gets scheduled to execute (which is different from when it actually executes; execution depends on available threads in the NiFi Timer Driven thread pool shared by all components). By default it is set to 0 secs, which means NiFi should schedule this processor as often as possible (basically, schedule it again as soon as a concurrent task is available to it; the concurrent tasks default is 1). To avoid CPU saturation, NiFi builds in a yield duration if, upon scheduling of a processor, there is no work to be done (its inbound connections are empty).

Depending on the load on your system and dataflow, and the speed of your network, this can happen very quickly, meaning the processor is scheduled, sees only one FlowFile in the inbound connection at the time of scheduling, and processes only that one FlowFile instead of a batch. It then closes that thread and starts a new one for the next FlowFile, instead of processing multiple FlowFiles in one SFTP connection. By increasing the run schedule, you allow more time between schedulings for FlowFiles to queue on the inbound connection, so they get batch-processed in a single SFTP connection.

Run Duration details:
Another option on processors is the run duration setting. With this adjustment, upon scheduling of a processor, the execution will not end until the configured run duration has elapsed. So let's say at the time of scheduling (run schedule) there is one FlowFile in the inbound connection queue (remember, we are dealing with microseconds here, so not something you can observe yourself via the UI). The execution thread will process that FlowFile, but rather than close out the session and immediately commit the FlowFile to an outbound relationship, it will check the inbound connection for another FlowFile and process it in the same session. It will continue to do this until the run duration is satisfied, at which time all FlowFiles processed during that execution are committed to the downstream relationship(s). So Run Duration is another setting you could try to see if it helps with your issue. If you try run duration, I'd set the run schedule back to the default.
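To make the two alternatives concrete, the corresponding settings on the PutSFTP processor's SCHEDULING tab would look something like this (the values are illustrative):

    Option A - batch via run schedule: Run Schedule = 30 sec, Run Duration = 0 ms (default)
    Option B - batch via run duration: Run Schedule = 0 sec (default), Run Duration = 500 ms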

You may also want to look at your SFTP server logs to see what is happening when the file rename attempts are failing.

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt

avatar
Rising Star

Hi @MattWho 

Thank you for the great suggestions and super helpful information.

Here are the results of what I tried:

  1. I set the Run Schedule to 0 seconds and stopped the PutSFTP processor. After all 20 FlowFiles were queued up, I started it again.
    Result: Out of 20 FlowFiles, 1 failed.
  2. I set the Run Schedule to 0 seconds and let the flow run with all processors started (here also, all 20 FlowFiles arrived at almost the same time).
    Result: Out of 20 FlowFiles, 2 failed.
  3. I updated the Run Schedule of the PutSFTP processor from 0 seconds to 30 seconds.
    Result: No failures; all 20 FlowFiles passed.
  4. I updated the "Run Duration" to 500 ms.
    Result: No failures; all 20 FlowFiles passed.

Could you please suggest the best approach to address this scenario: option 3 or option 4?