Member since: 03-26-2024
Posts: 25
Kudos Received: 17
Solutions: 0
04-26-2024
03:04 AM
1 Kudo
Here is the configuration for my MergeContent Processor
04-25-2024
11:01 AM
I am fetching files from a particular HDFS path and using a MergeContent processor to merge all the fetched files; I then transfer the merged file to an SFTP server with a PutSFTP processor. There are currently 20 files in the path, with a total size of 1.2 GB (this may vary in my production environment, up to around 300 GB).

Initially, my MergeContent processor picked up 1 GB worth of files (14 of the 20), merged them, and transferred the result to the SFTP server. Later, it picked up the remaining 0.2 GB (the other 6 files) and transferred a second file to my SFTP server. After I raised the back pressure size threshold on the MergeContent processor's incoming connection to 2 GB, it merged all 20 files and copied a single 1.2 GB file at once.

In another flow, I have FetchHDFS --> PutSFTP, which copies a single file exceeding 100 GB to the SFTP server. There, the Back Pressure Size Threshold is set to 1 GB and it works. I am wondering why the same threshold does not work for the MergeContent processor. Could you please advise on the appropriate configuration settings for the MergeContent processor? My total daily file size may vary from 10 GB to 300 GB.
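My current understanding (please correct me if this is wrong) is that MergeContent can only bin the FlowFiles already sitting in its inbound queue when it runs, and with the threshold at 1 GB the upstream fetch is back-pressured after roughly 1 GB is queued, so the first bin closed with only 14 files. As a sketch, these are the MergeContent bin properties I am experimenting with; the values are assumptions based on my 20-files-per-day pattern, not a confirmed fix:

Merge Strategy: Bin-Packing Algorithm
Minimum Number of Entries: 20 (wait for all 20 files before merging)
Maximum Number of Entries: 1000
Minimum Group Size: 0 B
Max Bin Age: 10 min (safety flush in case fewer than 20 files arrive)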
Labels:
- Apache NiFi
04-23-2024
07:56 AM
Thank you, @MattWho, for providing timely responses and quick solutions to the queries. You are really helping the community grow. Hats off to you. Appreciate it!
04-22-2024
11:56 PM
1 Kudo
Hi @MattWho
Apologies! I realized that the failed FlowFiles have the same 'filename' attribute as another FlowFile that has already been transferred to the target SFTP server. There are 20 files in my HDFS path, and once a file is fetched, I update the 'filename' attribute using the expression below. However, one or two files end up with the same 'filename', which is causing the 'failed to rename dot-file' error:
${ExtractName}_${now():format('yyyyMMddHHmmssSSS')}.txt
Thank you for the inputs that helped identify the root cause of this issue. Now, could you please suggest the best approach to avoid this scenario, since even with milliseconds two files can end up with the same filename? As you suggested earlier, can we go with the "Run Duration" setting for the PutSFTP processor?
Thank you
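One option I am considering (a sketch, not yet tested in my flow) is to append something guaranteed unique per FlowFile, such as NiFi's UUID() Expression Language function:
${ExtractName}_${now():format('yyyyMMddHHmmssSSS')}_${UUID()}.txt
With the ${UUID()} suffix, two files fetched in the same millisecond can no longer collide; ${nextInt()} would be a shorter alternative if an incrementing number per node is acceptable.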
04-22-2024
11:49 AM
Hi @MattWho
Thanks again for your suggestions. The failures always occur during the renaming of dot-files. There is no process consuming the file once it is placed on the SFTP server, and there is no chance that another process is consuming the dot-files. None of the queued FlowFiles have the same 'filename' attribute as another FlowFile or as a file already present on the target SFTP server. Unfortunately, we don't have permission to view the SFTP logs on our Linux server; I will connect with the admin team to obtain sample logs. Please find below my PutSFTP processor configuration.
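In case the screenshot does not come through, these are the PutSFTP properties I understand to be involved in the dot-file rename step (the values are what I believe my processor is using, not recommendations):

Dot Rename: true (upload to a '.'-prefixed temporary file, then rename it to the final filename)
Temporary Filename: (not set)
Conflict Resolution: NONE
Batch Size: 500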
04-18-2024
12:01 PM
1 Kudo
Hi @MattWho
Thank you for the great suggestions and super helpful information. Here are the results of what I tried:
1. I set the Run Schedule to 0 seconds and stopped the PutSFTP processor. After all 20 FlowFiles were queued up, I started it again. Result: out of 20 FlowFiles, 1 failed.
2. I set the Run Schedule to 0 seconds and let the flow run with all processors started (here too, all 20 FlowFiles arrived at almost the same time). Result: out of 20 FlowFiles, 2 failed.
3. I updated the Run Schedule of the PutSFTP processor from 0 seconds to 30 seconds. Result: no failures; all 20 FlowFiles passed.
4. I updated the "Run Duration" to 500 ms. Result: no failures; all 20 FlowFiles passed.
Could you please suggest the best approach to address this scenario? Option 3 or 4?
04-18-2024
02:25 AM
1 Kudo
Hi @TimothySpann
If we introduce a "Run Schedule" delay for the PutSFTP processor, will it help fix this issue? I mean changing the "Run Schedule" from 0 seconds to 30 seconds or so. There is no network delay, and regarding the RAM size, we are yet to hear back from the platform team.
04-15-2024
04:40 AM
1 Kudo
Hi @TimothySpann
Please find below the requested info.
Operating system: Linux
Java version (NiFi server): openjdk version "11.0.22" 2024-01-16 LTS; OpenJDK Runtime Environment (Red_Hat-11.0.22.0.7-1) (build 11.0.22+7-LTS); OpenJDK 64-Bit Server VM (Red_Hat-11.0.22.0.7-1) (build 11.0.22+7-LTS, mixed mode, sharing)
Java version (SFTP server): openjdk version "1.8.0_402"; OpenJDK Runtime Environment (build 1.8.0_402-b06); OpenJDK 64-Bit Server VM (build 25.402-b06, mixed mode)
NiFi version: Cloudera Flow Management (CFM) 2.1.5.1027, 1.18.0.2.1.5.1027-2, built 02/09/2023 22:16:12 CST, tagged nifi-1.18.0-RC4, powered by Apache NiFi 1.18.0
File system type: HDFS
SFTP server version: OpenSSH_7.4p1, OpenSSL 1.0.2k-fips
Type and size of the HDFS files: I am transferring files ranging in size from 300 KB to 800 KB, and my HDFS path typically contains a total of 20 files. In production, the file sizes may vary from 300 MB to 600 MB, but the total file count would still be 20. I am running on a NiFi cluster.

I am intermittently facing the failure 'Failed to rename dot-file' for one or two files while performing PutSFTP. The first PutSFTP processor transfers the actual file, and the second one transfers the stats file corresponding to that file (file name, size, row count, etc.). I can limit the second PutSFTP processor to a single transfer containing all 20 files' details, i.e., transfer one stats file with the details of all 20 files. Can we store this info in a variable line by line and then send it at the end? For example:
FileName~RowCount~FileSize
file1~100~1250
file2~200~3000
The above would also satisfy my requirement instead of multiple stats files for the second PutSFTP processor. Could you please give some inputs on this issue? Thank you
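One pattern I am considering for building that single stats file (a sketch using standard processors; the rowcount attribute is from my flow and would need to be populated upstream):
1. ReplaceText (Replacement Strategy: Always Replace; Evaluation Mode: Entire text) with Replacement Value ${filename}~${rowcount}~${fileSize}, so each FlowFile's content becomes its own stats line.
2. MergeContent (Merge Strategy: Bin-Packing Algorithm; Minimum Number of Entries: 20; Delimiter Strategy: Text; Header: FileName~RowCount~FileSize; Demarcator: a newline) to combine the 20 lines into one file.
3. A single PutSFTP to transfer the merged stats file.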
04-10-2024
02:48 AM
1 Kudo
My requirement is to retrieve the total number of files in a given HDFS directory and, based on that count, proceed with the downstream flow.

I cannot use the ListHDFS processor, as it does not allow inbound connections. The GetHDFSFileInfo processor generates a FlowFile for each HDFS file, causing all downstream processors to execute the same number of times. I have observed that we can use ExecuteStreamCommand to invoke a script and execute HDFS commands to get the number of files.

I would like to know if we can obtain the count without using a script, or whether there is any other option available besides the above.
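For reference, the script-based route I was considering (a sketch; the path is a placeholder and it assumes the hdfs client is available on the NiFi nodes):

Command Path: /bin/bash
Command Arguments: -c;hdfs dfs -count /my/hdfs/path | awk '{print $2}'

ExecuteStreamCommand splits Command Arguments on ';' by default, so bash receives -c plus the quoted command. hdfs dfs -count prints DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME, and awk '{print $2}' keeps just the file count; setting Output Destination Attribute would capture it into an attribute instead of the FlowFile content.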
Labels:
- Apache NiFi
- HDFS
04-09-2024
04:33 AM
1 Kudo
1) Initially, I faced the "NiFi PutSFTP failed to rename dot-file" issue only when the child process group was configured with "Outbound Policy = Batch Output". It worked without the child process group.
2) I set the PutSFTP failure retry attempts to 3, and that fixed the issue.
3) Later, I introduced a RouteOnAttribute after the FetchHDFS processor for some internal logic, and the PutSFTP error started again.
4) This time, I updated the "Run Schedule" of the PutSFTP processor from 0 sec to 3 sec, and that fixed the issue again.
5) I have a requirement to transfer stats for each file (file name, row count, file size, etc.), so I introduced one more PutSFTP processor, and the issue popped up again.
6) Finally, I made the following changes to both of my PutSFTP processors:
a) Set the failure retry attempts to 3.
b) Modified the "Run Schedule" of the first PutSFTP processor to 7 sec.
c) Modified the "Run Schedule" of the second PutSFTP processor to 10 sec.
Now it is working fine. Are we getting this issue because 20 FlowFiles are processed at the same time? Could you please suggest whether this is the right way to fix the "NiFi PutSFTP failed to rename dot-file" issue?
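In case it helps, a sketch of how I understand the retry in (a) can be configured (the relationship-level retry available since NiFi 1.16, so present in my 1.18 build; please correct me if a different mechanism is meant): on the PutSFTP Relationships tab, enable Retry on the failure relationship with Number of Retries: 3 and Retry Back Off Policy: Penalize.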