Member since
04-09-2018
8 Posts, 1 Kudos Received, 0 Solutions
07-10-2018 01:04 PM
Thank you for your answer. The second hash is computed after a PutHDFS, not a ListHDFS (I have edited my post; sorry for the mistake). If I understand you correctly, this check is not useful because the PutHDFS and FetchFile processors already catch corruption errors reliably? I would compare both hashes with the NiFi Expression Language (the equals function) inside a RouteOnAttribute processor.
07-10-2018 07:40 AM
Hi,
I work on a NiFi flow that gets data from an FTP server and sends it to HDFS. I have to add to this flow the ability to check data integrity between the files fetched from the FTP server and the files written to HDFS.
To do this, I use the HashContent NiFi processor with the MD5 algorithm to compute the MD5 hash of flowfiles at the start and at the end of the flow (I can get the MD5 hash of each file on the FTP server; the second MD5 is computed after a PutHDFS, on the files read back once they have been written).
Finally, I compare both values and consider data integrity OK if they are equal.
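To illustrate the idea outside NiFi, here is a minimal Python sketch of the same check. It is only an illustration under assumptions, not the flow itself: the function names `md5_hex` and `integrity_ok` and the sample bytes are hypothetical, and in the real flow the digests would come from HashContent rather than from this code.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def integrity_ok(source_hash: str, destination_hash: str) -> bool:
    """Treat the transfer as intact only when both digests match (case-insensitive)."""
    return source_hash.lower() == destination_hash.lower()


# Hypothetical sample content standing in for a file on the FTP server
ftp_bytes = b"example file content"
hdfs_bytes = b"example file content"       # same content read back after the write
corrupted_bytes = b"example file c0ntent"  # simulated corruption

same = integrity_ok(md5_hex(ftp_bytes), md5_hex(hdfs_bytes))
different = integrity_ok(md5_hex(ftp_bytes), md5_hex(corrupted_bytes))
print(same, different)
```

Equal digests only show that the bytes hashed at both ends are identical; MD5 is enough to detect accidental corruption but would not protect against deliberate tampering.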
Do you have any general advice about this practice?
Is this kind of check really useful with NiFi?
Thanks,
Benjamin
Labels: Apache NiFi
05-04-2018 09:58 AM
I have a NiFi job starting with a GetFile processor configured to run on the primary node only, to avoid duplicate flowfiles.
After a NiFi node restart, which caused a primary node re-election, the GetFile processor created two flowfiles (one expected and one duplicate).
I suppose that GetFile ran on both the pre-election primary node and the post-election one, producing those two flowfiles (according to NiFi data provenance, the two flowfiles were processed by different nodes).
Is there a way to avoid this behavior, and is it a NiFi bug?
NB: While the job was running, the cluster coordinator and the primary node were the same node.
Thanks a lot.
Labels: Apache NiFi