How can I perform a reliable data integrity check with NiFi?

Hi,

I am working on a NiFi flow that fetches data from an FTP server and sends it to HDFS. I need to add to this flow the ability to check data integrity between the files fetched from the FTP server and the files written to HDFS.

To do this, I use the HashContent processor with the MD5 algorithm to compute an MD5 hash of the flowfiles at the start and at the end of the flow. I can get the MD5 hash of each file on the FTP server, and the second MD5 is computed after PutHDFS, on the files retrieved once they have been written.
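For reference, an MD5 content hash is a digest of the file's full content. The Python sketch below is only an illustration of that computation, not NiFi's own implementation. It reads a stream in chunks so large files never need to fit in memory, and it returns a lowercase hex string. The function names `md5_hex`, `md5_of_bytes` and `md5_of_file` are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO


def md5_hex(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Return the lowercase hex MD5 digest of everything readable from stream.

    The stream is consumed in fixed-size chunks, so memory use stays bounded
    no matter how large the file is.
    """
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_bytes(data: bytes) -> str:
    """Convenience wrapper for content that is already in memory."""
    return md5_hex(io.BytesIO(data))


def md5_of_file(path: str) -> str:
    """Hash a local file, e.g. a local copy of a file fetched from FTP."""
    with open(path, "rb") as f:
        return md5_hex(f)
```

For example, `md5_of_bytes(b"hello")` returns `5d41402abc4b2a76b9719d911017c592`. The chunk size changes only memory use, not the resulting digest.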

Finally, I compare the two values and consider data integrity OK if they are equal.
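One detail worth handling in that comparison: a hash that is missing or malformed on both sides should not count as a successful check. Also, two hex strings produced by different tools may differ only in letter case. Below is a minimal Python sketch of such a check. It assumes each hash is available as a hex string, and the name `integrity_ok` is hypothetical.

```python
import re
from typing import Optional

# 32 lowercase hex characters: the textual form of a 128-bit MD5 digest.
_MD5_HEX = re.compile(r"[0-9a-f]{32}")


def integrity_ok(source_md5: Optional[str], written_md5: Optional[str]) -> bool:
    """Return True only if both hashes are well-formed MD5 hex strings and equal.

    Both values are stripped and lower-cased first, so a case difference alone
    does not fail the check. A missing or malformed value on either side makes
    the check fail instead of silently "matching".
    """
    a = (source_md5 or "").strip().lower()
    b = (written_md5 or "").strip().lower()
    if not (_MD5_HEX.fullmatch(a) and _MD5_HEX.fullmatch(b)):
        return False
    return a == b
```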

Do you have any general advice about this practice?

Is this kind of check really useful with NiFi?

Thanks,

Benjamin

Thank you for your answer,

The second hash is computed after PutHDFS, not after ListHDFS (I have edited my post; sorry for the mistake).

If I understand you correctly, this check is not useful because the PutHDFS and FetchFile processors already detect corruption errors reliably?

I would compare the two hashes with the NiFi Expression Language (the :equals function) inside a RouteOnAttribute processor.
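In RouteOnAttribute, that comparison could be a dynamic property whose value is an expression such as `${source.md5:equals(${hash.value})}`. Here `source.md5` is a hypothetical attribute holding the hash obtained for the FTP file, and `hash.value` is assumed to hold the hash computed after PutHDFS. With the routing strategy that routes to the property name, flowfiles for which the expression is true go to a relationship named after the property, and the rest go to `unmatched`. `equals` is case-sensitive. The Expression Language also provides `equalsIgnoreCase`, which may be safer if the two hashes come from different tools.

The Python sketch below only simulates that routing decision over attribute maps, to make the match/unmatched behaviour explicit. It is not NiFi code, and the relationship name `hash.match` is hypothetical.

```python
from typing import Dict, List, Mapping

# Hypothetical attribute names: the hash obtained on the FTP side and the
# hash computed after PutHDFS.
SOURCE_ATTR = "source.md5"
WRITTEN_ATTR = "hash.value"
MATCH_REL = "hash.match"  # hypothetical dynamic-property / relationship name


def route_on_hash(
    flowfiles: List[Mapping[str, str]], ignore_case: bool = True
) -> Dict[str, List[Mapping[str, str]]]:
    """Split flowfiles (represented only by their attributes) into MATCH_REL
    and "unmatched", mimicking RouteOnAttribute with one dynamic property."""
    routed: Dict[str, List[Mapping[str, str]]] = {MATCH_REL: [], "unmatched": []}
    for attrs in flowfiles:
        src = attrs.get(SOURCE_ATTR)
        dst = attrs.get(WRITTEN_ATTR)
        if src is None or dst is None:
            # In this sketch a missing hash is never treated as a match.
            routed["unmatched"].append(attrs)
            continue
        if ignore_case:
            same = src.lower() == dst.lower()  # analogous to :equalsIgnoreCase
        else:
            same = src == dst  # analogous to :equals
        routed[MATCH_REL if same else "unmatched"].append(attrs)
    return routed
```

Routing the `unmatched` flowfiles to an alerting or retry branch makes any integrity failure visible instead of silently dropped.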