How to hash many files?

I have a ton of files in multiple subdirectories, and I need to hash them and put the hashes in a database. How can I accomplish this as fast as possible?

I was able to hash strings in Pig using the DataFu extension. However, this simply splits the files into lines and then hashes those lines, rather than hashing each file as a whole.

http://datafu.apache.org/docs/datafu/guide/hashing.html

Perhaps MapReduce is not the right approach, or the files need to be stored in a container?
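
To make clear what I mean by hashing a whole file rather than its lines, here is a minimal sketch in plain Python. It is not Pig or DataFu. It assumes the files are reachable through a local or mounted filesystem path and that SHA-256 is an acceptable hash. The names hash_file, iter_files, and hash_tree, and the defaults for chunk size and worker count, are made up for illustration.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the entire file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size binary chunks so large files never sit fully in memory
        # and line boundaries play no role in the result.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_files(root):
    """Yield the path of every regular file under `root`, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def hash_tree(root, workers=8):
    """Hash every file under `root`; returns a {path: hex digest} dict.

    `workers` is an assumed default, not a tuned value.
    """
    paths = list(iter_files(root))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its hash.
        return dict(zip(paths, pool.map(hash_file, paths)))
```

Each (path, digest) pair from hash_tree would then become one row in the database. Threads are used here only because the work is mostly file I/O; whether this is fast enough for a very large number of files, or whether a distributed job is still needed, is something I have not measured.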