I have a large number of files spread across multiple subdirectories, and I need to hash each one and store the hashes in a database. How can I do this as fast as possible?
I was able to hash strings in Pig using the DataFu extension. However, that approach splits each file into lines and hashes the lines individually, rather than hashing each file as a whole.
Perhaps MapReduce is not the right tool, or do the files need to be stored in a container first?
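
To make the goal concrete, here is a minimal single-machine sketch of what I want for each file: hash the file's full bytes (not its lines) and store one row per file. The choice of SHA-256, SQLite, the thread pool, and all names here (`hash_file`, `iter_files`, `hash_tree`, the `file_hashes` table) are assumptions for illustration only, not part of my actual setup.

```python
import hashlib
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def hash_file(path, chunk_size=1 << 20):
    """Return (path, hex digest) for the whole file, read in 1 MiB chunks."""
    digest = hashlib.sha256()  # assumed algorithm; any hashlib hash would do
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return path, digest.hexdigest()


def iter_files(root):
    """Yield the path of every regular file below root, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def hash_tree(root, db_path, workers=8):
    """Hash every file under root and store one row per file in SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_hashes "
        "(path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
    )
    # Threads hash files concurrently; rows are written from this thread only.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = pool.map(hash_file, iter_files(root))
        conn.executemany(
            "INSERT OR REPLACE INTO file_hashes VALUES (?, ?)", rows
        )
    conn.commit()
    conn.close()
```

Calling something like `hash_tree("/data/files", "/tmp/hashes.db")` (both paths hypothetical, with the database kept outside the tree being hashed) would produce one digest per file. What I am looking for is a way to get this same per-file behavior at scale.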