How to hash many files?

I have a large number of files spread across multiple subdirectories. I need to hash each file and store the hashes in a database. How can I do this as fast as possible?

I was able to hash strings in Pig using the DataFu extension. However, this simply splits each file into lines and hashes those lines individually, rather than hashing each file as a whole.

http://datafu.apache.org/docs/datafu/guide/hashing.html

Perhaps MapReduce is not the right approach, or the files need to be stored in a container?
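
To make clear what I mean by hashing whole files rather than lines, here is a minimal sketch in plain Python, outside of Hadoop. It is only an illustration under several assumptions: the files sit on a local (or mounted) filesystem, SHA-256 is an acceptable hash, and SQLite stands in for the real database. The function names, the table name `file_hashes`, and the worker count are all hypothetical.

```python
import hashlib
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def hash_file(path, chunk_size=1 << 20):
    """Return (path, hex SHA-256 digest) of the whole file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size binary chunks so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return path, digest.hexdigest()


def iter_files(root):
    """Yield the path of every file under root, including subdirectories."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def hash_tree(root, db_path, workers=8):
    """Hash every file under root and store (path, sha256) rows in SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_hashes "
        "(path TEXT PRIMARY KEY, sha256 TEXT)"
    )
    # Hashing runs in worker threads; the database is written only from
    # the calling thread, which consumes the results as they arrive.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = pool.map(hash_file, iter_files(root))
        conn.executemany(
            "INSERT OR REPLACE INTO file_hashes VALUES (?, ?)", rows
        )
    conn.commit()
    conn.close()
```

It would be called as `hash_tree("/path/to/files", "hashes.db")`, where both paths are placeholders. Each file produces exactly one hash regardless of its contents, which is the behavior I could not get from the line-based Pig approach.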