I have many files and need to produce a list of hashes and metadata for them so that they can be compared against a whitelist. Can I do this using Pig or Hive, or do I have to "preprocess" the files and extract this information before loading them into Hadoop? What is the best way to do this?