Hi, I have a Large file (with 1 Million records) on Linux. This file is not yet copied to HDFS. It is still on ext3 file system of Linux. I would like to perform some kind of validation on this file (eg. Data type validation - First 9 bytes of each record should have digits). I have written a shell script to perform the validation.
My question is, Is it possible to run the shell script on Hadoop's Infrastruture ? that is, Can I execute the script without putting the file onto HDFS, but still using Hadoop's Multi node infrastructure ?
Thanks.