Support Questions

Find answers, ask questions, and share your expertise
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

Running a Shell script on Hadoop cluster (WITHOUT Using HDFS)

New Contributor

Hi, I have a Large file (with 1 Million records) on Linux. This file is not yet copied to HDFS. It is still on ext3 file system of Linux. I would like to perform some kind of validation on this file (eg. Data type validation - First 9 bytes of each record should have digits). I have written a shell script to perform the validation.

My question is, Is it possible to run the shell script on Hadoop's Infrastruture ? that is, Can I execute the script without putting the file onto HDFS, but still using Hadoop's Multi node infrastructure ?




Yes, you can. SCP the file to any of the data nodes and then run your shell script to validate it. The file will be available in HDFS only when you use the "hadoop fs -put" command and put it into the HDFS.

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.