Running a Shell script on Hadoop cluster (WITHOUT Using HDFS)


New Contributor

Hi, I have a large file (1 million records) on Linux. It has not been copied to HDFS yet; it is still on the local ext3 file system. I would like to validate this file (e.g. data type validation: the first 9 bytes of each record should be digits). I have written a shell script to perform the validation.

My question is: is it possible to run the shell script on Hadoop's infrastructure? That is, can I execute the script without putting the file onto HDFS, while still using Hadoop's multi-node infrastructure?

Thanks.

1 REPLY

Re: Running a Shell script on Hadoop cluster (WITHOUT Using HDFS)

Contributor

Yes, you can. Use scp to copy the file to any of the data nodes, then run your shell script there to validate it. The file becomes available in HDFS only after you put it there with the "hadoop fs -put" command.
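
For illustration, here is a minimal sketch of what that could look like. The original script was not posted, so the function name validate_records, the host name datanode1, and the paths in the comments are all assumptions, not details from this thread. Note that a script started this way runs as an ordinary Linux process on the one node you copied the file to.

```shell
#!/bin/sh
# Hypothetical helper (not the poster's actual script): print the line
# number of every record whose first 9 bytes are not all digits.
# Exit status is 0 when every record passes, 1 otherwise.
validate_records() {
    # LC_ALL=C makes [0-9] match ASCII digits only.
    # The digit class is repeated 9 times instead of using {9}, because
    # some older awk builds do not support interval expressions.
    LC_ALL=C awk '
        $0 !~ /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/ { print NR; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Assumed usage; host name and paths are placeholders:
#   scp records.dat validate.sh user@datanode1:/tmp/
#   ssh user@datanode1 '. /tmp/validate.sh && validate_records /tmp/records.dat'
```

Nothing here touches HDFS. If the file passes validation, it can be loaded afterwards with "hadoop fs -put".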
