I tried the Spark Scala code below and got the output shown further down. I tried to pass input to the script, but the script did not receive it, and when I called collect, the print statement from the script appeared twice.
My simple, very basic Perl script first:
#!/usr/bin/perl
print("arguments $ARGV[0] \n"); # Just print the first argument.
My Spark code:
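import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object PipesExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    // Ship the script to the worker nodes.
    val distScript = "/home/srinivas/test.pl"
    sc.addFile(distScript)
    val rdd = sc.parallelize(Array("srini"))
    // Pipe the RDD through the distributed copy of the script.
    val piped = rdd.pipe(Seq(SparkFiles.get("test.pl")))
    println(" output " + piped.collect().mkString(" "))
  }
}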
The output looked like this:
output: arguments arguments
1) What mistake did I make that prevents the script from receiving the arguments?
2) Why was the script executed twice?
I apologize if this looks too basic. I am trying to understand this as well as I can and want to clear up my doubts.