Support Questions

Find answers, ask questions, and share your expertise

How many times does the script used in Spark pipe() get executed?

Rising Star
I tried the Spark Scala code below and got the output shown further down. I tried to pass inputs to the script, but it never received them, and when I called collect() the print statement in the script appeared twice.

My simple and very basic perl script first:

#!/usr/bin/perl
print("arguments $ARGV[0]\n");  # Just print the arguments.

My Spark code:

import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object PipesExample {
  def main(args: Array[String]) {
    val conf = new SparkConf();

    val sc = new SparkContext(conf);

    val distScript = "/home/srinivas/test.pl"
    sc.addFile(distScript)

    val rdd = sc.parallelize(Array("srini"))

    val piped = rdd.pipe(Seq(SparkFiles.get("test.pl")))

    println(" output " + piped.collect().mkString(" "));

  }
}

The output looked like this:

 output: arguments arguments

1) What mistake did I make that the script fails to receive the arguments? 2) Why did it execute twice?

If this looks too basic, please excuse me. I am just trying to understand it properly and clear my doubts.
1 ACCEPTED SOLUTION

Super Collaborator

How many executors do you have when you run this?

I see the same thing when I run it, because the script is shipped to and run by each executor (2 in my case).

 

Wilfred
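To make the mechanics concrete: pipe() does not pass RDD elements as command-line arguments at all. It launches the external script once per partition and streams that partition's elements to the script's standard input, one per line, which is why $ARGV[0] stays empty and why the print statement appears once per executor/partition. A minimal shell sketch of that behavior (the /tmp/test.sh path and the second element value are just for illustration, not from the original post):

```shell
# Create a stand-in for the Perl script that reads from stdin,
# the way a script used with rdd.pipe() must.
cat > /tmp/test.sh <<'EOF'
#!/bin/sh
# Each line on stdin is one RDD element from this partition.
while read line; do
  echo "element: $line"
done
EOF
chmod +x /tmp/test.sh

# Emulate two partitions: pipe() starts the script once per partition
# and feeds it that partition's elements, so it runs twice here.
printf 'srini\n' | /tmp/test.sh
printf 'other\n' | /tmp/test.sh
```

In the original code, reading STDIN inside test.pl (instead of @ARGV) is what would let the script see the "srini" element.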

