Support Questions

How many times does the script used in Spark pipe() get executed?

Rising Star
I tried the Spark Scala code below and got the output shown at the end. I tried to pass inputs to the script, but it didn't receive them, and when I used collect() the print statement I put in the script appeared twice.

My simple and very basic Perl script first:

#!/usr/bin/perl
print("arguments $ARGV[0] \n"); // Just print the arguments.

My Spark code:

import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object PipesExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)

    // Ship the script to every node so SparkFiles.get() can find it there.
    val distScript = "/home/srinivas/test.pl"
    sc.addFile(distScript)

    val rdd = sc.parallelize(Array("srini"))

    // Pipe each partition of the RDD through the external script.
    val piped = rdd.pipe(Seq(SparkFiles.get("test.pl")))

    println(" output " + piped.collect().mkString(" "))

    sc.stop()
  }
}

Output looked like this:

 output: arguments arguments

1) What mistake did I make that keeps the script from receiving the arguments? 2) Why was it executed twice?

If this looks too basic, please excuse me. I was trying to understand it as best I can and want to clear up my doubts.
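
For context on the first question: pipe() does not pass RDD elements to the external process as command-line arguments. It writes each element of a partition to the process's standard input (one line per element) and turns every line the process writes to standard output into an element of the resulting RDD, which is why $ARGV[0] stays empty. A minimal sketch, assuming a spark-shell session where sc is already defined, that shows the data travelling over stdin/stdout by piping through the standard cat utility:

// Each element is written to cat's stdin and comes back on its stdout,
// so it reappears in the result; nothing arrives via command-line arguments.
val echoed = sc.parallelize(Array("srini")).pipe("cat")
println(echoed.collect().mkString(" "))   // prints: srini

A script that reads from STDIN instead of @ARGV would therefore receive the value.
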
1 ACCEPTED SOLUTION

Super Collaborator

How many executors do you have when you run this?

I see the same behaviour when I run it, because the script gets shipped to and executed by each executor (2 in my case).

Wilfred
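
A minimal sketch of how this can be confirmed, assuming the same /home/srinivas/test.pl script and a spark-shell session where sc is already defined: the script is launched once for every partition that gets piped (each executor runs the tasks for the partitions assigned to it), so the single element above ends up spread over spark.default.parallelism partitions (2 in that run) and the script starts twice, while forcing a single partition starts it once.

import org.apache.spark.SparkFiles

sc.addFile("/home/srinivas/test.pl")

val rdd = sc.parallelize(Array("srini"))
println(rdd.getNumPartitions)           // 2 in a setup like the one above -> two launches of the script

// Forcing a single partition means the script is launched exactly once.
val once = rdd.repartition(1).pipe(Seq(SparkFiles.get("test.pl")))
println(once.collect().mkString(" "))   // the script's print statement appears once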
