Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How many times does the script used in Spark pipes get executed?

Rising Star
I ran the Spark Scala code below and got the output shown further down. I tried to pass input to the script, but the script did not receive it, and when I called collect, the print statement from the script appeared twice.

My simple and very basic perl script first:

#!/usr/bin/perl
print("arguments $ARGV[0] \n"); # Just print the arguments.

My Spark code:

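import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object PipesExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()

    val sc = new SparkContext(conf)

    // Ship the script to every node so it can be resolved with SparkFiles.get.
    val distScript = "/home/srinivas/test.pl"
    sc.addFile(distScript)

    val rdd = sc.parallelize(Array("srini"))

    // Run the script as an external process over the RDD.
    val piped = rdd.pipe(Seq(SparkFiles.get("test.pl")))

    println(" output " + piped.collect().mkString(" "))

  }
}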

The output looked like this:

 output: arguments arguments

1) What mistake did I make that prevents the script from receiving the arguments? 2) Why was it executed twice?

If this looks too basic, I apologize. I am trying to understand this as well as I can and want to clear up my doubts.
1 ACCEPTED SOLUTION

Super Collaborator

How many executors do you have when you run this?

I see the same behavior when I run it, because the script gets sent to each executor (2 in my case).

Wilfred
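
One way to check how the number of script runs relates to how the data is split is a minimal sketch like the one below. It relies on the documented behavior of RDD.pipe: the command is launched per partition, and each element is written to the command's stdin (it is not passed as a command-line argument). The object name PipePartitionsExample, the local[2] master, and the sh/cat command are assumptions made only for illustration; the command assumes a Unix-like environment.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PipePartitionsExample {
  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch can run without a cluster.
    val conf = new SparkConf()
      .setAppName("PipePartitionsExample")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)

    // One element spread over two partitions, so one partition is empty.
    val twoParts = sc.parallelize(Seq("srini"), numSlices = 2)
    // The same element held in a single partition.
    val onePart = sc.parallelize(Seq("srini"), numSlices = 1)

    // Hypothetical stand-in for test.pl: prints a marker once per launch,
    // then echoes every line it receives on stdin.
    val cmd = Seq("sh", "-c", "echo started; cat")

    println("two partitions: " + twoParts.pipe(cmd).collect().mkString(" "))
    println("one partition:  " + onePart.pipe(cmd).collect().mkString(" "))

    sc.stop()
  }
}
```

If the marker shows up once per partition, that would match the doubled print statement in the question, and the echoed element would indicate that a script has to read its input from stdin rather than from its arguments.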
