Support Questions


Spark on Yarn: Do nodes need Spark installed?

Expert Contributor

My team needs Spark 2.3 for its new features. We have HDP 2.6.3 installed, which ships Spark 2.0 (correction: 2.2.0) in its stack.

Is that enough to meet the version requirement if I use a Docker container with Spark 2.3 as the Spark driver and configure it to use the YARN of the current HDP installation?

Or do I need Spark 2.3 installed on all workers?

What I need to understand is whether the workers (NodeManagers) need the new Spark libraries once a job is submitted to YARN.

The following note on the Spark cluster overview page led me to think it may not be mandatory: "The user's jar should never include Hadoop or Spark libraries, however, these will be added at runtime."
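Concretely, the setup I have in mind relies on Spark on YARN being able to ship its own runtime jars to the cluster via `spark.yarn.jars`. A rough sketch of what I would configure (all paths and HDFS locations here are hypothetical):

```shell
# Upload the Spark 2.3 jars once, so YARN can localize them for each job:
hdfs dfs -mkdir -p /apps/spark-2.3-jars
hdfs dfs -put /opt/spark-2.3/jars/* /apps/spark-2.3-jars/

# Inside the driver container: point the Spark 2.3 client at the cluster's
# Hadoop configuration, and tell it where its own jars live on HDFS.
export HADOOP_CONF_DIR=/etc/hadoop/conf
/opt/spark-2.3/bin/spark-submit \
  --master yarn \
  --conf spark.yarn.jars='hdfs:///apps/spark-2.3-jars/*' \
  --class com.example.MyApp \
  my-app.jar
```

With `spark.yarn.jars` set, YARN distributes those jars to the NodeManagers at container launch, which is why I hope no Spark installation is needed on the workers themselves.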

Thanks in advance...

1 ACCEPTED SOLUTION

3 REPLIES



@Sedat Kestepe

The Spark versions supported with HDP 2.6.3 are 2.2.0 and 1.6.3. Other versions may or may not work, and we definitely don't recommend using other versions, especially in production environments.

The Spark client does not need to be installed on all the cluster worker nodes, only on the edge nodes that submit applications to the cluster.
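For instance, a submission from an edge node in cluster mode looks like this (the application jar and main class are hypothetical):

```shell
# Run from an edge node that has the Spark client and the Hadoop configs.
# In cluster deploy mode the driver itself runs inside YARN, and YARN
# localizes the Spark runtime onto whichever nodes run the containers --
# the workers need no pre-installed Spark client.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar
```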

As for jar files and whether to include them in your application: I agree with the statement above. As good practice, you should avoid bundling Hadoop/Spark library jars into your application, to avoid version-mismatch issues.
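As an example, in a Maven build you can mark the Spark dependency with `provided` scope, so it is available at compile time but is left out of the application jar and supplied by the cluster at runtime (the artifact and version shown are illustrative):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.2.0</version>
  <scope>provided</scope>
</dependency>
```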

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.

Expert Contributor

Thanks for your answer, and also for the warning about the version in the stack. The current Spark2 version is 2.2.0; I am going to correct it in the question.

Also, both answers are good news for me. Thanks again.