
Should HDP data nodes have Spark client libraries installed?

Explorer

While working with a particular HDP 3.1 cluster with Spark 2.3 installed, I am finding that the Spark client libraries (e.g., the spark-cli command, as well as the libraries under jars) are not available on every node. They are only installed on the nodes the customer refers to as "client nodes" (I believe this is analogous to "edge nodes"). The cluster also has data nodes, which are able to run Spark executors (and, in fact, YARN does distribute tasks to executors on them), but those nodes do not have the Spark client libraries installed.

 

Is this a normal setup? Should I not assume that the Spark client is installed on every node, even when Spark is generally available on the cluster? Thanks for any insight.

1 ACCEPTED SOLUTION

Super Collaborator

@JeffEvans I think the thread below answers the same question about Spark client libraries on worker nodes.

 

https://community.cloudera.com/t5/Support-Questions/Spark-on-Yarn-Do-nodes-need-Spark-installed/td-p...

 

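We don't need Spark clients installed on all the worker nodes; they only need to be installed on the edge nodes, since on YARN the Spark jars are shipped to the executor containers from the node that submits the job.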
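
As a rough sketch of what a submission from an edge node might look like, the Python helper below only assembles a `spark-submit` command line; it does not run anything. The function name `build_spark_submit`, the application file `my_app.py`, and the HDFS path are made up for illustration and are not taken from this cluster. `spark.yarn.archive` is optional: it points YARN at a pre-staged archive of Spark jars, and if it is left out, Spark uploads the jars from the submitting node's own installation.

```python
import shlex
from typing import List, Optional


def build_spark_submit(
    app_path: str,
    archive_uri: Optional[str] = None,
    deploy_mode: str = "cluster",
) -> List[str]:
    """Assemble a spark-submit argument list for a YARN submission.

    Hypothetical helper: it only builds the command, it never executes it.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")

    # Submit to YARN; the worker nodes need no local Spark client for this.
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]

    # Optionally reference a pre-staged archive of Spark jars (assumed path).
    if archive_uri is not None:
        args += ["--conf", f"spark.yarn.archive={archive_uri}"]

    args.append(app_path)
    return args


if __name__ == "__main__":
    # Example values are placeholders, not paths from a real cluster.
    cmd = build_spark_submit("my_app.py", "hdfs:///apps/spark/spark-jars.zip")
    print(shlex.join(cmd))
```

Running the printed command on a node without a Spark client would fail at `spark-submit` itself, which is why the client only has to exist where jobs are launched.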

