Support Questions

ask_bill_brooks · ‎11-01-2019

While working with a particular HDP 3.1 cluster with Spark 2.3 installed, I am finding that the Spark client libraries (e.g., the spark-cli command, as well as the libraries under jars) are not available on every node. They are only installed on the nodes the customer refers to as "client nodes" (I believe this is analogous to "edge nodes"). The cluster also has data nodes, which are able to run Spark executors (and, in fact, YARN does distribute tasks to executors on them), but those nodes do not have the Spark client libraries installed.

Is this a normal setup? Can I not assume that the Spark client is installed on every node, even if it is generally available on the cluster? Thanks for any insight.

rguruvannagari · ‎11-01-2019

@JeffEvans I think the thread below answers the same question about Spark client libs on worker nodes.

https://community.cloudera.com/t5/Support-Questions/Spark-on-Yarn-Do-nodes-need-Spark-installed/td-p...

We don't need Spark clients installed on all the worker nodes; they only need to be installed on the edge nodes.
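
To illustrate why this works: with Spark on YARN, the Spark runtime jars are shipped to the YARN containers at submission time. If neither spark.yarn.archive nor spark.yarn.jars is set, the client zips the jars from its local Spark installation and uploads them to the distributed cache; if one of them is set, the containers pull the jars from that location instead. Below is a minimal Python sketch that only assembles such a spark-submit command line for an edge node. The jar name, main class, and HDFS path are hypothetical placeholders, not values from this cluster.

```python
import shlex


def build_spark_submit(app_jar, main_class, spark_archive=None):
    """Assemble a spark-submit command line for YARN cluster mode.

    All argument values passed in below are hypothetical placeholders.
    """
    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    if spark_archive is not None:
        # spark.yarn.archive points YARN at a pre-staged archive of Spark jars,
        # so the jars are not re-uploaded from the client on every submission.
        cmd += ["--conf", f"spark.yarn.archive={spark_archive}"]
    cmd += ["--class", main_class, app_jar]
    return cmd


cmd = build_spark_submit(
    app_jar="my-app.jar",                               # hypothetical application jar
    main_class="com.example.MyApp",                     # hypothetical main class
    spark_archive="hdfs:///apps/spark/spark-jars.zip",  # hypothetical HDFS path
)
# Print the command as a shell-safe string; run it from an edge node
# where the Spark client is installed.
print(shlex.join(cmd))
```

Either way, the worker nodes only need a running NodeManager; the Spark jars reach them through YARN rather than through a local Spark install.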

Cloudera Community

Should HDP data nodes have Spark client libraries installed?