
dynamic spark.driver.host allocation

Explorer

Is it possible, via Ambari's configuration itself, to specify the list of hosts that may be used as drivers?

End goal here would be to allow only a specific pool of hosts to be used as drivers, since we have dynamically allocated workers that get torn down after processing.

Hoping to avoid having to manage this in job source code repos (e.g., setting spark.driver.host in the SparkContext config) and instead have it as part of the blueprint/Ambari configuration.
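For illustration, this is roughly the per-job setting we'd like to keep out of individual repos, whether set on the SparkConf in code or passed at submit time (hostname and job class here are made up):

    spark-submit --master yarn \
      --conf spark.driver.host=static-worker01.example.com \
      --class com.example.Job job.jar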

For example, we have 200+ static hosts, but at night and at various other times we've automated the addition of temporary workers. We'd like to keep the driver from landing on randomly chosen workers, to avoid it ending up on a temporary worker.


Re: dynamic spark.driver.host allocation

Contributor

Are you running Spark on YARN? If so, as explained in SPARK-4253, you cannot set spark.driver.host: this config item is ignored in yarn-cluster mode.


Re: dynamic spark.driver.host allocation

Explorer

Is there any YARN configuration directive that allows supplying a list of hosts that can or can't be used as drivers?


Re: dynamic spark.driver.host allocation

Contributor

Exclusivity is possible in YARN using node labels. Use the property spark.yarn.am.nodeLabelExpression to restrict the application master to a set of nodes while running Spark on YARN. Add the node labels to whichever nodes you want to use for application masters (which, I believe, is where the driver program runs for Spark in yarn-cluster mode).

Enabling YARN Node Labels

In your case, if the temporary nodes are a handful compared to the static nodes, it will be worth exploring non-exclusive node labels, which will prevent the AM from being created on temporary nodes while still letting ordinary containers use the labeled hosts.
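A rough sketch of that setup, assuming the Capacity Scheduler and a Hadoop version with non-exclusive label support (2.8+); the label name, hostname, HDFS path, and job class are placeholders:

    # In yarn-site.xml (manageable through Ambari):
    #   yarn.node-labels.enabled = true
    #   yarn.node-labels.fs-store.root-dir = hdfs:///yarn/node-labels

    # Create a non-exclusive label and attach it to each permanent worker:
    yarn rmadmin -addToClusterNodeLabels "static(exclusive=false)"
    yarn rmadmin -replaceLabelsOnNode "worker01.example.com=static"

    # Pin the AM (and with it the driver, in yarn-cluster mode) to labeled nodes:
    spark-submit --master yarn --deploy-mode cluster \
      --conf spark.yarn.am.nodeLabelExpression=static \
      --class com.example.Job job.jar

The submitting queue may also need access to the label (yarn.scheduler.capacity.<queue-path>.accessible-node-labels), and because the label is non-exclusive, executors without a label expression can still run on both labeled and unlabeled nodes.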


Re: dynamic spark.driver.host allocation

New Contributor

We use YARN node labels for some of our applications that have special processing requirements, and it works great!


Re: dynamic spark.driver.host allocation

Explorer

I should also state that the goal here is to not lose YARN logs. When these temporarily allocated workers are terminated, we don't want to lose the ability to review the detailed YARN logs. The driver holds these logs, IIRC, which is why we only want static resources to be assigned as drivers.
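For clarity, these are the logs we'd normally pull after a run with something like the following (application id made up):

    yarn logs -applicationId application_1234567890123_0042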
