Created 10-02-2017 03:37 PM
Is it possible, via Ambari's configuration itself, to specify the list of hosts that may be used as drivers?
The end goal is to allow only a specific pool of hosts to be used as drivers, since we have dynamically allocated workers that get torn down after processing.
Hoping to avoid having to manage this in job source code repos (e.g. setting spark.driver.host in the SparkContext config) and instead have it as part of the blueprint/Ambari configuration.
For example, we have 200+ static hosts, but at night and at various other times we've automated the addition of temporary workers. We'd like to keep the driver from landing on a randomly chosen worker, to avoid it ending up on a temporary one.
Created 10-02-2017 04:39 PM
Are you running Spark on YARN? If so, as explained in SPARK-4253, you cannot set spark.driver.host; Spark will ignore this config item in yarn-cluster mode.
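For context, that's because in cluster deploy mode the driver runs inside the YARN ApplicationMaster container, so the ResourceManager decides which node hosts it. A minimal sketch of such a submission (the class name and jar path are placeholders):

```sh
# In yarn-cluster mode the driver runs inside the ApplicationMaster
# container, so YARN (not spark.driver.host) picks the node it lands on.
# The class name and jar path below are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyJob \
  /path/to/my-job.jar
```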
Created 10-02-2017 04:48 PM
Is there any YARN configuration directive that allows supplying a list of hosts that can or can't be used as drivers?
Created 10-02-2017 04:57 PM
Exclusivity is possible in YARN using node labels. Use the Spark property spark.yarn.am.nodeLabelExpression to restrict the ApplicationMaster to a set of nodes while running Spark on YARN. Add the node label to whichever nodes you want to use for ApplicationMasters (which, I believe, is where the driver program runs for Spark in yarn-cluster mode).
In your case, if the temporary nodes are only a handful compared to the static nodes, it is worth exploring non-exclusive node labels, which prevent the AM from being created on a temporary node while still letting ordinary containers use the labeled hosts.
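To make that concrete, here is a hedged sketch of the setup, assuming a label named "static" for the long-lived hosts (the label name and host names are placeholders, and node labels must already be enabled in yarn-site.xml via yarn.node-labels.enabled and a configured yarn.node-labels.fs-store.root-dir):

```sh
# 1) Create the label; the (exclusive=false) form makes it non-exclusive,
#    so unlabeled containers can still run on the labeled hosts.
yarn rmadmin -addToClusterNodeLabels "static(exclusive=false)"

# 2) Attach the label to the long-lived hosts (host names are placeholders).
yarn rmadmin -replaceLabelsOnNode "static-host-1=static static-host-2=static"

# 3) Restrict the Spark ApplicationMaster (and with it the cluster-mode
#    driver) to labeled nodes at submit time.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.am.nodeLabelExpression=static \
  --class com.example.MyJob \
  /path/to/my-job.jar
```

Since the goal is to keep this out of job repos, spark.yarn.am.nodeLabelExpression should also be settable cluster-wide in spark-defaults.conf through Ambari's Spark config, and the node-label properties themselves live in Ambari's YARN service config.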
Created 10-02-2017 05:25 PM
We use YARN node labels for some of our applications that have special processing requirements, and it works great!
Created 10-02-2017 04:56 PM
I should also state that the goal here is to not lose YARN logs. When these temporarily allocated workers are terminated, we don't want to lose the ability to review the detailed YARN logs. IIRC the driver holds these logs, which is why we only want static resources to be assigned as drivers.
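One hedged note on that concern: with YARN log aggregation enabled, container logs (the AM's/driver's included) are copied to HDFS when the application finishes, so they can outlive the node that produced them. A sketch of retrieving them afterwards (the application id is a placeholder):

```sh
# With yarn.log-aggregation-enable=true in yarn-site.xml, container logs
# are aggregated to HDFS (under yarn.nodemanager.remote-app-log-dir) when
# the application completes, so they survive the worker's teardown.
# The application id below is a placeholder.
yarn logs -applicationId application_1506960000000_0001
```

The caveat is that aggregation happens at application completion, so a temporary node torn down mid-run can still take its not-yet-aggregated logs with it, which supports keeping the driver on static hosts.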