We recently deployed Spark on YARN in CDH 5.2.0.
We are seeing some warnings like this:
WARN ScriptBasedMapping: Exception running /etc/hadoop/conf.cloudera.yarn/topology.py 10.1.1.82 java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "/srv/sam/explore/lift"): error=2, No such file or directory
Then I realised that /etc/hadoop/conf.cloudera.yarn/topology.py is missing from all gateway/client machines, although all DataNodes have the file.
To fix this, I copied the file from the DataNodes to the gateway/client machines. We are on Cloudera Express 5.0.0.
How can I deploy this file from Cloudera Manager instead?
In CM 5.0, the YARN client configuration had a bug: the topology.py file was not included when the client configuration was deployed or downloaded. This was fixed in CM 5.1 and later. If you are still running CM 5.0 and hit this issue, we recommend copying the topology.* files from a known good location (a DataNode, for example) to the affected node.
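As a rough sketch of that workaround, the Python snippet below checks a gateway node for the topology files and prints an scp command that would pull them from a DataNode. The hostname `datanode01.example.com`, the helper names, and the assumption that the file set is `topology.py` plus `topology.map` are illustrative only and not taken from Cloudera documentation; adjust them to match what your DataNodes actually contain.

```python
import os
import shlex

# Directory taken from the warning message above.
CONF_DIR = "/etc/hadoop/conf.cloudera.yarn"
# Assumption: the topology.* files are the script plus its map file.
EXPECTED = ("topology.py", "topology.map")


def missing_topology_files(conf_dir=CONF_DIR, expected=EXPECTED):
    """Return the expected topology files that are absent from conf_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(conf_dir, name))]


def scp_command(datanode, conf_dir=CONF_DIR):
    """Build (but do not run) an scp command copying topology.* from a DataNode."""
    remote = f"{datanode}:{conf_dir}/topology.*"
    # Quote the remote spec so the local shell does not expand the glob.
    return f"scp -p {shlex.quote(remote)} {shlex.quote(conf_dir + '/')}"


if __name__ == "__main__":
    missing = missing_topology_files()
    if missing:
        print("Missing on this node:", ", ".join(missing))
        # Hypothetical DataNode hostname; replace with a real one.
        print("Suggested fix:", scp_command("datanode01.example.com"))
    else:
        print("All topology files are present.")
```

The script only prints the command rather than running it, so you can review it first. After copying, confirm that topology.py is still executable on the gateway node, since the warning is raised when the script cannot be run.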
In general, we recommend that any node submitting Spark-on-YARN jobs has Gateway roles for both Spark and YARN, and that the client configuration has been deployed to it. You can deploy it either with Deploy Client Configuration from the Cluster Actions drop-down, or per component, making sure that every component has its client configuration deployed.