Support Questions

Joey_Krabacher · ‎08-12-2014

org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:136)

I have tried setting yarn.app.mapreduce.am.job.client.port-range to a specific range, but it doesn't look like that is the right setting.

I have seen it use ports ranging from 32000 - 65000

CDH 5.1

Joey_Krabacher · ‎08-14-2014

From what I have seen these are IPC ephemeral ports.

Is there anyway to control the range on these?

Harsh J · ‎11-30-2014

General ephemeral port ranges can be controlled from the OS side, but I've hardly ever seen a need to be doing such a thing. What is your exact issue or situation that is leading you down this path?

Rmutsaers · ‎03-01-2016

Simple. We need communication ports between an external cluster and internal Informatica server to be predictable because there's a strict firewall in place. So we need some way to make sure only a known set of ports is used for communication between client and cluster.

Harsh J · ‎03-01-2016

Thank you for explaining the need Rmutsaers!

The AM's IPC port is indeed used directly by clients and are controllable on the serving AM via the yarn.app.mapreduce.am.job.client.port-range config. It still has to be a range though, and the range must be chosen by keeping in mind that it will also effectively limit the number of AMs you can run on the host.

The AM's web port is also served on an ephemeral port, but this is a non-concern cause clients do not access the AM web port directly; they go via the RM's proxy service (wherein the RM makes the GET HTTP requests to the actual AM port, within the cluster).

Does yarn.app.mapreduce.am.job.client.port-range not solve your need? There's no IPC proxying today to eliminate the range requirement, unfortunately. The con of not having the IPC port ranges open is not too fatal, as the job can still get a completed notification once it gets moved to the job history server (and the RM redirects the client to it).

Rmutsaers · ‎03-01-2016

Thanks, i'll try that and see if that solves the issue.

zhuw.bigdata · ‎01-03-2017

Did the change work for you?

zhuw.bigdata · ‎01-03-2017

I made the following in YARN/MR setting, but it doesn't work.

YARN Service MapReduce Advanced Configuration Snippet (Safety Valve)
<property>
<name>yarn.app.mapreduce.am.job.client.port-range</name>
<value>44000-50000</value>
<description>Restrict the range for firewall.</description>
</property>
For advanced use only, a string to be inserted into mapred-site.xml. Applies to configurations of all roles in this service except client configuration.

Harsh J · ‎01-03-2017

When you say something "doesn't work" could you please also always include the observed behaviour vs. expected?

BTW the safety valve you've used is incorrect ("YARN Service MapReduce …"), as this is a client-side property. Use the safety valve under YARN Gateway called "MapReduce Client Advanced Configuration Snippet (Safety Valve) for mapred-site.xml".

zhuw.bigdata · ‎01-04-2017

Thanks for replying. I think that I use the correct setting since this should be on the server side.

MapReduce Client Advanced Configuration Snippet (Safety Valve) for mapred-site.xml

Gateway Default Group

For advanced use only, a string to be inserted into the client configuration for mapred-site.xml.

YARN Service MapReduce Advanced Configuration Snippet (Safety Valve)

YARN (MR2 Included) (Service-Wide)

For advanced use only, a string to be inserted into mapred-site.xml. Applies to configurations of all roles in this service except client configuration.

I double-checked the materialized configuration file on the disk and it contains the range setting.

/run/cloudera-scm-agent/process/1353-yarn-RESOURCEMANAGER/mapred-site.xml (with latest time stamp).

Cloudera Community

Support Questions

Where is the setting for the port-range used by org.apache.hadoop.mapred.YarnChild?