Explorer
Posts: 8
Registered: ‎08-12-2014

Where is the setting for the port-range used by org.apache.hadoop.mapred.YarnChild?


org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:136)

 

I have tried setting yarn.app.mapreduce.am.job.client.port-range to a specific range, but it doesn't look like that is the right setting.

I have seen it use ports ranging from 32000 to 65000.

 

CDH 5.1

Explorer
Posts: 8
Registered: ‎08-12-2014

Re: Where is the setting for the port-range used by org.apache.hadoop.mapred.YarnChild?

From what I have seen these are IPC ephemeral ports.

Is there any way to control the range on these?

Posts: 1,524
Kudos: 265
Solutions: 232
Registered: ‎07-31-2013

Re: Where is the setting for the port-range used by org.apache.hadoop.mapred.YarnChild?

General ephemeral port ranges can be controlled from the OS side, but I've rarely seen a need to do so. What exact issue or situation is leading you down this path?
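For reference, on Linux the OS-wide ephemeral range is exposed at /proc/sys/net/ipv4/ip_local_port_range as two whitespace-separated integers. The following is only a minimal sketch for inspecting it, assuming a Linux host; the function names are made up for illustration and are not part of any Hadoop or Cloudera tooling.

```python
def parse_local_port_range(text):
    """Parse the 'low high' pair that Linux reports for its ephemeral port range."""
    low, high = (int(part) for part in text.split())
    return low, high


def read_local_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the current ephemeral range from procfs (Linux only)."""
    with open(path) as handle:
        return parse_local_port_range(handle.read())
```

Calling read_local_port_range() on a cluster node shows which range unconfigured ephemeral binds are likely to fall into.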
Backline Customer Operations Engineer
New Contributor
Posts: 2
Registered: ‎03-01-2016

Re: Where is the setting for the port-range used by org.apache.hadoop.mapred.YarnChild?

Simple. We need the communication ports between an external cluster and an internal Informatica server to be predictable because there's a strict firewall in place. So we need some way to make sure only a known set of ports is used for communication between client and cluster.
Posts: 1,524
Kudos: 265
Solutions: 232
Registered: ‎07-31-2013

Re: Where is the setting for the port-range used by org.apache.hadoop.mapred.YarnChild?

Thank you for explaining the need, Rmutsaers!

The AM's IPC port is indeed used directly by clients, and it is controllable on the serving AM via the yarn.app.mapreduce.am.job.client.port-range config. It still has to be a range though, and the range must be chosen keeping in mind that it also effectively limits the number of AMs you can run on the host.
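To make the sizing trade-off concrete, here is a rough sketch that counts how many ports a range value provides, which is an upper bound on the AMs that can bind a client port on one host at the same time. It assumes the value is a single "low-high" range or a comma-separated list of such ranges; count_ports is a hypothetical helper, not Hadoop's own parser, and overlapping ranges would be counted twice.

```python
def count_ports(port_range):
    """Count the ports covered by a value such as '44000-50000' or '50000-50050,50100-50200'."""
    total = 0
    for part in port_range.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas or surrounding whitespace
        if "-" in part:
            low, high = (int(p) for p in part.split("-", 1))
        else:
            low = high = int(part)  # a single port with no dash
        if low > high:
            raise ValueError("invalid range: " + part)
        total += high - low + 1  # both ends are inclusive
    return total
```

A narrow range keeps the firewall rule small but caps how many AMs can serve clients concurrently on a NodeManager host, so the range should be sized against the expected number of simultaneous jobs.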

The AM's web port is also served on an ephemeral port, but this is not a concern because clients do not access the AM web port directly; they go via the RM's proxy service (wherein the RM makes the HTTP GET requests to the actual AM port, within the cluster).

Does yarn.app.mapreduce.am.job.client.port-range not solve your need? There's no IPC proxying today to eliminate the range requirement, unfortunately. The downside of not having the IPC port range open is not fatal, as the job can still get a completed notification once it gets moved to the job history server (and the RM redirects the client to it).
Backline Customer Operations Engineer
New Contributor
Posts: 2
Registered: ‎03-01-2016

Re: Where is the setting for the port-range used by org.apache.hadoop.mapred.YarnChild?

Thanks, I'll try that and see if it solves the issue.

Expert Contributor
Posts: 68
Registered: ‎10-04-2016

Re: Where is the setting for the port-range used by org.apache.hadoop.mapred.YarnChild?

Did the change work for you?

Expert Contributor
Posts: 68
Registered: ‎10-04-2016

Re: Where is the setting for the port-range used by org.apache.hadoop.mapred.YarnChild?

I made the following change in the YARN/MR settings, but it doesn't work.

YARN Service MapReduce Advanced Configuration Snippet (Safety Valve)
<property>
  <name>yarn.app.mapreduce.am.job.client.port-range</name>
  <value>44000-50000</value>
  <description>Restrict the range for firewall.</description>
</property>
For advanced use only, a string to be inserted into mapred-site.xml. Applies to configurations of all roles in this service except client configuration.

Posts: 1,524
Kudos: 265
Solutions: 232
Registered: ‎07-31-2013

Re: Where is the setting for the port-range used by org.apache.hadoop.mapred.YarnChild?

When you say something "doesn't work", could you please always include the observed behaviour versus the expected behaviour?

 

BTW the safety valve you've used is incorrect ("YARN Service MapReduce …"), as this is a client-side property. Use the safety valve under YARN Gateway called "MapReduce Client Advanced Configuration Snippet (Safety Valve) for mapred-site.xml".

Backline Customer Operations Engineer
Expert Contributor
Posts: 68
Registered: ‎10-04-2016

Re: Where is the setting for the port-range used by org.apache.hadoop.mapred.YarnChild?

Thanks for replying. I think I am using the correct setting, since this should be on the server side.

 

MapReduce Client Advanced Configuration Snippet (Safety Valve) for mapred-site.xml
Gateway Default Group
 
For advanced use only, a string to be inserted into the client configuration for mapred-site.xml.
 
YARN Service MapReduce Advanced Configuration Snippet (Safety Valve)
YARN (MR2 Included) (Service-Wide)
 
For advanced use only, a string to be inserted into mapred-site.xml. Applies to configurations of all roles in this service except client configuration.
 
I double-checked the materialized configuration file on the disk and it contains the range setting.
/run/cloudera-scm-agent/process/1353-yarn-RESOURCEMANAGER/mapred-site.xml (with latest time stamp).
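A check like the one described above can be scripted. This is a minimal sketch, assuming the file follows the usual Hadoop layout of property elements with name and value children; find_property is a hypothetical helper written for illustration.

```python
import xml.etree.ElementTree as ET


def find_property(xml_text, name):
    """Return the value of the named property in a Hadoop-style *-site.xml, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None
```

The same function can be pointed at the contents of whichever mapred-site.xml the submitting client actually reads, which helps tell a server-side setting apart from a client-side one.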
 
 