
Where is the setting for the port-range used by org.apache.hadoop.mapred.YarnChild?

Contributor

org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:136)

 

I have tried setting yarn.app.mapreduce.am.job.client.port-range to a specific range, but it doesn't look like that is the right setting.

I have seen it use ports ranging from 32000 to 65000.

 

CDH 5.1

17 REPLIES

Contributor

From what I have seen these are IPC ephemeral ports.

Is there any way to control the range on these?

Mentor
General ephemeral port ranges can be controlled from the OS side, but I've hardly ever seen a need to do so. What exact issue or situation is leading you down this path?
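
For reference, the OS-wide ephemeral range can be inspected as sketched below. This is a minimal sketch, assuming a Linux host where the range is exposed at /proc/sys/net/ipv4/ip_local_port_range; the function names are illustrative and not part of any Hadoop or CDH tooling.

```python
from pathlib import Path

# Assumption: Linux exposes the ephemeral range here as two
# whitespace-separated integers, e.g. "32768\t60999\n".
LINUX_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def parse_local_port_range(text):
    """Parse the file contents into a (low, high) tuple of ints."""
    low, high = (int(field) for field in text.split())
    return low, high


def read_local_port_range(path=LINUX_RANGE_FILE):
    """Read and parse the current ephemeral port range from the OS."""
    return parse_local_port_range(path.read_text())


if __name__ == "__main__":
    # Only attempt the read where the file actually exists.
    if LINUX_RANGE_FILE.exists():
        print(read_local_port_range())
```

Changing the range is normally done through sysctl (key net.ipv4.ip_local_port_range), and it affects every process on the host, not only YARN containers.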

New Contributor
Simple. We need communication ports between an external cluster and internal Informatica server to be predictable because there's a strict firewall in place. So we need some way to make sure only a known set of ports is used for communication between client and cluster.

Mentor
Thank you for explaining the need, Rmutsaers!

The AM's IPC port is indeed used directly by clients and is controllable on the serving AM via the yarn.app.mapreduce.am.job.client.port-range config. It still has to be a range, though, and the range must be chosen keeping in mind that it also effectively limits the number of AMs you can run on a host.
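
To make the sizing trade-off concrete, here is a minimal sketch that counts how many ports a given range provides. It assumes that each AM on a host binds one client port from the range, so the count acts as an upper bound on concurrent AMs per host. The helper names are hypothetical, and the parser only handles comma-separated low-high entries and single ports, which may not cover everything Hadoop itself accepts.

```python
def parse_port_range(spec):
    """Parse a spec such as "44000-50000" or "50000-50050,50100".

    Returns a list of (low, high) tuples with inclusive bounds.
    """
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low_text, sep, high_text = part.partition("-")
        low = int(low_text)
        high = int(high_text) if sep else low  # a bare number is one port
        if not (0 < low <= high <= 65535):
            raise ValueError(f"invalid port range: {part!r}")
        ranges.append((low, high))
    return ranges


def port_count(spec):
    """Count distinct ports in the spec (overlapping entries count once)."""
    ports = set()
    for low, high in parse_port_range(spec):
        ports.update(range(low, high + 1))
    return len(ports)


if __name__ == "__main__":
    print(port_count("44000-50000"))  # prints 6001
```

A range of 44000-50000 is therefore far larger than the number of AMs a single host is likely to run; a much narrower range would usually be enough and is easier to justify in a firewall rule.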

The AM's web port is also served on an ephemeral port, but this is not a concern because clients do not access the AM web port directly; they go via the RM's proxy service (the RM makes the HTTP GET requests to the actual AM port, within the cluster).

Does yarn.app.mapreduce.am.job.client.port-range not solve your need? Unfortunately, there's no IPC proxying today that would eliminate the range requirement. The downside of not having the IPC port range open is not fatal, as the client can still get a completion notification once the job is moved to the job history server (and the RM redirects the client to it).

New Contributor

Thanks, I'll try that and see if it solves the issue.

Expert Contributor

Did the change work for you?

Expert Contributor

I made the following change in the YARN/MR settings, but it doesn't work.

YARN Service MapReduce Advanced Configuration Snippet (Safety Valve)
<property>
  <name>yarn.app.mapreduce.am.job.client.port-range</name>
  <value>44000-50000</value>
  <description>Restrict the range for firewall.</description>
</property>
For advanced use only, a string to be inserted into mapred-site.xml. Applies to configurations of all roles in this service except client configuration.

Mentor

When you say something "doesn't work", could you please always include the observed behaviour versus the expected behaviour?

 

BTW the safety valve you've used is incorrect ("YARN Service MapReduce …"), as this is a client-side property. Use the safety valve under YARN Gateway called "MapReduce Client Advanced Configuration Snippet (Safety Valve) for mapred-site.xml".

Expert Contributor

Thanks for replying. I think I am using the correct setting, since this should be on the server side.

 

MapReduce Client Advanced Configuration Snippet (Safety Valve) for mapred-site.xml
Gateway Default Group
 
For advanced use only, a string to be inserted into the client configuration for mapred-site.xml.
 
YARN Service MapReduce Advanced Configuration Snippet (Safety Valve)
YARN (MR2 Included) (Service-Wide)
 
For advanced use only, a string to be inserted into mapred-site.xml. Applies to configurations of all roles in this service except client configuration.
 
I double-checked the materialized configuration file on disk, and it contains the range setting:
/run/cloudera-scm-agent/process/1353-yarn-RESOURCEMANAGER/mapred-site.xml (with the latest timestamp).
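
A check like this can also be scripted. The following is a minimal sketch, assuming the file follows the usual Hadoop configuration/property XML layout; get_property and SAMPLE are illustrative names, and the sample XML mirrors the snippet posted earlier in this thread rather than a real materialized file.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; mirrors the safety-valve snippet from this thread.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.app.mapreduce.am.job.client.port-range</name>
    <value>44000-50000</value>
    <description>Restrict the range for firewall.</description>
  </property>
</configuration>
"""


def get_property(xml_text, name):
    """Return the value of the named property, or None if it is absent.

    Assumption: if the name appears more than once, the last entry wins.
    """
    root = ET.fromstring(xml_text.strip())
    found = None
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            found = value.strip() if value is not None else None
    return found


if __name__ == "__main__":
    print(get_property(SAMPLE, "yarn.app.mapreduce.am.job.client.port-range"))
```

This only confirms that the property is present in a given file. It says nothing about whether the process that needs it (the client submitting the job, per the earlier reply) actually reads that file.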