difference between 'mapreduce.application.classpath' and 'yarn.application.classpath'

Contributor

Hi All, it may be a trivial question for many, but could you explain the difference, or relation, between the classpath defined in yarn.application.classpath and the one in mapreduce.application.classpath? Does the latter overwrite the former for MapReduce applications? There is also a variable, MR2_CLASSPATH, that is included by default in mapreduce.application.classpath. Where is it taken from? Is mapreduce.application.classpath relevant only for the gateways from which the application is submitted to YARN?

Accepted solution

Mentor
> Does the latter overwrite the former for mapreduce applications?

No, at least as of CDH 5.x, the two are additive. The
yarn.application.classpath value goes on early (adding Common, HDFS and
YARN), followed by mapreduce.application.classpath (adding just MR2).
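
A minimal Python sketch of the additive ordering described above. The property values are hypothetical placeholders loosely modelled on Cloudera Manager style defaults, not the exact defaults of any release, and both properties are treated as comma-separated lists. The `$VAR` references are deliberately left unexpanded, because that expansion happens later on the NodeManager.

```python
def build_container_classpath(yarn_cp: str, mr_cp: str) -> list[str]:
    """Sketch of how an MR2 container classpath is assembled.

    Entries from yarn.application.classpath (Common, HDFS, YARN) go first,
    then entries from mapreduce.application.classpath (MR2) are appended.
    Neither value replaces the other.
    """
    entries: list[str] = []
    for value in (yarn_cp, mr_cp):
        # Split the comma-separated property value and skip blank items.
        entries.extend(e.strip() for e in value.split(",") if e.strip())
    return entries


# Hypothetical property values, for illustration only.
yarn_application_classpath = (
    "$HADOOP_CLIENT_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,"
    "$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,"
    "$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*"
)
mapreduce_application_classpath = (
    "$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH"
)

classpath = build_container_classpath(
    yarn_application_classpath, mapreduce_application_classpath
)
# Still unexpanded: the variables are resolved on each worker host.
print(":".join(classpath))
```

The MR2 entries simply follow the Common/HDFS/YARN entries in one combined list.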

The reason they are separate is tied to another feature (available in CM
6.x) that lets you supply all framework jars as an archive along with the
job, instead of relying on locally pre-installed locations on every worker
host, which can change at any time outside of a container's runtime.
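
To illustrate why keeping the two properties apart helps, here is a hedged Python model. It assumes the upstream Hadoop behavior as I understand it: when an MR framework archive is configured (upstream property `mapreduce.application.framework.path`, shipped through the distributed cache), the host-local yarn.application.classpath entries are skipped and only mapreduce.application.classpath is used, with its entries pointing into the localized archive. Whether CM 6.x relies on exactly this mechanism is an assumption, and the archive path, alias and entries below are hypothetical.

```python
from typing import Optional


def build_classpath(yarn_cp: list[str], mr_cp: list[str],
                    framework_path: Optional[str]) -> list[str]:
    """Sketch: pick classpath sources depending on whether a framework
    archive travels with the job (assumed upstream behavior)."""
    if framework_path is None:
        # No archive: rely on jars pre-installed on every worker host.
        return yarn_cp + mr_cp
    # Archive shipped with the job: skip the host-local YARN entries; the
    # MR entries are expected to point inside the localized archive.
    return list(mr_cp)


yarn_cp = ["$HADOOP_COMMON_HOME/*", "$HADOOP_HDFS_HOME/*", "$HADOOP_YARN_HOME/*"]
local_mr_cp = ["$HADOOP_MAPRED_HOME/*"]

# Hypothetical archive location, localized under the alias "mr-framework".
archive = "hdfs:///apps/mr-framework/mr-framework.tar.gz#mr-framework"
archive_mr_cp = ["$PWD/mr-framework/*", "$PWD/mr-framework/lib/*"]

print(build_classpath(yarn_cp, local_mr_cp, None))
print(build_classpath(yarn_cp, archive_mr_cp, archive))
```

With the archive in place, a rolling upgrade that changes the locally installed jars does not alter what the container loads.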

> There is also a variable, MR2_CLASSPATH, that is included by default in
mapreduce.application.classpath. Where is it taken from?

This is exclusive to Cloudera Manager-managed environments. It is a
reserved env-var name used to assist Parcels that may choose to supply some
jars as 'plugins' to an app or a service. All such env-vars are listed
here:
https://github.com/cloudera/cm_ext/wiki/Plugin-parcel-environment-variables.
In most cases you can ignore this env-var, as it is usually empty.
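
A small Python sketch of what "usually empty" means in practice: `$MR2_CLASSPATH` contributes nothing unless a plugin parcel sets it. `string.Template.safe_substitute` stands in for the shell expansion done on the worker, the install and parcel paths are hypothetical, and dropping entries that expand to an empty string is a simplification made only for this sketch.

```python
from string import Template

# Hypothetical mapreduce.application.classpath entries.
MR_CLASSPATH = ["$HADOOP_MAPRED_HOME/*", "$HADOOP_MAPRED_HOME/lib/*", "$MR2_CLASSPATH"]


def expand(entries: list[str], env: dict[str, str]) -> list[str]:
    # safe_substitute leaves unknown $VARS untouched instead of raising.
    expanded = [Template(e).safe_substitute(env) for e in entries]
    # Sketch simplification: drop entries that expand to nothing.
    return [e for e in expanded if e]


base_env = {"HADOOP_MAPRED_HOME": "/opt/hadoop-mapreduce"}  # hypothetical path

# No plugin parcel active: MR2_CLASSPATH is empty.
no_plugin = expand(MR_CLASSPATH, {**base_env, "MR2_CLASSPATH": ""})

# A hypothetical plugin parcel exporting its jars via MR2_CLASSPATH.
with_plugin = expand(
    MR_CLASSPATH,
    {**base_env, "MR2_CLASSPATH": "/opt/parcels/EXAMPLE_PLUGIN/lib/*"},
)

print(no_plugin)
print(with_plugin)
```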

> Is mapreduce.application.classpath relevant only for the gateways from
which the application is submitted to YARN?

No. The values are just variable names and are not substituted at the
gateway. They are substituted only on the NodeManager, when the prepared
container command/script actually executes. This lets you manage different
install paths on different worker hosts, with each host's local environment
pointing to the actual locations of the jars.
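
A minimal Python sketch of the gateway/NodeManager split described above: the gateway ships one unexpanded classpath string, and each worker host resolves it against its own environment. In practice, as I understand it, the container launch script is expanded by the shell at run time; `string.Template` only models that step here, and the host names and install paths are hypothetical.

```python
from string import Template

# What the gateway ships: unexpanded variable references (hypothetical value).
SHIPPED_CLASSPATH = "$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_MAPRED_HOME/*"

# Each worker host's NodeManager environment (hypothetical install paths).
HOST_ENVS = {
    "worker-a": {
        "HADOOP_CONF_DIR": "/etc/hadoop/conf",
        "HADOOP_COMMON_HOME": "/usr/lib/hadoop",
        "HADOOP_MAPRED_HOME": "/usr/lib/hadoop-mapreduce",
    },
    "worker-b": {
        "HADOOP_CONF_DIR": "/etc/hadoop/conf",
        "HADOOP_COMMON_HOME": "/opt/hadoop/common",
        "HADOOP_MAPRED_HOME": "/opt/hadoop/mapreduce",
    },
}


def expand_on_host(classpath: str, host_env: dict[str, str]) -> str:
    """Model of NodeManager-side expansion against one host's environment."""
    return Template(classpath).safe_substitute(host_env)


for host, env in HOST_ENVS.items():
    print(host, expand_on_host(SHIPPED_CLASSPATH, env))
```

The same shipped string resolves to different jar locations on each host, which is why differing install paths across workers are not a problem.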

Contributor

Hello @Harsh J,

Thank you very much for your explanations. I asked the question based on some assumptions that turned out to be false, so thanks for setting me straight. One last question from my side: could you point me to the documentation describing the CM 6.x feature you mentioned for supplying framework jars? It sounds interesting.

Mentor
The feature in C6.x is implicit and is aimed at easier rolling upgrades:
when the jars a job needs travel exclusively with the job, changes to
locally installed binaries during an upgrade do not affect it. A
release-note item for this is documented here:
https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cm_600_new_features.html...

Contributor

Great! Thank you very much!