Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference; information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Introductory Hadoop queries

Rising Star

Hi,

Being a novice, I am trying to understand the answers to the questions below:

1. What is the difference between defining configuration in hadoop-env.sh and defining it in hdfs-site.xml or yarn-site.xml?

2. My presumption is that the *-default.xml files hold the standard Apache-defined configuration values, and any custom values for those standard properties (either vendor-specific, like Hortonworks/Cloudera, or implementation-specific at a project level) are defined in the *-site.xml files. Am I correct in my understanding?

3. What is the difference between the /usr/hdp/current and /usr/hdp/2.4.0.0.169 folders on the Sandbox? What is the importance/significance of each of these folders? Are they both required on production deployments as well?

1 ACCEPTED SOLUTION

Master Guru

1) hadoop-env.sh sets Linux environment variables for the Hadoop processes. Some settings have to be made this way because they are used by the shell scripts that start the applications (RAM settings, for example). The XML files can, by definition, only take effect after the JVM has started.
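For illustration, the split might look like this (the property and variable names below are standard Hadoop ones, but the values are made up):

```shell
# hadoop-env.sh: plain shell, sourced by the start-up scripts
# before any JVM exists, so JVM sizing has to live here
export HADOOP_HEAPSIZE=2048          # daemon heap size, in MB
export HADOOP_NAMENODE_OPTS="-XX:+UseG1GC $HADOOP_NAMENODE_OPTS"
```

```xml
<!-- hdfs-site.xml: parsed by the Configuration class after the
     JVM is already running, so only application-level settings -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```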

2) That is true, although the *-default.xml files do not contain everything either; some defaults are hard-coded in the applications themselves.
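The resulting lookup order can be sketched like this. This is not Hadoop's real code, just a minimal shell imitation of how `Configuration.get(key, default)` behaves: the *-site.xml value wins, then the *-default.xml value, then a fallback hard-coded at the call site.

```shell
# Work in a throwaway directory with two tiny sample config files
cd "$(mktemp -d)"

cat > core-default.xml <<'EOF'
<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>
EOF

cat > core-site.xml <<'EOF'
<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
</configuration>
EOF

# lookup KEY HARDCODED: site file wins, then default file,
# then the hard-coded fallback passed by the caller
lookup() {
  for f in core-site.xml core-default.xml; do
    v=$(sed -n "/<name>$1<\/name>/s:.*<value>\(.*\)</value>.*:\1:p" "$f")
    if [ -n "$v" ]; then echo "$v"; return; fi
  done
  echo "$2"
}

lookup dfs.replication 3        # site override wins: prints 2
lookup dfs.blocksize 134217728  # in no file: hard-coded fallback wins
```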

3) /usr/hdp/2.4.0.0.169 is the actual folder containing the distribution. If you upgrade the cluster, HDP will create a new folder, /usr/hdp/2.4.2.xxx for example, to enable rollback operations. /usr/hdp/current is a folder of symbolic links to the current distribution, i.e. pointing to the real underlying folder of the version you have selected. (They also change the structure a bit.) Under the covers, HDP uses a utility called hdp-select that sets these symbolic links to the version you selected.
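The mechanism can be demonstrated with throwaway directories (the version numbers below are made up; the real /usr/hdp/current holds one link per component, such as hadoop-client, but the idea is the same):

```shell
# Sketch of the hdp-select symlink pattern in a temp directory
root=$(mktemp -d)
mkdir -p "$root/2.4.0.0-169" "$root/2.4.2.0-258"

# 'current' is just a symlink to the active version directory
ln -sfn "$root/2.4.0.0-169" "$root/current"
readlink "$root/current"    # the 2.4.0.0-169 path

# an "upgrade" only retargets the link; the old directory stays
# on disk, which is what makes rollback cheap
ln -sfn "$root/2.4.2.0-258" "$root/current"
readlink "$root/current"    # now the 2.4.2.0-258 path
```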

