
Introductory Hadoop queries

Rising Star

Hi,

As a novice, I am trying to understand the following:

1. What is the difference between defining configuration in hadoop-env.sh and defining it in hdfs-site.xml or yarn-site.xml?

2. My presumption is that the *-default.xml files hold the standard Apache-defined configuration values, and that any custom values for the standard properties (either vendor-specific, like Hortonworks / Cloudera, or implementation-specific at a project level) are defined in the *-site.xml files. Is my understanding correct?

3. What is the difference between the /usr/hdp/current and /usr/hdp/2.4.0.0.169 folders on the Sandbox? What is the significance of each of these folders? Are they both required on production deployments as well?

1 ACCEPTED SOLUTION

Master Guru

1) hadoop-env.sh sets Linux environment variables for the Hadoop processes. Some things need to be set this way because they are used by the shell scripts that start the applications (RAM settings, for example). The XML files, by definition, can only take effect after the JVM has started.

2) That is true, although the *-default.xml files do not contain everything either. Some defaults are hard-coded in the applications.

3) /usr/hdp/2.4.0.0.169 is the actual folder containing the distribution. If you upgrade the cluster, HDP creates a new folder, for example /usr/hdp/2.4.2.xxx, so that rollback operations are possible. /usr/hdp/current is a folder of symbolic links to the current distribution, i.e. the links point to the real underlying folder of the version you have selected. (They also change the structure a bit.) Under the covers, HDP uses a utility called hdp-select that sets these symbolic links to the version you selected.
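
To make point 1 a bit more concrete, here is a minimal sketch of why heap settings have to live in the environment. It is not the real Hadoop start script: the function build_java_cmd and the class name org.example.Daemon are made up for illustration, and the fallback of 1000 MB is an assumption. HADOOP_HEAPSIZE is the variable you would typically find in a Hadoop 2.x hadoop-env.sh.

```shell
#!/bin/sh
# Hypothetical launcher sketch, not the actual Hadoop start script.

# Stand-in for a line in hadoop-env.sh (value in MB, chosen for illustration).
export HADOOP_HEAPSIZE=1024

# The launcher must pick the heap size before any JVM exists, so it can
# only consult the environment; nothing has parsed the XML files yet.
build_java_cmd() {
    heap_mb="${HADOOP_HEAPSIZE:-1000}"   # fallback value is an assumption
    echo "java -Xmx${heap_mb}m $1"
}

# org.example.Daemon is a placeholder class name, not a real Hadoop class.
java_cmd=$(build_java_cmd "org.example.Daemon")
echo "$java_cmd"
```

Properties in hdfs-site.xml or yarn-site.xml (dfs.replication, for instance) are read by the Java code itself, which is why they cannot influence how the JVM is launched.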

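The symbolic-link arrangement from point 3 can be imitated in a scratch directory. This is only a sketch of the idea and does not call hdp-select: the directory layout under the temp folder, the link name hadoop-client, and the use of ln -sfn are assumptions made for illustration, and the second version string is kept as the placeholder used above.

```shell
#!/bin/sh
# Sketch only: mimics the versioned-folder layout in a temp directory.
set -e
root=$(mktemp -d)

old_ver="2.4.0.0.169"
new_ver="2.4.2.xxx"    # placeholder version string, as in the answer above

# Two versioned install folders living side by side.
mkdir -p "$root/$old_ver/hadoop" "$root/$new_ver/hadoop" "$root/current"

# "current" holds symlinks that point into one versioned folder.
ln -s "$root/$old_ver/hadoop" "$root/current/hadoop-client"
before=$(readlink "$root/current/hadoop-client")

# An upgrade repoints the link; the old folder stays in place for rollback.
ln -sfn "$root/$new_ver/hadoop" "$root/current/hadoop-client"
after=$(readlink "$root/current/hadoop-client")

echo "before: $before"
echo "after:  $after"
```

On a real HDP node you can inspect the same thing with ls -l /usr/hdp/current, which shows where each link currently points.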