Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference; information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Introductory Hadoop queries

Rising Star

Hi,

Being a novice, I am trying to understand the answers to the questions below:

1. What is the difference between defining configuration in hadoop-env.sh and defining it in hdfs-site.xml or yarn-site.xml?

2. My presumption is that the *-default.xml files hold the standard Apache-defined configuration values, and any custom values for those standard properties (either vendor-specific, like Hortonworks/Cloudera, or implementation-specific at a project level) are defined in the *-site.xml files. Am I correct in my understanding?

3. What is the difference between the /usr/hdp/current and /usr/hdp/2.4.0.0.169 folders on the Sandbox? What is the importance/significance of each of these folders? Are they both required on production deployments as well?

1 ACCEPTED SOLUTION

Master Guru

1) hadoop-env.sh sets Linux environment variables for the Hadoop processes. Some settings have to be made this way because they are used by the shell scripts that start the applications (RAM settings, for example). The XML files can, by definition, only take effect after the JVM has started.
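For illustration, the split might look like this (the property and variable names below are standard Hadoop ones, but the values are made up):

```shell
# hadoop-env.sh: plain shell, sourced by the start-up scripts
# before any JVM exists, so JVM sizing has to live here
export HADOOP_HEAPSIZE=2048          # daemon heap size, in MB
export HADOOP_NAMENODE_OPTS="-XX:+UseG1GC $HADOOP_NAMENODE_OPTS"
```

```xml
<!-- hdfs-site.xml: parsed by the Configuration class after the
     JVM is already running, so only application-level settings -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```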

2) That is true, although the *-default.xml files do not contain everything either; some defaults are hard-coded in the applications themselves.
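The resulting lookup order can be sketched like this. This is not Hadoop's real code, just a minimal shell imitation of how `Configuration.get(key, default)` behaves: the *-site.xml value wins, then the *-default.xml value, then a fallback hard-coded at the call site.

```shell
# Work in a throwaway directory with two tiny sample config files
cd "$(mktemp -d)"

cat > core-default.xml <<'EOF'
<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>
EOF

cat > core-site.xml <<'EOF'
<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
</configuration>
EOF

# lookup KEY HARDCODED: site file wins, then default file,
# then the hard-coded fallback passed by the caller
lookup() {
  for f in core-site.xml core-default.xml; do
    v=$(sed -n "/<name>$1<\/name>/s:.*<value>\(.*\)</value>.*:\1:p" "$f")
    if [ -n "$v" ]; then echo "$v"; return; fi
  done
  echo "$2"
}

lookup dfs.replication 3        # site override wins: prints 2
lookup dfs.blocksize 134217728  # in no file: hard-coded fallback wins
```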

3) /usr/hdp/2.4.0.0.169 is the actual folder containing the distribution. If you upgrade the cluster, HDP will create a new folder, /usr/hdp/2.4.2.xxx for example, to enable rollback operations. /usr/hdp/current is a folder of symbolic links to the current distribution, i.e. pointing to the real underlying folder of the version you have selected. (They also change the structure a bit.) Under the covers, HDP uses a utility called hdp-select that sets these symbolic links to the version you selected.
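The mechanism can be demonstrated with throwaway directories (the version numbers below are made up; the real /usr/hdp/current holds one link per component, such as hadoop-client, but the idea is the same):

```shell
# Sketch of the hdp-select symlink pattern in a temp directory
root=$(mktemp -d)
mkdir -p "$root/2.4.0.0-169" "$root/2.4.2.0-258"

# 'current' is just a symlink to the active version directory
ln -sfn "$root/2.4.0.0-169" "$root/current"
readlink "$root/current"    # the 2.4.0.0-169 path

# an "upgrade" only retargets the link; the old directory stays
# on disk, which is what makes rollback cheap
ln -sfn "$root/2.4.2.0-258" "$root/current"
readlink "$root/current"    # now the 2.4.2.0-258 path
```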

