Support Questions

shreyag1207 · ‎09-06-2017

jsensharma · ‎09-06-2017

@Shreya Gupta

core-site.xml & hdfs-site.xml are the important ones.

Hadoop’s Java configuration is driven by two types of important configuration files:

Read-only default configuration - core-default.xml, hdfs-default.xml, yarn-default.xml and mapred-default.xml.

Site-specific configuration - core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml.

https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configurin...
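
As a rough illustration of how the two types relate: a property set in a site file takes precedence over the same property in the matching default file. The sketch below uses fs.defaultFS, which ships as file:/// in core-default.xml; the host name and port in the site value are placeholders I made up, not values from this thread.

```xml
<!-- In core-default.xml (read-only, shipped with Hadoop) -->
<property>
  <name>fs.defaultFS</name>
  <value>file:///</value>
</property>

<!-- In core-site.xml (yours to edit); this value wins over the default.
     Host and port are hypothetical placeholders. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>
```

Only the properties you want to change need to appear in the site file; everything else falls back to the default file.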

singh_maya48 · ‎09-07-2017

Edit the following core Hadoop configuration files to set up the cluster:

• hadoop-env.sh
• core-site.xml
• hdfs-site.xml
• mapred-site.xml
• yarn-site.xml
• masters
• slaves
The directory you extract the Hadoop distribution into (e.g. hadoop-2.6.0-cdh5.5.1) is called HADOOP_HOME. It contains all the libraries, scripts, configuration files, etc.

hadoop-env.sh

This file specifies the environment variables that affect the JDK used by the Hadoop daemons (bin/hadoop). Because the Hadoop framework is written in Java and runs on the Java Runtime Environment, one of the most important environment variables set in hadoop-env.sh is JAVA_HOME.

This variable points the Hadoop daemons to the Java installation on the system:
Actual: export JAVA_HOME=<path-to-the-root-of-your-Java-installation>
Change: export JAVA_HOME=/usr/lib/jvm/java-8-oracle/

core-site.xml

This file tells the Hadoop daemons where the NameNode runs in the cluster. It also holds the configuration settings for Hadoop Core, such as the I/O settings that are common to HDFS and MapReduce.

<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/dataflair/Hadmin</value>
</property>

• The location of the NameNode is specified by the fs.defaultFS property; here the NameNode runs on localhost at port 9000.
• The hadoop.tmp.dir property specifies the location where Hadoop stores its temporary as well as permanent data.
• “/home/dataflair/Hadmin” is my location; you need to specify a location where you have read and write privileges.
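
A note on why hadoop.tmp.dir ends up holding permanent data as well: as far as I know, the HDFS storage directories default to paths underneath it. The two entries below are the defaults as I recall them from hdfs-default.xml in Hadoop 2.x. They are shown for reference only, so please check the hdfs-default.xml documentation for your version before relying on them.

```xml
<!-- Defaults from hdfs-default.xml (Hadoop 2.x, quoted from memory); not something you need to add -->
<property>
  <name>dfs.namenode.name.dir</name>
  <!-- NameNode metadata lives under hadoop.tmp.dir unless overridden -->
  <value>file://${hadoop.tmp.dir}/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- DataNode block storage lives under hadoop.tmp.dir unless overridden -->
  <value>file://${hadoop.tmp.dir}/dfs/data</value>
</property>
```

This is why the directory should be somewhere that survives reboots: if hadoop.tmp.dir is cleaned out, the HDFS metadata and blocks stored under it go with it.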

hdfs-site.xml
• We need to make changes in the Hadoop configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop). Open it with the command below:
Hdata@ubuntu:~/hadoop-2.6.0-cdh5.5.1/etc/hadoop$ nano hdfs-site.xml

Replication factor

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

• The replication factor is specified by the dfs.replication property.
• Since this is a single-node cluster, we set replication to 1.

mapred-site.xml
• We need to make changes in the Hadoop configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop).
• Note: To edit mapred-site.xml we first need to create it as a copy of mapred-site.xml.template, using the following command:
Hdata@ubuntu:~/hadoop-2.6.0-cdh5.5.1/etc/hadoop$ cp mapred-site.xml.template mapred-site.xml
• We will now edit the mapred-site.xml file using the following command:
Hdata@ubuntu:~/hadoop-2.6.0-cdh5.5.1/etc/hadoop$ nano mapred-site.xml
Changes

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

The mapreduce.framework.name property specifies which framework is used to run MapReduce jobs; here it is set to yarn.
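
The snippets in this post show only the <property> elements. In the actual file they sit inside a single <configuration> root element. A minimal sketch of what the complete mapred-site.xml might look like, assuming no other properties are needed for your setup:

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml: every property goes inside the one <configuration> root -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <!-- run MapReduce jobs on YARN -->
    <value>yarn</value>
  </property>
</configuration>
```

The same layout applies to core-site.xml, hdfs-site.xml and yarn-site.xml: one <configuration> element containing all the <property> entries for that file.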

yarn-site.xml
Changes

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

• The “yarn.nodemanager.aux-services” property specifies the auxiliary services that need to run with the NodeManager. Here shuffling is used as the auxiliary service.
• The “yarn.nodemanager.aux-services.mapreduce_shuffle.class” property specifies the class that should be used for shuffling. Note that the middle part of the property name must match the service name given above (mapreduce_shuffle).


What are configuration files in Apache Hadoop?
