
What are configuration files in Apache Hadoop?



@Shreya Gupta

core-site.xml & hdfs-site.xml are the important ones.

Hadoop’s Java configuration is driven by two types of important configuration files:

  • Read-only default configuration - core-default.xml, hdfs-default.xml, yarn-default.xml and mapred-default.xml.
  • Site-specific configuration - core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml.
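Every *-site.xml file uses the same layout: a <configuration> root element containing <property> elements, each with a <name> and a <value>. A value set in a site file overrides the default for the same name in the matching *-default.xml. As a rough illustration, the hypothetical helper below (not part of Hadoop) prints one property in that layout:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Hadoop: print one <property> element in
# the layout shared by core-site.xml, hdfs-site.xml, mapred-site.xml, etc.
# Note: values are not XML-escaped, so avoid '&' and '<' in them.
hadoop_property() {
  local name="$1" value="$2"
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$name" "$value"
}

# Wrap a property in the <configuration> root element and print the result.
{
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  hadoop_property dfs.replication 1
  echo '</configuration>'
}
```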

Edit the following core Hadoop configuration files to set up the cluster:

• core-site.xml
• hdfs-site.xml
• mapred-site.xml
• masters
• slaves
The directory Hadoop is extracted to (e.g. hadoop-2.6.0-cdh5.5.1) is called HADOOP_HOME. It contains all the libraries, scripts and configuration files; the configuration files live under HADOOP_HOME/etc/hadoop.
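Before editing, it can help to check which of the files listed above already exist. A minimal sketch, assuming HADOOP_HOME points at the extracted directory; the fallback path reuses the answer's hadoop-2.6.0-cdh5.5.1 example and is only a placeholder:

```shell
#!/usr/bin/env bash
# Placeholder location; export HADOOP_HOME beforehand to override it.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-2.6.0-cdh5.5.1}"

# Report, one line per file, whether each core configuration file exists.
check_conf_files() {
  local dir="$1" f
  for f in core-site.xml hdfs-site.xml mapred-site.xml masters slaves; do
    if [ -f "$dir/$f" ]; then
      echo "present: $f"
    else
      echo "missing: $f"
    fi
  done
}

check_conf_files "$HADOOP_HOME/etc/hadoop"
```

On a fresh extract, mapred-site.xml will typically be reported as missing, since only mapred-site.xml.template is present until you copy it.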

1. This file specifies environment variables that affect the JDK used by the Hadoop daemons (bin/hadoop).
As the Hadoop framework is written in Java and uses the Java Runtime Environment, one of the most important environment variables for the Hadoop daemons is $JAVA_HOME.

2. This variable directs the Hadoop daemons to the Java installation on the system.
Original: export JAVA_HOME=<path-to-the-root-of-your-Java-installation>
Change to: export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
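The same change can be scripted. A sketch, assuming the file being edited is etc/hadoop/hadoop-env.sh (the usual place JAVA_HOME is set in Hadoop 2.x) and reusing the answer's JDK path; adjust both for your system:

```shell
#!/usr/bin/env bash
# Assumptions: the env file is hadoop-env.sh in the config directory, and the
# JDK lives at /usr/lib/jvm/java-8-oracle/ (the path used in the answer).
CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME:-.}/etc/hadoop}"
JDK_PATH="/usr/lib/jvm/java-8-oracle/"
ENV_FILE="$CONF_DIR/hadoop-env.sh"

mkdir -p "$CONF_DIR"
touch "$ENV_FILE"

# Drop any active "export JAVA_HOME=..." line (commented lines are kept),
# then append the new one, so re-running leaves exactly one setting.
# "|| true" because grep -v exits 1 when it outputs no lines.
grep -v '^[[:space:]]*export JAVA_HOME=' "$ENV_FILE" > "$ENV_FILE.tmp" || true
echo "export JAVA_HOME=$JDK_PATH" >> "$ENV_FILE.tmp"
mv "$ENV_FILE.tmp" "$ENV_FILE"
```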
3. This file tells the Hadoop daemons where the NameNode runs in the cluster. It contains the configuration settings for Hadoop Core, such as I/O settings that are common to HDFS and MapReduce.


The location of the NameNode is specified by the fs.defaultFS property; here the NameNode runs on port 9000 on localhost.
The hadoop.tmp.dir property specifies the location where temporary as well as permanent Hadoop data will be stored.
“/home/dataflair/hadmin” is my location; you need to specify a location where you have read and write privileges.
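For reference, the two settings above could be written out as follows. A sketch only: it assumes they go in core-site.xml (where fs.defaultFS normally lives), writes to $HADOOP_CONF_DIR (falling back to HADOOP_HOME/etc/hadoop, or ./etc/hadoop if neither is set), and reuses the answer's /home/dataflair/hadmin path, which you should replace. Any existing core-site.xml is copied to core-site.xml.bak first.

```shell
#!/usr/bin/env bash
# Single-node core-site.xml sketch; replace /home/dataflair/hadmin with a
# directory you can read and write.
CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME:-.}/etc/hadoop}"
mkdir -p "$CONF_DIR"

# Keep a backup if a core-site.xml is already there.
if [ -f "$CONF_DIR/core-site.xml" ]; then
  cp "$CONF_DIR/core-site.xml" "$CONF_DIR/core-site.xml.bak"
fi

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- NameNode address: localhost, port 9000 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <!-- Base directory for Hadoop's temporary (and, by default, HDFS) data -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/dataflair/hadmin</value>
  </property>
</configuration>
EOF
```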
Next, we need to make changes in the Hadoop configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) by executing the command below:
Hdata@ubuntu:~/hadoop-2.6.0-cdh5.5.1/etc/hadoop$ nano hdfs-site.xml

Replication factor


The replication factor is specified by the dfs.replication property.
As this is a single-node cluster, we set the replication factor to 1.
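A sketch of the resulting hdfs-site.xml, assuming the same config-directory fallback as a plain single-node setup. With only one DataNode, additional replicas of a block could never be placed, which is why the value is 1. The script overwrites any existing hdfs-site.xml.

```shell
#!/usr/bin/env bash
# Single-node hdfs-site.xml sketch: keep one copy of each HDFS block.
# Overwrites any existing hdfs-site.xml in the config directory.
CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME:-.}/etc/hadoop}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Replication factor: 1 for a single-node cluster -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
```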

Next, we need to make changes in the Hadoop configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop).
Note: before mapred-site.xml can be edited, it must first be created as a copy of mapred-site.xml.template, using the following command:
Hdata@ubuntu:~/hadoop-2.6.0-cdh5.5.1/etc/hadoop$ cp mapred-site.xml.template mapred-site.xml
Now edit the mapred-site.xml file with the following command:
Hdata@ubuntu:~/hadoop-2.6.0-cdh5.5.1/etc/hadoop$ nano mapred-site.xml
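A slightly safer variant of the copy step, sketched under the assumption that the template sits in the config directory: the hypothetical function init_mapred_site only copies the template when mapred-site.xml does not exist yet, so an already-edited file is never clobbered.

```shell
#!/usr/bin/env bash
# Create mapred-site.xml from the shipped template without overwriting
# a file that has already been edited. init_mapred_site is hypothetical.
CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME:-.}/etc/hadoop}"

init_mapred_site() {
  local dir="$1"
  if [ -f "$dir/mapred-site.xml" ]; then
    echo "mapred-site.xml already exists in $dir; leaving it alone"
  elif [ -f "$dir/mapred-site.xml.template" ]; then
    cp "$dir/mapred-site.xml.template" "$dir/mapred-site.xml"
    echo "created $dir/mapred-site.xml from the template"
  else
    echo "no mapred-site.xml.template in $dir" >&2
    return 1
  fi
}

init_mapred_site "$CONF_DIR" || echo "skipping: template not found" >&2
```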


To specify which framework MapReduce jobs should run on, a property is set in mapred-site.xml; yarn is used here.
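The answer does not name the property, so this sketch assumes it is mapreduce.framework.name, the standard Hadoop 2.x key whose value yarn makes MapReduce jobs run on YARN. It writes the complete mapred-site.xml, replacing whatever the file currently contains.

```shell
#!/usr/bin/env bash
# mapred-site.xml sketch that runs MapReduce on YARN.
# Assumption: the property meant in the answer is mapreduce.framework.name.
# Replaces the whole mapred-site.xml in the config directory.
CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME:-.}/etc/hadoop}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Execution framework for MapReduce jobs -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```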



To specify the auxiliary service that needs to run with the NodeManager, the “yarn.nodemanager.aux-services” property is used.
Here the MapReduce shuffle (mapreduce_shuffle) is used as the auxiliary service, and the class that implements it is set with “yarn.nodemanager.aux-services.mapreduce_shuffle.class”.
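A sketch of those two settings, assuming they go in yarn-site.xml (where yarn.nodemanager.* properties normally live). Hadoop 2.x requires the service name to be written mapreduce_shuffle, with an underscore rather than a dot, and the same name appears in the class property; org.apache.hadoop.mapred.ShuffleHandler is the shuffle implementation shipped with Hadoop. The script overwrites any existing yarn-site.xml.

```shell
#!/usr/bin/env bash
# Single-node yarn-site.xml sketch enabling the MapReduce shuffle service
# on the NodeManager. Overwrites any existing yarn-site.xml.
CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME:-.}/etc/hadoop}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Auxiliary service started by each NodeManager -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Class implementing that auxiliary service -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
EOF
```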
