Member since
02-22-2018
4
Posts
3
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5032 | 02-26-2018 03:25 PM |
02-26-2018
03:25 PM
1 Kudo
I was able to get an answer: removing the "file:" prefix from the paths in hdfs-site.xml did the trick.
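A minimal sketch of what the adjusted hdfs-site.xml entries might look like, assuming "file:" refers to the URI scheme on the storage directory values from the original question (property names and paths are taken from that question; the exact form of the fix is an assumption):

```xml
<!-- Hypothetical hdfs-site.xml fragment after removing the file: prefix;
     the storage directories are given as plain local paths. -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/opt/volume/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/opt/volume/datanode</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/opt/volume/namesecondary</value>
</property>
```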
02-21-2018
10:50 AM
Apologies for the line spacing; it might be difficult to follow.
02-21-2018
10:48 AM
1 Kudo
I have two computers: the one I work on (CentOS installed) and a second computer (also CentOS, the server edition, to act as the DataNode). Neither runs in a VM. I want to create a multi-node cluster with these two machines. I have directly connected the computers to test for possible network issues (ports etc.) and found that networking is not the problem. I followed the guides at https://tecadmin.net/set-up-hadoop-multi-node-cluster-on-centos-redhat/# and https://dwbi.org/etl/bigdata/183-setup-hadoop-cluster. I created a 'hadoop' user on both machines, with the required permissions, and established password-less SSH access between them.

The hostnames for the computers are:

1. NameNode (main computer): master
2. DataNode (the server): datanode1

The /etc/hosts file is as follows (showing 'computerIP' in place of the actual IPs):

```
computerIP master
computerIP datanode1
```

My .xml configurations on the NameNode are:

1. core-site.xml:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020/</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
```

2. hdfs-site.xml:

```xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/volume/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/volume/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:/opt/volume/namesecondary</value>
  </property>
  <property>
    <name>dfs.replication</name>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020/</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
```

3. mapred-site.xml:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user/app</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Djava.security.egd=file:/dev/../dev/urandom</value>
  </property>
</configuration>
```

4. yarn-site.xml:

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.resourcemanager.bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.nodemanager.bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:/opt/volume/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>file:/opt/volume/yarn/log</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>hdfs://master:8020/var/log/hadoop-yarn/apps</value>
  </property>
</configuration>
```

5. JAVA_HOME (where Java is located):

```bash
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
```

6. Slaves file:

```
datanode1
```

7. Masters file:

```
master
```

My .bashrc file is as follows:

```bash
export JAVA_HOME=/usr/lib/java-1.8.0
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.egd=file:/dev/../dev/urandom"
```

The permissions are as follows on both machines (from the terminal). On the NameNode:

```
[hadoop@master hadoop]$ ls -al /opt
total 0
drwxr-xr-x.  5 hadoop hadoop  44 Feb 15 16:05 .
dr-xr-xr-x. 17 root   root   242 Feb 21 11:38 ..
drwxr-xr-x.  3 hadoop hadoop  53 Feb 15 16:00 hadoop
drwxr-xr-x.  2 hadoop hadoop   6 Sep  7 01:11 rh
drwxr-xr-x.  7 hadoop hadoop  84 Feb 20 11:27 volume
```

On the DataNode:

```
[hadoop@datanode1 ~]$ ls -al /opt
total 0
drwxrwxrwx.  4 hadoop hadoop  34 Feb 20 11:06 .
dr-xr-xr-x. 17 root   root   242 Feb 19 16:13 ..
drwxr-xr-x.  3 hadoop hadoop  53 Feb 20 11:07 hadoop
drwxrwxrwx.  5 hadoop hadoop  59 Feb 21 09:53 volume
```

When I format the NameNode with `hdfs namenode -format`, the output reports that the NameNode on 'master' has been formatted. I then start the system with `$HADOOP_HOME/sbin/start-dfs.sh` and get the following output:

```
[hadoop@master hadoop]$ $HADOOP_HOME/sbin/start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-hadoop-namenode-master.out
datanode1: starting datanode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-hadoop-datanode-datanode1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.8.3/logs/hadoop-hadoop-secondarynamenode-master.out
```

This shows the DataNode starting, yet when I open the web UI on port 50070 I find that no DataNode storage is configured. I then stop the whole process with `$HADOOP_HOME/sbin/stop-dfs.sh`, only to find that the DataNode never actually started in the first place:

```
[hadoop@master hadoop]$ $HADOOP_HOME/sbin/stop-dfs.sh
Stopping namenodes on [master]
master: stopping namenode
datanode1: no datanode to stop
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
```

This happens even when the computers are directly connected. I have no idea why the DataNode is not starting, and I hope someone can help. I need this for my master's thesis. Thanks!
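A minimal diagnostic sketch for a situation like this, assuming the layout described above (the hostname, user, and log directory are taken from the post; the .log file name follows Hadoop's usual hadoop-&lt;user&gt;-datanode-&lt;host&gt;.log pattern and is therefore an assumption):

```bash
# Run on datanode1 as the hadoop user.

# Check whether a DataNode JVM is actually running.
jps

# Inspect the DataNode log for the reason it exited
# (directory taken from the start-dfs.sh output above; file name assumed).
tail -n 100 /opt/hadoop/hadoop-2.8.3/logs/hadoop-hadoop-datanode-datanode1.log

# From the master, ask HDFS how many live DataNodes it reports.
hdfs dfsadmin -report
```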
Labels:
- Apache Hadoop