Member since: 07-15-2014
Posts: 57
Kudos Received: 9
Solutions: 6

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 6780 | 06-05-2015 05:09 PM |
| | 1672 | 12-19-2014 01:03 PM |
| | 3181 | 12-17-2014 08:23 PM |
| | 8180 | 12-16-2014 03:07 PM |
| | 13679 | 08-30-2014 11:14 PM |
12-15-2014
10:40 AM
This is the error:

CRITICAL Initialization failed for Block pool BP-1219478626-192.168.1.20-1418484473049 (Datanode Uuid null) service to nn1home/10.192.128.227:8022
Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(10.192.128.231, datanodeUuid=ff6a2644-3140-4451-a59f-496478a000d7, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=cluster18;nsid=850143528;c=0)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:889)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4798)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1037)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26378)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
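Since the NameNode says the host is not in the include-list, my guess is that the new DataNode addresses also need to go into whatever include file dfs.hosts points to, and the NameNode then needs to re-read it. A rough sketch (the include-file path here is just a placeholder; the real one is whatever hdfs-site.xml says):

# find the configured include file, if any
grep -B1 -A2 dfs.hosts /etc/hadoop/conf/hdfs-site.xml
# after editing that file so it lists the new DataNode addresses:
sudo -u hdfs hdfs dfsadmin -refreshNodes
# verify the report now shows the new addresses
sudo -u hdfs hdfs dfsadmin -report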
12-15-2014
10:33 AM
I have an existing demo cluster that I built, and it was working perfectly well. However, because of some changes, I had to change the IP address of each of the data nodes. I did a grep -R 'oldIP' /etc on each machine, edited the files that contained the old IP addresses, replaced them with the new IPs, and rebooted each machine. Despite doing that, when I run sudo -u hdfs hadoop dfsadmin -report it shows me 2 dead data nodes and lists the old IP addresses. How can I remove the old IPs and replace them with the new IP addresses?
Labels:
- Apache Hadoop
- HDFS
08-30-2014
11:14 PM
1 Kudo
I was able to solve the problem. I had to specify "python" as well in the mapper, like this:

sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming.jar -input /sample/cite75_99.txt -output /foo -mapper 'python RandomSample.py 10' -file RandomSample.py -numReduceTasks 1
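As an aside, I believe making the script executable with a shebang line should also work, so the streaming jar can invoke it directly without the explicit python prefix (I have not verified this here):

chmod +x RandomSample.py
# and the first line of RandomSample.py should be:
#     #!/usr/bin/env python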
08-30-2014
10:40 PM
I changed my python code to:

#!/usr/bin/env python
import sys, random

file = open("/tmp/log.txt", "w")
for line in sys.stdin:
    file.write("line: " + line + "\n")
file.close()

When I run my job, I see exactly the same error and the file /tmp/log.txt is not created on any machine, so I guess the script is not even being invoked.
08-30-2014
09:11 PM
I have a 5 node hadoop cluster on which I can execute the following streaming job successfully:

sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming.jar -input /sample/apat63_99.txt -output /foo1 -mapper 'wc -l' -numReduceTasks 0

But when I try to execute a streaming job using python:

sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming.jar -input /sample/apat63_99.txt -output /foo5 -mapper 'AttributeMax.py 8' -file '/tmp/AttributeMax.py' -numReduceTasks 1

I get an error:

packageJobJar: [/tmp/AttributeMax.py, /tmp/hadoop-hdfs/hadoop-unjar2062240123197790813/] [] /tmp/streamjob4074525553604040275.jar tmpDir=null
14/08/29 11:22:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/29 11:22:58 INFO mapred.FileInputFormat: Total input paths to process : 1
14/08/29 11:22:59 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hdfs/mapred/local]
14/08/29 11:22:59 INFO streaming.StreamJob: Running job: job_201408272304_0030
14/08/29 11:22:59 INFO streaming.StreamJob: To kill this job, run:
14/08/29 11:22:59 INFO streaming.StreamJob: UNDEF/bin/hadoop job -Dmapred.job.tracker=jt1:8021 -kill job_201408272304_0030
14/08/29 11:22:59 INFO streaming.StreamJob: Tracking URL: http://jt1:50030/jobdetails.jsp?jobid=job_201408272304_0030
14/08/29 11:23:00 INFO streaming.StreamJob: map 0% reduce 0%
14/08/29 11:23:46 INFO streaming.StreamJob: map 100% reduce 100%
14/08/29 11:23:46 INFO streaming.StreamJob: To kill this job, run:
14/08/29 11:23:46 INFO streaming.StreamJob: UNDEF/bin/hadoop job -Dmapred.job.tracker=jt1:8021 -kill job_201408272304_0030
14/08/29 11:23:46 INFO streaming.StreamJob: Tracking URL: http://jt1:50030/jobdetails.jsp?jobid=job_201408272304_0030
14/08/29 11:23:46 ERROR streaming.StreamJob: Job not successful. Error: NA
14/08/29 11:23:46 INFO streaming.StreamJob: killJob...

In my job tracker console I see these errors:

java.io.IOException: log:null
R/W/S=2359/0/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=mapred
HADOOP_USER=null
last Hadoop input: |null| last tool output: |null| Date: Fri Aug 29 11:22:43 CDT 2014
java.io.IOException: Broken pipe
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:282)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
    at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:110)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.streaming.Pipe

The python code itself is pretty simple:

#!/usr/bin/env python
import sys

index = int(sys.argv[1])
max = 0
for line in sys.stdin:
    fields = line.strip().split(",")
    if fields[index].isdigit():
        val = int(fields[index])
        if (val > max):
            max = val
    else:
        print max
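I also suspect the broken pipe simply means the mapper process exited before it finished reading its input, so my next step is to run the script by hand outside Hadoop (a sketch using the same paths as the job above):

# pull a small sample of the input out of HDFS
sudo -u hdfs hadoop fs -cat /sample/apat63_99.txt | head -n 20 > /tmp/sample.txt
# run the mapper directly; a Python traceback or "Permission denied" here would explain the broken pipe
python /tmp/AttributeMax.py 8 < /tmp/sample.txt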
Labels:
- Apache Hadoop
- HDFS
- MapReduce
08-01-2014
08:34 AM
2 Kudos
When I try to start the job tracker using this command:

service hadoop-0.20-mapreduce-jobtracker start

I see this error:

org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:149)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4891)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4873)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4847)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3192)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3156)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3137)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:669)

I found this blog post which tries to address this issue: http://blog.spryinc.com/2013/06/hdfs-permissions-overcoming-permission.html I followed the steps there and did:

groupadd supergroup
usermod -a -G supergroup mapred
usermod -a -G supergroup hdfs

but I still get this problem. The only difference between the blog entry and my case is that for me the error is on the "root" dir, whereas for the blog it is on "/user". Here is my mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>jt1:8021</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/tmp/mapred/jt</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/tmp/mapred/system</value>
</property>
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<value>/user</value>
</property>
<property>
<name>mapred.job.tracker.persist.jobstatus.active</name>
<value>true</value>
</property>
<property>
<name>mapred.job.tracker.persist.jobstatus.hours</name>
<value>24</value>
</property>
<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
<name>mapred.fairscheduler.poolnameproperty</name>
<value>user.name</value>
</property>
<property>
<name>mapred.fairscheduler.allocation.file</name>
<value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
<property>
<name>mapred.fairscheduler.allow.undeclared.pools</name>
<value>true</value>
</property>
</configuration>

I also found this blog: http://www.hadoopinrealworld.com/fixing-org-apache-hadoop-security-accesscontrolexception-permission-denied/ I did:

sudo -u hdfs hdfs dfs -mkdir /home
sudo -u hdfs hdfs dfs -chown mapred:mapred /home
sudo -u hdfs hdfs dfs -mkdir /home/mapred
sudo -u hdfs hdfs dfs -chown mapred /home/mapred
sudo -u hdfs hdfs dfs -chown hdfs:supergroup /

but the problem is still not resolved 😞 Please help. I wonder why it is going for the "root" dir: inode="/":hdfs:supergroup:drwxr-xr-x
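One more thing I am considering trying (a sketch based on the directories in my mapred-site.xml above; the mapred:hadoop ownership is a guess, use whatever group your packages created): pre-create mapred.system.dir and the staging root as the hdfs superuser and chown them to mapred, instead of changing group memberships.

# create the system dir named in mapred.system.dir and hand it to mapred
sudo -u hdfs hdfs dfs -mkdir -p /tmp/mapred/system
sudo -u hdfs hdfs dfs -chown -R mapred:hadoop /tmp/mapred
# make sure the staging root from mapreduce.jobtracker.staging.root.dir exists
sudo -u hdfs hdfs dfs -mkdir -p /user
# then restart the jobtracker and re-check the namenode log for the AccessControlException
service hadoop-0.20-mapreduce-jobtracker restart

If that works, the original failure was presumably just the JobTracker trying to mkdir those paths under "/" where mapred has no write permission.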
Labels:
- Apache Hadoop
- HDFS
- MapReduce
- Security
07-29-2014
04:35 PM
Thank you so much. Your answer is absolutely correct. I went to each server and did:

nn1: service zookeeper-server init --myid=1 --force
nn2: service zookeeper-server init --myid=2 --force
jt1: service zookeeper-server init --myid=3 --force

Earlier I had chosen an ID of 1 on every machine. I also corrected my zoo.cfg to ensure the right entries. Now it works and I am able to do sudo -u hdfs hdfs zkfc -formatZK. Thank you so much!
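For anyone hitting the same thing, a quick way to double-check the fix (a sketch; /var/lib/zookeeper/myid is the usual packaged location, adjust if your dataDir differs):

# on each of nn1, nn2 and jt1 the ids should be 1, 2 and 3 respectively
cat /var/lib/zookeeper/myid
# ask each server for its status; one should report "leader" and the others "follower"
echo stat | nc nn1 2181
echo stat | nc nn2 2181
echo stat | nc jt1 2181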
07-26-2014
09:44 PM
I am reading this article: http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html I am having problems visualizing how this code will execute in a distributed environment when I package it into a jar and run it on a hadoop cluster. Below is my understanding of things and also my doubts and questions:

1. First the run method will be called, which will set up the JobConf object and run the job. (Which machine will the main method execute on? The job tracker node? A task tracker node?)

2. Now suppose a machine is randomly chosen to run the main method. My understanding is that this JAR file will be serialized and sent to a few machines running task trackers, where the map function will be run first. For this, the input file will be split and the fragments will be serialized to the nodes running the map tasks. (Question here: does hadoop persist these split files on HDFS as well, or are the splits kept in memory?)

3. The map function will create key-value pairs and will sort them as well. (Question here: does hadoop persist the output of the map functions to HDFS before handing it off to the reduce processes?)

4. Now hadoop will start reduce processes across the cluster to run the reduce code. This code will be given the output of the map tasks.

5. My biggest confusion is that after each reduce has run and we have output from each reduce process, how do we then merge those outputs into the final output? For example, if we were calculating the value of pi (there is a sample for that), how is the final value calculated from the output of the different reduce tasks?

Sorry if this question is very basic or very broad... I am just trying to learn stuff.
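From poking around a little at point 5, it looks to me like each reduce task writes its own file (part-00000, part-00001, ...) into the job's output directory, so maybe the "merge" is just reading or concatenating those files afterwards? For example (assuming /foo is the output directory of a finished job):

# one part file per reducer
hadoop fs -ls /foo
# concatenate all part files into a single local file
hadoop fs -getmerge /foo /tmp/foo-merged.txt

Is that the right mental model, or does hadoop do an extra merge step of its own?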
Labels:
- Apache Hadoop
- HDFS
07-26-2014
12:47 AM
Yes, your suggestion is right. The .32 machine did not have the firewall switched off. When I stopped and disabled it (service firewalld stop and systemctl disable firewalld.service) it started to work fine.
07-25-2014
08:16 AM
When I issue the command sudo -u hdfs hdfs zkfc -formatZK I get this error:

14/07/24 00:24:34 INFO zookeeper.ClientCnxn: Opening socket connection to server nn1/192.168.1.30:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:34 INFO zookeeper.ClientCnxn: Socket connection established to nn1/192.168.1.30:2181, initiating session
14/07/24 00:24:34 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Opening socket connection to server nn2/192.168.1.31:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Socket connection established to nn2/192.168.1.31:2181, initiating session
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Opening socket connection to server jt1/192.168.1.32:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Socket connection established to jt1/192.168.1.32:2181, initiating session
14/07/24 00:24:35 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Opening socket connection to server nn1/192.168.1.30:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Socket connection established to nn1/192.168.1.30:2181, initiating session
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Opening socket connection to server nn2/192.168.1.31:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Socket connection established to nn2/192.168.1.31:2181, initiating session
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Opening socket connection to server jt1/192.168.1.32:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Socket connection established to jt1/192.168.1.32:2181, initiating session
14/07/24 00:24:37 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:39 INFO zookeeper.ClientCnxn: Opening socket connection to server nn1/192.168.1.30:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
14/07/24 00:24:39 INFO zookeeper.ClientCnxn: Socket connection established to nn1/192.168.1.30:2181, initiating session
14/07/24 00:24:39 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
14/07/24 00:24:39 ERROR ha.ActiveStandbyElector: Connection timed out: couldn't connect to ZooKeeper in 5000 milliseconds
14/07/24 00:24:40 INFO zookeeper.ZooKeeper: Session: 0x0 closed
14/07/24 00:24:40 INFO zookeeper.ClientCnxn: EventThread shut down
14/07/24 00:24:40 FATAL ha.ZKFailoverController: Unable to start failover controller. Unable to connect to ZooKeeper quorum at nn1:2181,nn2:2181,jt1:2181. Please check the configured value for ha.zookeeper.quorum and ensure that ZooKeeper is running.

I have confirmed that the zookeeper service is running on every machine by:

[root@nn1 ~]# service zookeeper-server start
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Starting zookeeper ... already running as process 1065.

I can also do an nc from every machine to every machine:

[root@nn1 ~]# nc nn1 2181
^C
[root@nn1 ~]# nc nn2 2181
^C
[root@nn1 ~]# nc jt1 2181
^C
[root@nn1 ~]#

I can see this in the zookeeper event log:

2014-07-24 00:24:18,706 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 60000
2014-07-24 00:24:34,956 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.1.30:35151
2014-07-24 00:24:34,956 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2014-07-24 00:24:34,956 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.1.30:35151 (no session established for client)
2014-07-24 00:24:37,075 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.1.30:35154
2014-07-24 00:24:37,076 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2014-07-24 00:24:37,076 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.1.30:35154 (no session established for client)
2014-07-24 00:24:39,432 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.1.30:35157
2014-07-24 00:24:39,433 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2014-07-24 00:24:39,433 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.1.30:35157 (no session established for client)
2014-07-24 00:25:18,709 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (2, 1)
2014-07-24 00:25:18,710 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (3, 1)
2014-07-24 00:25:18,711 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 60000
2014-07-24 00:26:18,713 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (2, 1)
2014-07-24 00:26:18,715 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (3, 1)
2014-07-24 00:26:18,716 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 60000
2014-07-24 00:26:40,619 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.1.30:35170
2014-07-24 00:26:43,508 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:662)
2014-07-24 00:26:43,511 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /192.168.1.30:35170 (no session established for client)
2014-07-24 00:27:18,717 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (2, 1)
2014-07-24 00:27:18,719 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (3, 1)
2014-07-24 00:27:18,719 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 60000
Labels:
- Apache Zookeeper
- HDFS