Created on 04-21-2014 07:26 AM - edited 09-16-2022 01:57 AM
Hello,
We are trying to run a MapReduce job that writes to HBase from an Oozie workflow.
We recently enabled Kerberos on our cluster, and that is when these issues started.
The job fails after 10 minutes with this error: java.lang.RuntimeException: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for supertemp,,99999999999999 after 10 tries.
We had a similar issue earlier with Beeline not being able to run HBase commands, and the solution we were given was:
I think the issue is that HBase does not allow hive to impersonate users. So you'll need to set up hive as a proxy user in HBase. Can you try the following:
- Go to the HDFS service configuration in CM.
- Go to Service-Wide->Advanced and add the following to "Cluster-wide Configuration Safety Valve for core-site.xml":
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
I have added a similar entry for Oozie: under the HDFS service, in Service-Wide->Advanced, I added the following to "Cluster-wide Configuration Safety Valve for core-site.xml":
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
Here is what I have in the logs:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.MapReduceMain], main() threw exception, org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for supertemp,,99999999999999 after 10 tries.
java.lang.RuntimeException: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for supertemp,,99999999999999 after 10 tries.
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:206)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1010)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:974)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:974)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:948)
at org.apache.oozie.action.hadoop.MapReduceMain.submitJob(MapReduceMain.java:97)
at org.apache.oozie.action.hadoop.MapReduceMain.run(MapReduceMain.java:57)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
at org.apache.oozie.action.hadoop.MapReduceMain.main(MapReduceMain.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:495)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for supertemp,,99999999999999 after 10 tries.
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:980)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:885)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:987)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:889)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:846)
at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:270)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:210)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:169)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:201)
... 26 more
Oozie Launcher failed, finishing Hadoop job gracefully
Created 04-21-2014 08:04 AM
Here is the information from the syslog.
It shows hundreds of the following errors until the job dies:
2014-04-21 08:44:26,676 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-04-21 08:44:26,676 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)
2014-04-21 08:44:26,686 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-04-21 08:44:26,686 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1...
Created 04-23-2014 10:24 AM
Murthy,
If this is a CM-managed cluster, you need to deploy your HBase Client Configuration to any node that will be submitting jobs that communicate with HBase. It looks like the machine where the job is running is trying to connect to a ZooKeeper instance on its local machine (the default); it can't find ZooKeeper there and therefore cannot communicate with HBase. Loading the hbase-site.xml file in this client's environment path will solve that.
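To make the point above concrete, here is a minimal sketch (not part of the original reply) of how one might check which ZooKeeper quorum a given hbase-site.xml would hand to the client. The function name `zookeeper_quorum` and the sample host names are made up for illustration, and the "localhost" fallback is an assumption based on the connectString seen in the logs in this thread.

```python
import xml.etree.ElementTree as ET

# Assumed client-side fallback when no hbase-site.xml sets the quorum
# (matches the "localhost:2181" seen in this thread's logs).
DEFAULT_QUORUM = "localhost"


def zookeeper_quorum(hbase_site_xml: str) -> str:
    """Return hbase.zookeeper.quorum from hbase-site.xml text, or the fallback."""
    root = ET.fromstring(hbase_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "hbase.zookeeper.quorum":
            # An empty <value> is treated the same as a missing property.
            return prop.findtext("value", "").strip() or DEFAULT_QUORUM
    return DEFAULT_QUORUM


# Hypothetical file contents; the host names are placeholders.
sample = """<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
</configuration>"""

print(zookeeper_quorum(sample))              # quorum taken from the file
print(zookeeper_quorum("<configuration/>"))  # no property set, so the fallback
```

If the second case is what the job sees at runtime, the client configuration is probably not reaching the task, which would match the symptom described here.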
Created 04-25-2014 11:30 AM
Hello Clint, yes, we are using CDH 4.6.
I have copied hbase-site.xml to the /user/oozie/share/lib/hbase folder in HDFS.
I still get the error, and I have restarted the cluster as well. Here are some lines from the syslog:
2014-04-25 13:19:41,310 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
2014-04-25 13:19:41,331 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 42965@ldxhdfsw4.dx.deere.com
2014-04-25 13:19:41,337 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-04-25 13:19:41,339 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
Murthy
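For anyone hitting the same symptom, a rough way to confirm it from the task syslog is to pull out the ZooKeeper connectString and see whether it only names the local machine. This is just a sketch under assumptions: the helper names `connect_strings` and `points_at_localhost` are invented here, and the regular expression is based only on the log line format shown in this post.

```python
import re

# Matches the "connectString=host:port[,host:port...]" token in ZooKeeper client log lines.
CONNECT_RE = re.compile(r"connectString=(\S+)")


def connect_strings(log_text):
    """Return every connectString value found in the log text."""
    return CONNECT_RE.findall(log_text)


def points_at_localhost(connect_string):
    """True if every host in the connect string is the local machine."""
    hosts = [entry.split(":")[0] for entry in connect_string.split(",")]
    return all(host in ("localhost", "127.0.0.1") for host in hosts)


# Log line copied from the syslog excerpt above.
line = (
    "2014-04-25 13:19:41,310 INFO org.apache.zookeeper.ZooKeeper: "
    "Initiating client connection, connectString=localhost:2181 "
    "sessionTimeout=180000 watcher=hconnection"
)

for cs in connect_strings(line):
    print(cs, points_at_localhost(cs))
```

A connect string that only points at localhost on a worker node that does not run ZooKeeper suggests the job is not picking up the cluster's HBase client settings.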
Created 05-05-2014 10:57 AM
Hello Clint, I followed the directions in an article recommended by Ryan Geno (Cloudera Employee), and it seems to have fixed the issue:
http://prodlife.wordpress.com/2013/11/22/using-oozie-in-kerberized-cluster/
- Go to “Oozie service->Configuration->Oozie Server(default)->Advanced->Oozie Server Configuration Safety Valve for oozie-site.xml” and add:
<property>
<name>oozie.credentials.credentialclasses</name>
<value>hcat=org.apache.oozie.action.hadoop.HCatCredentials</value>
</property>
After I restarted the Oozie service and ran the ResourceClipper job, it completed.
Thanks Murthy
Created 05-05-2014 11:37 AM
Thanks for reporting the solution back to this thread, Murthy. Glad it's resolved!