HBase RegionServer goes down during importtsv with RPC retry errors

Contributor

I am trying to import data into HBase using importtsv. I tried a couple of examples from online to learn, and they worked fine.

For example, with the command below the job succeeds and I can see the table in the HBase shell.

bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,cf tab4 /user/hduser/gutenberg/simple1.txt
2016-06-20 17:08:03,138 INFO  [main] mapreduce.Job:  map 100% reduce 0%

2016-06-20 17:08:03,139 INFO  [main] mapreduce.Job: Job job_local1517188704_0001 completed successfully

2016-06-20 17:08:03,269 INFO  [main] mapreduce.Job: Counters: 24
   File System Counters
     FILE: Number of bytes read=4018816
     FILE: Number of bytes written=25538098

But the actual data I want to upload has many columns and rows (237 and 400k respectively). To check integrity I uploaded and tried to import a 237-column, 10-row version of the data. To make it visible I am pasting part of the data and the command.

1,date,serial,serial,date,bsrid,SW version,bsr Group Id,Processed kpi Number,reserved....
2,20151206,1211080003,1211080003,20151206,103,30092,0,24,0,0,...

Here is a part of the command I am running.

bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,date,serial,serial_1,date_1,bsrid,SW_version,bsr_Group_Id,Processed_kpi_Number, (until 237th column) tab5 /user/hduser/KPI-Dumps/test.csv
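As a side note, rather than typing all 237 names by hand, the column list can be built from the CSV header. A rough sketch (it assumes a local copy of the file, turns spaces in header fields into underscores, and duplicate header names such as serial and date would still need unique suffixes like serial_1):

# build -Dimporttsv.columns from the header row: the first field becomes the row key, the rest are kept as-is
COLUMNS="HBASE_ROW_KEY,$(head -n 1 test.csv | sed 's/ /_/g' | cut -d',' -f2-)"
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns="$COLUMNS" tab5 /user/hduser/KPI-Dumps/test.csv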

When I run the command it gets the status below, and when I run jps I can no longer see the HRegionServer process, so I have to restart HBase.

2016-06-20 16:20:07,804 INFO  [communication thread] mapred.LocalJobRunner: map
2016-06-20 16:20:08,581 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2016-06-20 16:21:15,159 INFO  [hconnection-0x1abc383-metaLookup-shared--pool5-t2] client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=68390 ms ago, cancelled=false, msg=row 'tab5,1,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=cellops-optiplex-gx620,16201,1466432370372, seqNum=0
2016-06-20 16:21:35,227 INFO  [hconnection-0x1abc383-metaLookup-shared--pool5-t2] client.RpcRetryingCaller: Call exception, tries=11, retries=35, started=88487 ms ago, cancelled=false, msg=row 'tab5,1,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=cellops-optiplex-gx620,16201,1466432370372, seqNum=0

There is no typo or missing comma; I double-checked. My only guess is that the column order of the created table is different: HBase created the columns in alphabetical order instead of the order I specified, even though I gave the correct order based on the file I want to upload. Do you think it gets confused during the upload?

From the created table

..

{NAME => 'Vs_SuccActiveCallReDirectUMTS'}

{NAME => 'bsr_Group_Id'} {NAME => 'bsrid'}

{NAME => 'date'} {NAME => 'date_1'}

{NAME => 'reserved'}

{NAME => 'reserved_1'}

{NAME => 'reserved_10'}

{NAME => 'reserved_11'}

...

hbase-site.xml looks like this, and it is a pseudo-distributed installation with ZooKeeper also installed manually.

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>/usr/local/hadoop/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/Hbase/zookeeper</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase-unsecure</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.rpc.timeout</name>
    <value>120000</value>
  </property>
  <property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>120000</value>
  </property>
  <property>
    <name>hbase.cells.scanned.per.heartbeat.check</name>
    <value>10000</value>
  </property>
</configuration>
hduser@cellops-OptiPlex-GX620:/usr/local/Hbase$ jps
9712 NodeManager
31301 QuorumPeerMain
18294 HMaster
9481 ResourceManager
9289 SecondaryNameNode
18414 HRegionServer
9023 DataNode
8799 NameNode
18847 Jps

9 REPLIES

Super Guru

"When I try to run the command it gets the below status and when I type jps command I can't see the HRegionServer anymore and I have to restart HBase"

Have you looked at the RegionServer log to determine why it is no longer running? It sounds like something is causing your RegionServer to fail (perhaps, out of memory?) and then HBase cannot proceed because it requires at least one RegionServer.

Investigate the end of the RegionServer log to determine the failure.
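For example (the exact file name depends on your user and hostname; this assumes the default log location under the HBase install directory):

tail -n 100 /usr/local/Hbase/logs/hbase-hduser-regionserver-cellops-OptiPlex-GX620.log
# or look for the usual suspects across all RegionServer logs
grep -iE "FATAL|ERROR|OutOfMemory" /usr/local/Hbase/logs/hbase-*-regionserver-*.log | tail -n 20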

Contributor

Yes, actually I didn't think about that, since it's not that big a file to process either. But the log contains the output below. I think I need to change the Java heap size in hbase-env.sh from 1G to 4G?

# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 18414"...

Super Guru

Yes, that looks like exactly what happened. 4G is a good heap size to start.
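Something along these lines in conf/hbase-env.sh should do it (a sketch; HBASE_HEAPSIZE raises the heap for all HBase daemons, while the second variant targets only the RegionServer). Restart HBase afterwards:

# conf/hbase-env.sh
export HBASE_HEAPSIZE=4G
# or, to raise only the RegionServer heap and leave the master at its default:
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xmx4g"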

Master Collaborator

18414 was the region server process.

Was it still running?

Contributor
9712 NodeManager
19811 Jps
31301 QuorumPeerMain
18294 HMaster
9481 ResourceManager
9289 SecondaryNameNode
9023 DataNode
8799 NameNode

Nope. I mean this is after the execution of the command, following Josh's suggestion. It killed the process. I know I need to increase the heap size, but the machine I am working on is not very powerful either. I will try with 4G and see.

Master Collaborator

"Hbase created the columns based on alphabetical order"

When you query hbase, you observe alphabetical order because that's what hbase stores internally.

Do you observe fewer than 10 rows after importing the sample data?
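You can check, for example, from the HBase shell (count is fine for a table this small; RowCounter is the MapReduce alternative for large tables):

echo "count 'tab5'" | bin/hbase shell
bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter tab5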

Contributor

Yesterday I managed to import the sample data successfully after increasing the heap size. Afterwards I stopped my HBase and Hadoop instances and gave my Ubuntu 12.04 machine a restart. After the restart Hadoop doesn't come up. I now get the errors below.

namenode.log and datanode.log have the same error

2016-06-21 10:11:12,347 WARN org.mortbay.log: failed jsp: java.lang.NoSuchFieldError: IS_SECURITY_ENABLED
2016-06-21 10:11:12,358 WARN org.mortbay.log: failed org.mortbay.jetty.webapp.WebAppContext@1599640{/,file:/usr/local/hadoop/share/hadoop/hdfs/webapps/hdfs}: java.lang.NoSuchFieldError: IS_SECURITY_ENABLED
2016-06-21 10:11:12,359 WARN org.mortbay.log: failed ContextHandlerCollection@181aa00: java.lang.NoSuchFieldError: IS_SECURITY_ENABLED
2016-06-21 10:11:12,360 ERROR org.mortbay.log: Error starting handlers
java.lang.NoSuchFieldError: IS_SECURITY_ENABLED
2016-06-21 10:11:12,401 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50070
2016-06-21 10:11:12,402 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2016-06-21 10:11:12,403 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2016-06-21 10:11:12,404 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2016-06-21 10:11:12,404 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: Problem in starting http server. Server handlers failed

secondarynamenode.log

2016-06-21 10:11:27,718 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: SecondaryNameNode metrics system started
2016-06-21 10:11:28,230 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /app/hadoop/tmp/dfs/namesecondary/in_use.lock acquired by nodename 10100@cellops-OptiPlex-GX620
2016-06-21 10:11:28,372 FATAL org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Failed to start secondary namenode
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected version of storage directory /app/hadoop/tmp/dfs/namesecondary. Reported: -60. Expecting = -57.

resourcemanager.log

2016-06-21 10:11:29,608 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT]
2016-06-21 10:11:30,505 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/usr/local/hadoop/etc/hadoop/core-site.xml
2016-06-21 10:11:30,675 INFO org.apache.hadoop.security.Groups: clearing userToGroupsMap cache
2016-06-21 10:11:30,784 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.lang.NoSuchMethodError: org.apache.hadoop.yarn.server.utils.BuilderUtils.newApplicationResourceUsageReport(IILorg/apache/hadoop/yarn/api/records/Resource;Lorg/apache/hadoop/yarn/api/records/Resource;Lorg/apache/hadoop/yarn/api/records/Resource;JJ)Lorg/apache/hadoop/yarn/api/records/ApplicationResourceUsageReport;

nodemanager.log

2016-06-21 10:11:35,263 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.getSocketAddr(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;I)Ljava/net/InetSocketAddress;
   at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.serviceInit(ResourceLocalizationService.java:247)

I uninstalled and reinstalled Java. ZooKeeper is still running, so I think it's not Java related. I tried HBase too; it also seems to be working, the RegionServer is up but the master is down because it's not able to reach HDFS. Somehow Hadoop got broken during the reboot. How can I repair it instead of installing it again?

Super Guru

It looks like you somehow upgraded (some?) HDFS jars and messed up the Hadoop classpath. It could not load expected variables from the classpath. Additionally, it seems like the SecondaryNameNode is reporting that there is a newer filesystem layout (which would imply a newer version of HDFS was at one point running) and that it is expecting an older version (which implies that the SNN is using an older version of HDFS). Make sure you have consistent versions of HDFS installed.
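A quick way to sanity-check, assuming everything lives under /usr/local/hadoop as your logs suggest:

bin/hadoop version
# jars from more than one release showing up here would explain the NoSuchMethodError / NoSuchFieldError
find /usr/local/hadoop -name "hadoop-common-*.jar" -o -name "hadoop-hdfs-*.jar" | sort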

Contributor

Hi Josh, yes, I figured that out after comparing my log files with a working one. There was an older version and, as you said, the path was mixed up. It's fixed now. I have another problem during upload: it's getting a timeout during my import of the big file, but I guess it's better to open another thread for that problem. Thanks a lot!