
During an import of Hbase using importtsv, hdfs is getting a timeout from zookeeper

New Contributor

I am trying to import an HDFS file into HBase using importtsv. It is a 237-column, ~400K-row CSV file. I first tried a 10-row sample of the file, and it loaded quickly into my already created 237-column HBase table. During the import of the actual file, however, I get a timeout error, and I think I need some tuning or extra configuration to keep the heartbeat alive.
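(For context, an importtsv invocation of this kind typically looks roughly like the following. This is only a sketch: the table name KPIKPI is taken from the log below, while the column family, column mapping, and input path are placeholders rather than the exact command used here.)

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=',' \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2,cf:col3 \
  KPIKPI /user/hduser/input/kpi.csv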

Sorry for the long log, but I pasted the parts I think are relevant. It is from the region server log (regionserver.log):

2016-06-22 10:37:13,044 WARN  [MemStoreFlusher.0] hfile.HFileWriterV2: A minimum HFile version of 3 is required to support cell attributes/tags. Consider setting hfile.format.version accordingly.
2016-06-22 10:37:29,384 INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1781ms
GC pool 'ParNew' had collection(s): count=10 time=1985ms
2016-06-22 10:38:28,536 INFO  [sync.1] wal.FSHLog: Slow sync cost: 1576 ms, current pipeline: [127.0.0.1:50010]
2016-06-22 10:41:13,555 INFO  [MemStoreFlusher.0] regionserver.HRegion: Finished memstore flush of ~128.67 MB/134924816, currentsize=314.16 MB/329418600 for region KPIKPI,,1466582824013.7e506584086a6599d0f3922df5af23d2. in 241906ms, sequenceid=449, compaction requested=false
2016-06-22 10:41:14,433 INFO  [MemStoreFlusher.1] regionserver.HRegion: Started memstore flush for KPIKPI,,1466582824013.7e506584086a6599d0f3922df5af23d2., current region memstore size 315.35 MB, and 237/237 column families' memstores are being flushed.
2016-06-22 10:46:45,861 WARN  [regionserver/cellops-OptiPlex-GX620/127.0.1.1:16201] util.Sleeper: We slept 325086ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2016-06-22 10:46:46,573 WARN  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 325530ms
GC pool 'ParNew' had collection(s): count=6 time=614ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=2 time=323168ms
2016-06-22 10:47:37,258 INFO  [regionserver/cellops-OptiPlex-GX620/127.0.1.1:16201-SendThread(localhost:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 330048ms for sessionid 0x155739733cb0020, closing socket connection and attempting reconnect
2016-06-22 10:47:37,259 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 330052ms for sessionid 0x155739733cb001f, closing socket connection and attempting reconnect
2016-06-22 10:47:37,357 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=608.94 MB, max=609.30 MB, blockCount=4, accesses=22, hits=17, hitRatio=77.27%, cachingAccesses=22, cachingHits=17, cachingHitsRatio=77.27%, evictions=32, evicted=0, evictedPerRun=0.0
2016-06-22 10:47:40,771 INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1014ms
No GCs detected
2016-06-22 10:47:40,826 INFO  [cellops-optiplex-gx620,16201,1466584559663_ChoreService_1] regionserver.HRegionServer$CompactionChecker: Chore: CompactionChecker missed its start time
2016-06-22 10:47:40,873 INFO  [cellops-optiplex-gx620,16201,1466584559663_ChoreService_2] regionserver.HRegionServer$PeriodicMemstoreFlusher: Chore: cellops-optiplex-gx620,16201,1466584559663-MemstoreFlusherChore missed its start time
2016-06-22 10:47:40,874 INFO  [cellops-optiplex-gx620,16201,1466584559663_ChoreService_1] regionserver.ServerNonceManager$1: Chore: nonceCleaner missed its start time
2016-06-22 10:47:40,874 INFO  [cellops-optiplex-gx620,16201,1466584559663_ChoreService_2] regionserver.HeapMemoryManager$HeapMemoryTunerChore: Chore: cellops-optiplex-gx620,16201,1466584559663-HeapMemoryTunerChore missed its start time
2016-06-22 10:47:40,874 INFO  [cellops-optiplex-gx620,16201,1466584559663_ChoreService_3] regionserver.HRegionServer$MovedRegionsCleaner: Chore: MovedRegionsCleaner for region cellops-optiplex-gx620,16201,1466584559663 missed its start time
2016-06-22 10:47:50,009 INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 8730ms
GC pool 'ParNew' had collection(s): count=1 time=8477ms
2016-06-22 10:47:50,052 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2016-06-22 10:47:50,052 INFO  [regionserver/cellops-OptiPlex-GX620/127.0.1.1:16201-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2016-06-22 10:47:50,053 INFO  [regionserver/cellops-OptiPlex-GX620/127.0.1.1:16201-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
2016-06-22 10:47:50,235 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
2016-06-22 10:48:00,588 INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 9578ms
GC pool 'ParNew' had collection(s): count=1 time=9222ms
2016-06-22 10:48:05,308 INFO  [regionserver/cellops-OptiPlex-GX620/127.0.1.1:16201-SendThread(localhost:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x155739733cb0020 has expired, closing socket connection
2016-06-22 10:48:11,711 INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 4183ms
No GCs detected
2016-06-22 10:48:13,792 INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1080ms
No GCs detected
2016-06-22 10:48:15,531 INFO  [cellops-optiplex-gx620,16201,1466584559663_ChoreService_4] regionserver.HRegionServer: cellops-optiplex-gx620,16201,1466584559663-MemstoreFlusherChore requesting flush for region hbase:meta,,1.1588230740 after a delay of 12156
2016-06-22 10:48:15,666 INFO  [cellops-optiplex-gx620,16201,1466584559663_ChoreService_4] regionserver.HRegionServer$PeriodicMemstoreFlusher: Chore: cellops-optiplex-gx620,16201,1466584559663-MemstoreFlusherChore missed its start time
2016-06-22 10:48:16,225 INFO  [cellops-optiplex-gx620,16201,1466584559663_ChoreService_2] regionserver.HRegionServer: cellops-optiplex-gx620,16201,1466584559663-MemstoreFlusherChore requesting flush for region hbase:meta,,1.1588230740 after a delay of 19936
2016-06-22 10:48:17,506 WARN  [PriorityRpcServer.handler=8,queue=0,port=16201] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1466585258530,"responsesize":12,"method":"Scan","processingtimems":28158,"client":"127.0.0.1:43701","queuetimems":1502,"class":"HRegionServer"}
2016-06-22 10:48:17,506 WARN  [PriorityRpcServer.handler=7,queue=1,port=16201] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1466585258531,"responsesize":12,"method":"Scan","processingtimems":28157,"client":"127.0.0.1:43701","queuetimems":49388,"class":"HRegionServer"}
2016-06-22 10:48:17,506 WARN  [PriorityRpcServer.handler=14,queue=0,port=16201] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1466585270681,"responsesize":8,"method":"Scan","processingtimems":14671,"client":"127.0.0.1:44250","queuetimems":546,"class":"HRegionServer"}
2016-06-22 10:48:17,506 WARN  [PriorityRpcServer.handler=6,queue=0,port=16201] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1466585257490,"responsesize":12,"method":"Scan","processingtimems":29198,"client":"127.0.0.1:43701","queuetimems":462,"class":"HRegionServer"}
2016-06-22 10:48:17,507 WARN  [PriorityRpcServer.handler=10,queue=0,port=16201] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1466585270294,"responsesize":8,"method":"Scan","processingtimems":15058,"client":"127.0.0.1:44250","queuetimems":9558,"class":"HRegionServer"}
2016-06-22 10:48:17,507 WARN  [PriorityRpcServer.handler=12,queue=0,port=16201] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1466585270680,"responsesize":8,"method":"Scan","processingtimems":14672,"client":"127.0.0.1:44250","queuetimems":677,"class":"HRegionServer"}
2016-06-22 10:48:17,507 WARN  [PriorityRpcServer.handler=9,queue=1,port=16201] ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1466585270203,"responsesize":12,"method":"Scan","processingtimems":16485,"client":"127.0.0.1:43701","queuetimems":13175,"class":"HRegionServer"}
2016-06-22 10:48:39,588 INFO  [cellops-optiplex-gx620,16201,1466584559663_ChoreService_1] regionserver.HRegionServer$PeriodicMemstoreFlusher: Chore: cellops-optiplex-gx620,16201,1466584559663-MemstoreFlusherChore missed its start time
2016-06-22 10:48:39,590 INFO  [cellops-optiplex-gx620,16201,1466584559663_ChoreService_2] regionserver.HRegionServer: cellops-optiplex-gx620,16201,1466584559663-MemstoreFlusherChore requesting flush for region hbase:meta,,1.1588230740 after a delay of 11498
2016-06-22 10:48:39,841 WARN  [ResponseProcessor for block BP-1792253545-127.0.1.1-1466519031864:blk_1073742365_1544] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1792253545-127.0.1.1-1466519031864:blk_1073742365_1544
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:38155 remote=/127.0.0.1:50010]
2016-06-22 10:48:43,447 INFO  [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for hbase:meta,,1.1588230740, current region memstore size 1.75 KB, and 1/1 column families' memstores are being flushed.
2016-06-22 10:48:40,022 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 49787ms for sessionid 0x155739733cb001f, closing socket connection and attempting reconnect
2016-06-22 10:48:39,841 WARN  [ResponseProcessor for block BP-1792253545-127.0.1.1-1466519031864:blk_1073742557_1736] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1792253545-127.0.1.1-1466519031864:blk_1073742557_1736
java.io.EOFException: Premature EOF: no length prefix available
2016-06-22 10:48:39,589 INFO  [cellops-optiplex-gx620,16201,1466584559663_ChoreService_4] regionserver.HRegionServer$CompactionChecker: Chore: CompactionChecker missed its start time
2016-06-22 10:48:39,841 WARN  [regionserver/cellops-OptiPlex-GX620/127.0.1.1:16201-EventThread] client.ConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, closing it. It will be recreated next time someone needs it
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
2016-06-22 10:48:44,463 INFO  [regionserver/cellops-OptiPlex-GX620/127.0.1.1:16201-EventThread] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x155739733cb0020
2016-06-22 10:48:44,463 INFO  [regionserver/cellops-OptiPlex-GX620/127.0.1.1:16201-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-06-22 10:48:44,966 WARN  [RS_OPEN_META-cellops-OptiPlex-GX620:16201-0.append-pool3-t1] wal.FSHLog: Append sequenceId=56, requesting roll of WAL
java.io.IOException: All datanodes 127.0.0.1:50010 are bad. Aborting...
2016-06-22 10:48:46,716 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2016-06-22 10:48:47,230 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
2016-06-22 10:48:47,232 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x155739733cb001f has expired, closing socket connection
2016-06-22 10:48:47,280 FATAL [main-EventThread] regionserver.HRegionServer: ABORTING region server cellops-optiplex-gx620,16201,1466584559663: regionserver:16201-0x155739733cb001f, quorum=localhost:2181, baseZNode=/hbase-unsecure regionserver:16201-0x155739733cb001f received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
2016-06-22 10:48:47,280 FATAL [main-EventThread] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2016-06-22 10:48:47,531 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher: Cache flush failed for region hbase:meta,,1
org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append sequenceId=56, requesting roll of WAL
  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:2190)
  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:2027)
  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1949)
  at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: All datanodes 127.0.0.1:50010 are bad. Aborting...
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1137)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:933)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487)
2016-06-22 10:48:51,364 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher: Cache flush failed for region hbase:meta,,1
java.io.IOException: Aborting flush because server is aborted...
  at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2101)
  at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2085)
  at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2050)
  at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1942)
  at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1868)
  at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)
  at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
  at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
  at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
  at java.lang.Thread.run(Thread.java:745)
2016-06-22 10:49:22,915 FATAL [regionserver/cellops-OptiPlex-GX620/127.0.1.1:16201.logRoller] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2016-06-22 10:49:22,918 FATAL [RS_OPEN_META-cellops-OptiPlex-GX620:16201-0-MetaLogRoller] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2016-06-22 10:49:24,300 INFO  [regionserver/cellops-OptiPlex-GX620/127.0.1.1:16201.logRoller] regionserver.HRegionServer: Dump of metrics as JSON on abort: {
  "beans" : [ {
    "name" : "java.lang:type=Memory",
    "modelerType" : "sun.management.MemoryImpl",
    "HeapMemoryUsage" : { "committed" : 1597243392, "init" : 16777216, "max" : 1597243392, "used" : 1186946872 },
    "Verbose" : false,
    "ObjectPendingFinalizationCount" : 0,
    "NonHeapMemoryUsage" : { "committed" : 29319168, "init" : 163840, "max" : -1, "used" : 28126760 },
    "ObjectName" : "java.lang:type=Memory"
  } ],
  ............. (some extra information like the above) ............
2016-06-22 10:49:34,322 FATAL [regionserver/cellops-OptiPlex-GX620/127.0.1.1:16201] regionserver.HRegionServer: ABORTING region server cellops-optiplex-gx620,16201,1466584559663: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing cellops-optiplex-gx620,16201,1466584559663 as dead server
2016-06-22 10:49:35,571 WARN  [regionserver/cellops-OptiPlex-GX620/127.0.1.1:16201.logRoller] regionserver.HRegionServer: Unable to report fatal error to master
com.google.protobuf.ServiceException: java.io.IOException: Call to cellops-OptiPlex-GX620/127.0.1.1:16000 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=91, waitTime=10068, operationTimeout=10000 expired
2016-06-22 10:49:35,645 INFO  [main-EventThread] regionserver.HRegionServer: STOPPED: regionserver:16201-0x155739733cb001f, quorum=localhost:2181, baseZNode=/hbase-unsecure regionserver:16201-0x155739733cb001f received expired from ZooKeeper, aborting

There used to be an HRegionServer process, but it appears to have shut down after the errors above:

hduser@cellops-OptiPlex-GX620:/usr/local/Hbase/logs$ jps
2704 DataNode
2481 NameNode
3700 QuorumPeerMain
3400 NodeManager
2969 SecondaryNameNode
3164 ResourceManager
20109 HMaster
24750 Jps
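(If the RegionServer process has died like this, it can usually be brought back on a pseudo-distributed install with something like the following, assuming the HBase install directory shown in the prompt above:)

cd /usr/local/Hbase
bin/hbase-daemon.sh start regionserver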

My hbase-site.xml:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>/usr/local/hadoop/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/Hbase/zookeeper</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase-unsecure</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.rpc.timeout</name>
    <value>120000</value>
  </property>
  <property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>120000</value>
  </property>
  <property>
    <name>hbase.cells.scanned.per.heartbeat.check</name>
    <value>10000</value>
  </property>
</configuration>

Do I need to change a timeout, either in zoo.cfg or in the configuration above, or maybe decrease the MapReduce input split size? The import job was at 3% when the error happened.
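(On the split-size idea: for a MapReduce-based import, the maximum split size can typically be lowered per job with a -D option on the importtsv command, which yields more, smaller map tasks. A sketch only; the 32 MB value is purely illustrative and the remaining arguments are as in the original command:)

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dmapreduce.input.fileinputformat.split.maxsize=33554432 \
  ... (rest of the importtsv arguments as before)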

4 REPLIES

Re: During an import of Hbase using importtsv, hdfs is getting a timeout from zookeeper

It looks like the region server is suffering from long GC pauses. Please increase the region server heap size first and see if that resolves the issue (see the hbase-env.sh sketch after the snippet below). Also check node resources such as CPU and RAM utilization on that node, and try increasing the ZooKeeper session timeout:

<property>
  <name>zookeeper.session.timeout</name>
  <value>1200000</value>
</property>
<property>
  <name>hbase.zookeeper.property.tickTime</name>
  <value>6000</value>
</property>
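To raise the region server heap, a minimal sketch of what could go in conf/hbase-env.sh; the 2g value is illustrative and assumes the host actually has that much RAM to spare:

# conf/hbase-env.sh -- extra JVM options for the RegionServer only
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms2g -Xmx2g"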

Re: During an import of Hbase using importtsv, hdfs is getting a timeout from zookeeper

2 GB on the host is too little when running a DataNode and HBase on the same machine. How many HBase region servers do you have?

I would suggest adding more resources to the cluster, or try increasing the ZooKeeper timeout.
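(One detail worth noting on the timeout: the ZooKeeper server caps the negotiated session timeout at maxSessionTimeout, which defaults to 20 × tickTime, so a large zookeeper.session.timeout in hbase-site.xml only takes effect if zoo.cfg allows it. A sketch, values illustrative:)

# zoo.cfg -- allow clients to request session timeouts up to 20 minutes
tickTime=2000
maxSessionTimeout=1200000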

Re: During an import of Hbase using importtsv, hdfs is getting a timeout from zookeeper

New Contributor

I increased the heap size to 1536 MB, since the machine I am practicing on has only 2 GB of RAM. I can increase the timeout of course, maybe double it? But I am wondering whether there is another way, for example increasing the number of splits so the import uses less RAM.

Re: During an import of Hbase using importtsv, hdfs is getting a timeout from zookeeper

Super Collaborator

@Erdal Kucuk

Increasing the timeout alone won't solve the problem you are facing.

Can you get a machine with more memory?

1536 MB is still quite low.
