
ATS embedded HBase is NOT running on hosts

New Contributor

Hello,

I have deployed a 3-node cluster using HDP 3.0.1. All the services are OK, but I have a problem: Timeline Service V2.0 is stopped, and the alert shows the message "ATS embedded HBase is NOT running on host". I tried to restart the YARN service, but the job failed and I received this message:

, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=master.hadoop.example.com,17020,1542193490356, seqNum=-1
2018-11-21 14:14:50,497 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=21, retries=21, started=250953 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server master.hadoop.example.com,17020,1542809428882 is not running yet
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1487)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2443)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=master.hadoop.example.com,17020,1542193490356, seqNum=-1
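
In case it helps with diagnosis, this is roughly how one can check whether the embedded ATS HBase master and regionserver JVMs are actually up on the host; the log directory is the one HDP uses for embedded mode, and the grep pattern is only illustrative:

# List the embedded ATS HBase processes, if any are running
ps -ef | grep -i ats-hbase | grep -v grep

# Tail the embedded HBase master log for startup errors
tail -n 100 /var/log/hadoop-yarn/embedded-yarn-ats-hbase/hbase-yarn-ats-master-*.log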

Any thoughts??

5 REPLIES


I'm experiencing the same issue. It appears on a fresh cluster after I add a separate config group for the datanodes. The datanodes accidentally got configured with the same memory limits as the master, which may have caused an out-of-memory error. Since then, I have been unable to get Timeline Service V2.0 (which runs on the master) to start.

I've tried reinstalling the cluster, but the same issue appears eventually. I don't have a solution yet.
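
If an out-of-memory error is the suspicion, it may be worth grepping the embedded ATS HBase logs for it before reinstalling; a rough check, where the log directory is the embedded-mode default also shown later in this thread and the file names may vary:

# Look for heap-related failures in the embedded ATS HBase logs
grep -iE "OutOfMemoryError|java heap space" /var/log/hadoop-yarn/embedded-yarn-ats-hbase/*.log*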

New Contributor

I'm facing exactly the same issue. Please let me know how you resolved it.


Digging deeper into the logs...


Heap
 par new generation total 153024K, used 28803K [0x00000006f3400000, 0x00000006fda00000, 0x000000071cd90000)
  eden space 136064K, 13% used [0x00000006f3400000, 0x00000006f4603360, 0x00000006fb8e0000)
  from space 16960K, 61% used [0x00000006fb8e0000, 0x00000006fc2fda00, 0x00000006fc970000)
  to space 16960K, 0% used [0x00000006fc970000, 0x00000006fc970000, 0x00000006fda00000)
 concurrent mark-sweep generation total 339968K, used 8088K [0x000000071cd90000, 0x0000000731990000, 0x00000007c0000000)
 Metaspace used 48711K, capacity 49062K, committed 49548K, reserved 1093632K
  class space used 5568K, capacity 5682K, committed 5800K, reserved 1048576K

==> /var/log/hadoop-yarn/embedded-yarn-ats-hbase/hbase-yarn-ats-master-ip-11-0-1-167.log <==

    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1457615217-11.0.1.167-1556905515643:blk_1073743532_2710 file=/atsv2/hbase/data/MasterProcWALs/pv2-00000000000000000002.log
    at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:870)
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:853)
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:832)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:564)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:754)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:820)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:678)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:253)
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:275)
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:280)
    at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
    at org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessageV3.parseDelimitedWithIOException(GeneratedMessageV3.java:347)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ProcedureProtos$ProcedureWALHeader.parseDelimitedFrom(ProcedureProtos.java:4707)
    at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.readHeader(ProcedureWALFormat.java:156)
    at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFile.open(ProcedureWALFile.java:84)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.initOldLog(WALProcedureStore.java:1374)
    ... 8 more
2019-05-03 19:19:04,948 INFO [Thread-16] regionserver.HRegionServer: ***** STOPPING region server 'ip-11-0-1-167.us-east-2.compute.internal,17000,1556911123475' *****
2019-05-03 19:19:04,948 INFO [Thread-16] regionserver.HRegionServer: STOPPED: Stopped by Thread-16
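
The BlockMissingException above means the embedded HBase master cannot read one of its procedure WAL files from HDFS. One way to confirm that, and a workaround that is often used when the block really is gone, is sketched below; the paths are taken from the log above, it should be run as a user with rights on /atsv2 (typically yarn-ats or the HDFS superuser), and note that moving the WALs aside discards any pending master procedures:

# Check whether the procedure WAL files have missing or corrupt blocks
hdfs fsck /atsv2/hbase/data/MasterProcWALs -files -blocks -locations

# If the block is unrecoverable, move the WAL directory aside so the
# embedded HBase master can start with a fresh procedure store, then
# restart Timeline Service V2.0 from Ambari
hdfs dfs -mv /atsv2/hbase/data/MasterProcWALs /atsv2/hbase/data/MasterProcWALs.bak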

Contributor

I have my RegionServer (RS) up and running and am still facing the same issue.

Contributor

It is related to insufficient memory:

2019-08-14T10:25:35.393+0000: 6044.798: [GC (Allocation Failure) 2019-08-14T10:25:35.394+0000: 6044.798: [ParNew: 141353K->5541K(153344K), 0.0276284 secs] 197333K->61739K(3337600K), 0.0278554 secs] [Times: user=0.05 sys=0.00, real=0.03 secs] 
2019-08-14T10:25:40.156+0000: 6049.560: [GC (Allocation Failure) 2019-08-14T10:25:40.157+0000: 6049.561: [ParNew: 141861K->4097K(153344K), 0.0249319 secs] 198059K->60298K(3337600K), 0.0264387 secs] [Times: user=0.05 sys=0.00, real=0.03 secs] 
2019-08-14T10:27:55.744+0000: 6185.148: [GC (Allocation Failure) 2019-08-14T10:27:55.744+0000: 6185.149: [ParNew: 140417K->5321K(153344K), 0.0315498 secs] 196618K->61535K(3337600K), 0.0317389 secs] [Times: user=0.06 sys=0.00, real=0.03 secs] 
2019-08-14T10:29:09.524+0000: 6258.928: [GC (Allocation Failure) 2019-08-14T10:29:09.524+0000: 6258.928: [ParNew: 141641K->6861K(153344K), 0.0333230 secs] 197855K->63260K(3337600K), 0.0334884 secs] [Times: user=0.07 sys=0.00, real=0.04 secs]

I resolved it by moving the component to another node that has more free memory.
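
Before moving it, it can also help to confirm how much memory is actually free on the candidate node and what heap the embedded ATS HBase JVMs were started with; a quick check, where the grep pattern is only illustrative:

# Free memory on the current / candidate host
free -m

# Heap size the embedded ATS HBase JVMs were started with
ps -ef | grep -i ats-hbase | grep -o -- '-Xmx[^ ]*'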