Created 11-22-2018 02:39 PM
Hello,
I have deployed a 3-node cluster using HDP 3.0.1. All the services are OK, but I have a problem: Timeline Service V2.0 has stopped, and the alert shows the message "ATS embedded HBase is NOT running on host". I tried to restart the YARN service, but the restart job failed and I received the following message:
2018-11-21 14:14:50,497 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=21, retries=21, started=250953 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server master.hadoop.example.com,17020,1542809428882 is not running yet
at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1487)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2443)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=master.hadoop.example.com,17020,1542193490356, seqNum=-1
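For reference, this is roughly how I am checking the state of the embedded ATS HBase from the master node (the yarn-ats process pattern and the /atsv2 path are only what a default HDP 3 layout would use, so adjust them if your setup differs):

# Is the embedded ATS HBase master JVM actually running on this host?
ps -ef | grep -i yarn-ats | grep -v grep

# Can HDFS serve the directory the embedded HBase keeps its data in?
hdfs dfs -ls /atsv2/hbase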
Any thoughts??
Created 05-03-2019 07:21 PM
I'm experiencing the same issue. It appeared on a fresh cluster after I added a separate config group for the datanodes. A datanode was accidentally configured with the same memory limits as the master, which may have caused an out-of-memory error. Since then I have been unable to get Timeline Service V2.0 (which runs on the master) to start.
I've tried reinstalling the cluster, but the same issue eventually reappears. I don't have a solution yet.
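For what it's worth, here is a rough sketch of how I compare a host's physical memory with the heap the embedded ATS HBase master was actually started with (the yarn-ats grep pattern is only a guess based on the default process naming, so treat it as a sketch):

# Physical memory available on the host
free -m

# The -Xmx heap the running yarn-ats HBase master JVM was launched with, if it is up at all
ps -ef | grep -i yarn-ats | grep -v grep | tr ' ' '\n' | grep '^-Xmx'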
Created 08-25-2019 12:42 AM
I'm facing exactly the same issue. Please let me know how you resolved it.
Created 05-03-2019 09:28 PM
Digging deeper into the logs...
Heap
par new generation total 153024K, used 28803K [0x00000006f3400000, 0x00000006fda00000, 0x000000071cd90000)
eden space 136064K, 13% used [0x00000006f3400000, 0x00000006f4603360, 0x00000006fb8e0000)
from space 16960K, 61% used [0x00000006fb8e0000, 0x00000006fc2fda00, 0x00000006fc970000)
to space 16960K, 0% used [0x00000006fc970000, 0x00000006fc970000, 0x00000006fda00000)
concurrent mark-sweep generation total 339968K, used 8088K [0x000000071cd90000, 0x0000000731990000, 0x00000007c0000000)
Metaspace used 48711K, capacity 49062K, committed 49548K, reserved 1093632K
class space used 5568K, capacity 5682K, committed 5800K, reserved 1048576K
==> /var/log/hadoop-yarn/embedded-yarn-ats-hbase/hbase-yarn-ats-master-ip-11-0-1-167.log <==
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1457615217-11.0.1.167-1556905515643:blk_1073743532_2710 file=/atsv2/hbase/data/MasterProcWALs/pv2-00000000000000000002.log
at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:870)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:853)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:832)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:564)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:754)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:820)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:678)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:253)
at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:275)
at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:280)
at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
at org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessageV3.parseDelimitedWithIOException(GeneratedMessageV3.java:347)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ProcedureProtos$ProcedureWALHeader.parseDelimitedFrom(ProcedureProtos.java:4707)
at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.readHeader(ProcedureWALFormat.java:156)
at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFile.open(ProcedureWALFile.java:84)
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.initOldLog(WALProcedureStore.java:1374)
... 8 more
2019-05-03 19:19:04,948 INFO [Thread-16] regionserver.HRegionServer: ***** STOPPING region server 'ip-11-0-1-167.us-east-2.compute.internal,17000,1556911123475' *****
2019-05-03 19:19:04,948 INFO [Thread-16] regionserver.HRegionServer: STOPPED: Stopped by Thread-16
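To check whether that procedure WAL block is really lost (rather than the datanode just being unreachable), I ran fsck against the path from the stack trace; something along these lines, assuming the default /atsv2 root:

# Report block health for the embedded HBase procedure WALs named in the exception
hdfs fsck /atsv2/hbase/data/MasterProcWALs -files -blocks -locations

If fsck reports the block as missing or corrupt, moving the unreadable WAL file aside (after backing it up) before restarting the service is one recovery path I have seen suggested, but I have not confirmed it here.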
Created 09-20-2019 04:36 AM
I have my RegionServer (RS) up and running and am still facing the same issue.
Created 09-20-2019 05:40 AM
It is related to insufficient memory on the node; the GC log keeps showing allocation failures:
2019-08-14T10:25:35.393+0000: 6044.798: [GC (Allocation Failure) 2019-08-14T10:25:35.394+0000: 6044.798: [ParNew: 141353K->5541K(153344K), 0.0276284 secs] 197333K->61739K(3337600K), 0.0278554 secs] [Times: user=0.05 sys=0.00, real=0.03 secs]
2019-08-14T10:25:40.156+0000: 6049.560: [GC (Allocation Failure) 2019-08-14T10:25:40.157+0000: 6049.561: [ParNew: 141861K->4097K(153344K), 0.0249319 secs] 198059K->60298K(3337600K), 0.0264387 secs] [Times: user=0.05 sys=0.00, real=0.03 secs]
2019-08-14T10:27:55.744+0000: 6185.148: [GC (Allocation Failure) 2019-08-14T10:27:55.744+0000: 6185.149: [ParNew: 140417K->5321K(153344K), 0.0315498 secs] 196618K->61535K(3337600K), 0.0317389 secs] [Times: user=0.06 sys=0.00, real=0.03 secs]
2019-08-14T10:29:09.524+0000: 6258.928: [GC (Allocation Failure) 2019-08-14T10:29:09.524+0000: 6258.928: [ParNew: 141641K->6861K(153344K), 0.0333230 secs] 197855K->63260K(3337600K), 0.0334884 secs] [Times: user=0.07 sys=0.00, real=0.04 secs]
I resolved it by moving the component to another node that has more free memory.
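Before moving it, a quick way to see which node actually has headroom (the hostnames below are placeholders for your own nodes):

# Compare free memory across candidate nodes before relocating the component
for host in node1.example.com node2.example.com node3.example.com; do
  echo "== $host =="
  ssh "$host" free -m
done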