Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

YARN Timeline Service can not start anymore due to HBase problem?

avatar

Hello,

YARN Timeline Service Reader is not starting anymore due to following error:

2018-12-08 12:59:18,852 INFO  [main] client.RpcRetryingCallerImpl: Call exception, tries=6, retries=6, started=4859 ms ago, cancelled=false, msg=Call to examples.foodscience-01.de/163.49.39.115:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: examples.foodscience-01.de/163.49.39.115:17020, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=examples.foodscience-01.de,17020,1543619998977, seqNum=-1
2018-12-08 12:59:22,895 INFO  [main] client.RpcRetryingCallerImpl: Call exception, tries=7, retries=7, started=8902 ms ago, cancelled=false, msg=Call to examples.foodscience-01.de/163.49.39.115:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: examples.foodscience-01.de/163.49.39.115:17020, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=examples.foodscience-01.de,17020,1543619998977, seqNum=-1
2018-12-08 12:59:32,955 INFO  [main] client.RpcRetryingCallerImpl: Call exception, tries=8, retries=8, started=18962 ms ago, cancelled=false, msg=Call to examples.foodscience-01.de/163.49.39.115:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: examples.foodscience-01.de/163.49.39.115:17020, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=examples.foodscience-01.de,17020,1543619998977, seqNum=-1
2018-12-08 12:59:42,965 INFO  [main] client.RpcRetryingCallerImpl: Call exception, tries=9, retries=9, started=28972 ms ago, cancelled=false, msg=Call to examples.foodscience-01.de/163.49.39.115:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: examples.foodscience-01.de/163.49.39.115:17020, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=examples.foodscience-01.de,17020,1543619998977, seqNum=-1
2018-12-08 12:59:53,064 INFO  [main] client.RpcRetryingCallerImpl: Call exception, tries=10, retries=10, started=39071 ms ago, cancelled=false, msg=Call to examples.foodscience-01.de/163.49.39.115:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: examples.foodscience-01.de/163.49.39.115:17020, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=examples.foodscience-01.de,17020,1543619998977, seqNum=-1
2018-12-08 13:00:03,101 INFO  [main] client.RpcRetryingCallerImpl: Call exception, tries=11, retries=11, started=49108 ms ago, cancelled=false, msg=Call to examples.foodscience-01.de/163.49.39.115:17020 failed on connection exception: org.apache.hbase.th

It seems that HBase has a problem (although I am not using this service on Ambari).

Then I checked following log file hadoop-yarn-timelinereader-foodscience-01.log

Caused by: java.net.ConnectException: Call to examples.foodscience-01.de/163.49.39.115:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: examples.foodscience-01.de/163.49.39.115:17020
    at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:165)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
    at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
    at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
    at org.apache.hadoop.hbase.ipc.BufferCallBeforeInitHandler.userEventTriggered(BufferCallBeforeInitHandler.java:92)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
    at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1377)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
    at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:929)
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection.failInit(NettyRpcConnection.java:179)
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection.access$500(NettyRpcConnection.java:71)
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:269)
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:263)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
    at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327)
    at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343)
    at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
    at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    ... 1 more
Caused by: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: examples.foodscience-01.de/163.49.39.115:17020
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hbase.thirdparty.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
    at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
    ... 7 more
Caused by: java.net.ConnectException: Connection refused
    ... 11 more
2018-12-06 13:03:33,051 INFO  zookeeper.ReadOnlyZKClient (ReadOnlyZKClient.java:run(315)) - 0x4d465b11 no activities for 60000 ms, close active connection. Will reconnect next time when there are new requests.
2018-12-06 13:03:57,614 INFO  storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:run(170)) - Running HBase liveness monitor
2018-12-06 13:04:24,100 ERROR reader.TimelineReaderServer (LogAdapter.java:error(75)) - RECEIVED SIGNAL 15: SIGTERM
2018-12-06 13:04:24,116 INFO  handler.ContextHandler (ContextHandler.java:doStop(910)) - Stopped o.e.j.w.WebAppContext@12299890{/,null,UNAVAILABLE}{/timeline}
2018-12-06 13:04:24,125 INFO  server.AbstractConnector (AbstractConnector.java:doStop(318)) - Stopped ServerConnector@328af33d{HTTP/1.1,[http/1.1]}{0.0.0.0:8198}
2018-12-06 13:04:24,128 INFO  handler.ContextHandler (ContextHandler.java:doStop(910)) - Stopped o.e.j.s.ServletContextHandler@7d3e8655{/static,jar:file:/usr/hdp/3.0.0.0-1634/hadoop-yarn/hadoop-yarn-common-3.1.0.3.0.0.0-1634.jar!/webapps/static,UNAVAILABLE}
2018-12-06 13:04:24,128 INFO  handler.ContextHandler (ContextHandler.java:doStop(910)) - Stopped o.e.j.s.ServletContextHandler@7dfd3c81{/logs,file:///var/log/hadoop-yarn/yarn/,UNAVAILABLE}
2018-12-06 13:04:24,142 INFO  storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:serviceStop(108)) - closing the hbase Connection
2018-12-06 13:04:24,143 INFO  zookeeper.ReadOnlyZKClient (ReadOnlyZKClient.java:close(342)) - Close zookeeper connection 0x4d465b11 to examples.foodscience-01.de:2181,examples.foodscience-02.de:2181,examples.foodscience-03.de:2181
2018-12-06 13:04:24,143 WARN  storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:run(183)) - Got failure attempting to read from timeline storage, assuming HBase down
java.io.UncheckedIOException: java.io.InterruptedIOException
    at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:55)
    at org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283)
    at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl$HBaseMonitor.run(HBaseTimelineReaderImpl.java:174)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.InterruptedIOException
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:246)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269)
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)
    at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:834)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:325)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269)
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)
    at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)
    at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:53)
    ... 9 more
2018-12-06 13:04:24,153 INFO  zookeeper.ReadOnlyZKClient (ReadOnlyZKClient.java:close(342)) - Close zookeeper connection 0x5b7a5baa to examples.foodscience-01.de:2181,examples.foodscience-02.de:2181,examples.foodscience-03.de:2181
2018-12-06 13:04:24,155 INFO  reader.TimelineReaderServer (LogAdapter.java:info(51)) - SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down TimelineReaderServer at examples.foodscience-01.de/163.49.39.115

I dont know why this error appears when starting the timeline service. How can this be fixed?

5 REPLIES 5

avatar
Master Mentor

@Andreas Kühnert

Can you share your current architecture single /multi-node? Is your hbase master and regions server up and running? If multi-node havd you installed YARN client of the region server? If not please install the YARN client and retry.

avatar

It s a multi node cluster (4 data nodes). I didn't install HBase on Ambari.

I installed YARN service with Ambari and the Timeline Service worked until few days ago where this error emerged.

How can I check if YARN client of region server is installed?

avatar
Master Mentor

@Andreas Kühnert

You will need to have HBase installed because the error snippet

"details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,"

YARN Timeline Service v.2 uses a set of collectors (writers) to write data to the backend storage it uses Apache HBase as the primary backing storage, as Apache HBase scales well to a large size while maintaining good response times for reads and writes. The collectors are distributed and co-located with the application masters to which they are dedicated.

On the nodes running the region servers, you should be able to see the software installed on that node.

HTH


avatar

Maybe we can discuss this in more detail in a skype session?

Do you have contact data?

avatar
New Contributor

Hi Andreas Kühnert

 

We are facing same issue, we don't want to install hbase as our VM's as we don't have capacity to handle HDFS and Hbase on same cluster. Do we have any work around for this 

 

thanks in advance for the suggestion 

 

Ram