Created 12-12-2018 12:30 AM
Hello,
The YARN Timeline Service Reader no longer starts because of the following error:
2018-12-08 12:59:18,852 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=6, retries=6, started=4859 ms ago, cancelled=false, msg=Call to examples.foodscience-01.de/163.49.39.115:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: examples.foodscience-01.de/163.49.39.115:17020, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=examples.foodscience-01.de,17020,1543619998977, seqNum=-1
2018-12-08 12:59:22,895 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=7, retries=7, started=8902 ms ago, cancelled=false, msg=Call to examples.foodscience-01.de/163.49.39.115:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: examples.foodscience-01.de/163.49.39.115:17020, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=examples.foodscience-01.de,17020,1543619998977, seqNum=-1
2018-12-08 12:59:32,955 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=8, retries=8, started=18962 ms ago, cancelled=false, msg=Call to examples.foodscience-01.de/163.49.39.115:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: examples.foodscience-01.de/163.49.39.115:17020, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=examples.foodscience-01.de,17020,1543619998977, seqNum=-1
2018-12-08 12:59:42,965 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=9, retries=9, started=28972 ms ago, cancelled=false, msg=Call to examples.foodscience-01.de/163.49.39.115:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: examples.foodscience-01.de/163.49.39.115:17020, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=examples.foodscience-01.de,17020,1543619998977, seqNum=-1
2018-12-08 12:59:53,064 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=10, retries=10, started=39071 ms ago, cancelled=false, msg=Call to examples.foodscience-01.de/163.49.39.115:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: examples.foodscience-01.de/163.49.39.115:17020, details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=examples.foodscience-01.de,17020,1543619998977, seqNum=-1
2018-12-08 13:00:03,101 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=11, retries=11, started=49108 ms ago, cancelled=false, msg=Call to examples.foodscience-01.de/163.49.39.115:17020 failed on connection exception: org.apache.hbase.th
It seems that HBase has a problem (although I am not using this service on Ambari).
Then I checked the following log file: hadoop-yarn-timelinereader-foodscience-01.log
Caused by: java.net.ConnectException: Call to examples.foodscience-01.de/163.49.39.115:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: examples.foodscience-01.de/163.49.39.115:17020
    at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:165)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
    at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
    at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
    at org.apache.hadoop.hbase.ipc.BufferCallBeforeInitHandler.userEventTriggered(BufferCallBeforeInitHandler.java:92)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
    at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1377)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
    at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:929)
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection.failInit(NettyRpcConnection.java:179)
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection.access$500(NettyRpcConnection.java:71)
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:269)
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:263)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
    at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327)
    at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343)
    at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
    at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    ... 1 more
Caused by: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: examples.foodscience-01.de/163.49.39.115:17020
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hbase.thirdparty.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
    at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
    ... 7 more
Caused by: java.net.ConnectException: Connection refused
    ... 11 more
2018-12-06 13:03:33,051 INFO zookeeper.ReadOnlyZKClient (ReadOnlyZKClient.java:run(315)) - 0x4d465b11 no activities for 60000 ms, close active connection. Will reconnect next time when there are new requests.
2018-12-06 13:03:57,614 INFO storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:run(170)) - Running HBase liveness monitor
2018-12-06 13:04:24,100 ERROR reader.TimelineReaderServer (LogAdapter.java:error(75)) - RECEIVED SIGNAL 15: SIGTERM
2018-12-06 13:04:24,116 INFO handler.ContextHandler (ContextHandler.java:doStop(910)) - Stopped o.e.j.w.WebAppContext@12299890{/,null,UNAVAILABLE}{/timeline}
2018-12-06 13:04:24,125 INFO server.AbstractConnector (AbstractConnector.java:doStop(318)) - Stopped ServerConnector@328af33d{HTTP/1.1,[http/1.1]}{0.0.0.0:8198}
2018-12-06 13:04:24,128 INFO handler.ContextHandler (ContextHandler.java:doStop(910)) - Stopped o.e.j.s.ServletContextHandler@7d3e8655{/static,jar:file:/usr/hdp/3.0.0.0-1634/hadoop-yarn/hadoop-yarn-common-3.1.0.3.0.0.0-1634.jar!/webapps/static,UNAVAILABLE}
2018-12-06 13:04:24,128 INFO handler.ContextHandler (ContextHandler.java:doStop(910)) - Stopped o.e.j.s.ServletContextHandler@7dfd3c81{/logs,file:///var/log/hadoop-yarn/yarn/,UNAVAILABLE}
2018-12-06 13:04:24,142 INFO storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:serviceStop(108)) - closing the hbase Connection
2018-12-06 13:04:24,143 INFO zookeeper.ReadOnlyZKClient (ReadOnlyZKClient.java:close(342)) - Close zookeeper connection 0x4d465b11 to examples.foodscience-01.de:2181,examples.foodscience-02.de:2181,examples.foodscience-03.de:2181
2018-12-06 13:04:24,143 WARN storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:run(183)) - Got failure attempting to read from timeline storage, assuming HBase down
java.io.UncheckedIOException: java.io.InterruptedIOException
    at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:55)
    at org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283)
    at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl$HBaseMonitor.run(HBaseTimelineReaderImpl.java:174)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.InterruptedIOException
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:246)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269)
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)
    at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:834)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:325)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269)
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)
    at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)
    at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:53)
    ... 9 more
2018-12-06 13:04:24,153 INFO zookeeper.ReadOnlyZKClient (ReadOnlyZKClient.java:close(342)) - Close zookeeper connection 0x5b7a5baa to examples.foodscience-01.de:2181,examples.foodscience-02.de:2181,examples.foodscience-03.de:2181
2018-12-06 13:04:24,155 INFO reader.TimelineReaderServer (LogAdapter.java:info(51)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down TimelineReaderServer at examples.foodscience-01.de/163.49.39.115
I don't know why this error appears when starting the timeline service. How can this be fixed?
Created 12-12-2018 12:14 PM
Can you share your current architecture: single-node or multi-node? Are your HBase master and region servers up and running? If it is multi-node, have you installed the YARN client on the region server? If not, please install the YARN client and retry.
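One quick way to see whether anything is listening on the HBase port from the log is a plain TCP connect from the node that runs the Timeline Reader. A minimal sketch in Python, assuming the host and port shown in the error (examples.foodscience-01.de, 17020); `is_port_open` is just an illustrative helper name, not part of any Hadoop or HBase tooling:

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host name and attempts the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and DNS resolution failures.
        return False
```

Calling `is_port_open("examples.foodscience-01.de", 17020)` would be expected to return False as long as the connection is refused as in the log. Note that this only tells you whether a process accepts connections on that port, not whether HBase itself is healthy.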
Created 12-12-2018 01:42 PM
It's a multi-node cluster (4 data nodes). I didn't install HBase through Ambari.
I installed the YARN service with Ambari, and the Timeline Service worked until a few days ago, when this error appeared.
How can I check whether the YARN client is installed on the region server?
Created 12-12-2018 06:26 PM
You will need to have HBase installed, because the following error snippet shows the Timeline Reader trying to look up its entity table in HBase:
"details=row 'prod.timelineservice.entity' on table 'hbase:meta' at region=hbase:meta,"
YARN Timeline Service v.2 uses a set of collectors (writers) to write data to the backend storage. It uses Apache HBase as the primary backing storage, because Apache HBase scales well to a large size while maintaining good response times for reads and writes. The collectors are distributed and co-located with the application masters to which they are dedicated.
On the nodes running the region servers, you should be able to see the software installed on that node.
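To confirm which host and port the reader is actually trying to reach, the failed endpoints can be pulled out of the Timeline Reader log. A rough sketch in Python, assuming the message format seen in the log above ("Call to host/ip:port failed on connection exception"); the regex and the `failed_endpoints` helper are illustrative and not part of HBase or YARN:

```python
import re
from collections import Counter

# Matches e.g. "Call to some.host/10.0.0.1:17020 failed on connection exception"
CALL_FAILED = re.compile(
    r"Call to (?P<host>[^/\s]+)/(?P<ip>[\d.]+):(?P<port>\d+) "
    r"failed on connection exception"
)


def failed_endpoints(log_text: str) -> Counter:
    """Count failed RPC targets in the log as (host, ip, port) tuples."""
    return Counter(
        (m.group("host"), m.group("ip"), int(m.group("port")))
        for m in CALL_FAILED.finditer(log_text)
    )
```

Feeding it the contents of the reader log should give one entry per distinct endpoint, which you can then compare against the nodes where a region server is supposed to be running.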
HTH
Created 12-13-2018 10:26 PM
Maybe we can discuss this in more detail in a Skype session?
Do you have contact details?
We are facing the same issue. We don't want to install HBase on our VMs, as we don't have the capacity to run HDFS and HBase on the same cluster. Is there any workaround for this?
Thanks in advance for any suggestions.
Ram