Member since
01-14-2016
49
Posts
4
Kudos Received
5
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
5406 | 10-21-2019 09:30 AM | |
2158 | 10-11-2018 09:23 PM | |
2861 | 09-13-2018 06:39 PM | |
3446 | 04-27-2018 02:51 PM | |
1272 | 06-01-2017 12:35 AM |
10-28-2019
02:39 PM
Hello Shelton. I've followed entire given steps and service still is not coming up. Below attached outputs. From: /var/lib/ambari-agent/data/errors-31848.txt resource_management.core.exceptions.ExecuteTimeoutException: Execution of 'ambari-sudo.sh su yarnats -l -s /bin/bash -c 'export PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin:/var/lib/ambari-agent:/var/lib/ambari-agent'"'"' ; sleep 10;export HBASE_CLASSPATH_PREFIX=/usr/hdp/3.1.0.0-78/hadoop-yarn/timelineservice/*; /usr/hdp/3.1.0.0-78/hbase/bin/hbase --config /usr/hdp/3.1.0.0-78/hadoop/conf/embedded-yarn-ats-hbase org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator -Dhbase.client.retries.number=35 -create -s'' was killed due timeout after 300 seconds From: /var/lib/ambari-agent/data/output-31848.txt 2019-10-22 16:41:17,992 WARN [main-EventThread] coordination.ZKSplitLogManagerCoordination$CreateRescanAsyncCallback: rc=NONODE for /atsv2-hbase-unsecure/splitWAL/RESCAN remaining retries=9223372036854744889
2019-10-22 16:41:17,992 WARN [main-EventThread] coordination.ZKSplitLogManagerCoordination$CreateRescanAsyncCallback: rc=NONODE for /atsv2-hbase-unsecure/splitWAL/RESCAN remaining retries=9223372036854735924
2019-10-22 16:41:17,992 WARN [main-EventThread] coordination.ZKSplitLogManagerCoordination$CreateRescanAsyncCallback: rc=NONODE for /atsv2-hbase-unsecure/splitWAL/RESCAN remaining retries=9223372036854772106
2019-10-22 16:41:17,992 WARN [main-EventThread] coordination.ZKSplitLogManagerCoordination$CreateRescanAsyncCallback: rc=NONODE for /atsv2-hbase-unsecure/splitWAL/RESCAN remaining retries=9223372036854768736
2019-10-22 16:41:17,992 WARN [main-EventThread] coordination.ZKSplitLogManagerCoordination$CreateRescanAsyncCallback: rc=NONODE for /atsv2-hbase-unsecure/splitWAL/RESCAN remaining retries=9223372036854749025
==> /usr/logs/hadoop-yarn/embedded-yarn-ats-hbase/gc.log-201910110639 <==
Java HotSpot(TM) 64-Bit Server VM (25.60-b23) for linux-amd64 JRE (1.8.0_60-b27), built on Aug 4 2015 12:19:40 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 131732324k(11848752k free), swap 8388604k(8279292k free)
CommandLine flags: -XX:ErrorFile=/usr/logs/hadoop-yarn/embedded-yarn-ats-hbase/hs_err_pid%p.log -XX:InitialHeapSize=2107717184 -XX:MaxHeapSize=3435134976 -XX:MaxNewSize=1145044992 -XX:MaxTenuringThreshold=6 -XX:OldPLABSize=16 -XX:OnOutOfMemoryError=kill -9 %p -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Heap
par new generation total 618048K, used 197776K [0x00000006f3400000, 0x000000071d2a0000, 0x0000000737800000)
eden space 549376K, 36% used [0x00000006f3400000, 0x00000006ff5243a8, 0x0000000714c80000)
from space 68672K, 0% used [0x0000000714c80000, 0x0000000714c80000, 0x0000000718f90000)
to space 68672K, 0% used [0x0000000718f90000, 0x0000000718f90000, 0x000000071d2a0000)
concurrent mark-sweep generation total 1373568K, used 0K [0x0000000737800000, 0x000000078b560000, 0x00000007c0000000)
Metaspace used 11629K, capacity 11810K, committed 11904K, reserved 1060864K
class space used 1251K, capacity 1316K, committed 1408K, reserved 1048576K
==> /usr/logs/hadoop-yarn/embedded-yarn-ats-hbase/gc.log-201910100851 <==
Java HotSpot(TM) 64-Bit Server VM (25.60-b23) for linux-amd64 JRE (1.8.0_60-b27), built on Aug 4 2015 12:19:40 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 131732324k(1591264k free), swap 8388604k(8280060k free)
CommandLine flags: -XX:CMSInitiatingOccupancyFraction=70 -XX:ErrorFile=/usr/logs/hadoop-yarn/embedded-yarn-ats-hbase/hs_err_pid%p.log -XX:InitialHeapSize=3435134976 -XX:MaxHeapSize=3435134976 -XX:MaxNewSize=1145044992 -XX:MaxTenuringThreshold=6 -XX:NewSize=1145044992 -XX:OldPLABSize=16 -XX:OldSize=2290089984 -XX:OnOutOfMemoryError=kill -9 %p -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:ReservedCodeCacheSize=268435456 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
2019-10-10T08:51:22.325-0700: 2.213: [GC (CMS Initial Mark) [1 CMS-initial-mark: 0K(2236416K)] 715687K(3242816K), 0.1832180 secs] [Times: user=0.48 sys=0.07, real=0.19 secs]
2019-10-10T08:51:22.508-0700: 2.396: [CMS-concurrent-mark-start]
2019-10-10T08:51:22.509-0700: 2.397: [CMS-concurrent-mark: 0.001/0.001 secs] [Times: user=0.01 sys=0.01, real=0.00 secs]
2019-10-10T08:51:22.509-0700: 2.397: [CMS-concurrent-preclean-start]
2019-10-10T08:51:22.513-0700: 2.400: [CMS-concurrent-preclean: 0.003/0.003 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2019-10-10T08:51:22.513-0700: 2.400: [CMS-concurrent-abortable-preclean-start]
2019-10-10T08:51:22.827-0700: 2.715: [GC (Allocation Failure) 2019-10-10T08:51:22.827-0700: 2.715: [ParNew: 894592K->37233K(1006400K), 0.0334809 secs] 894592K->37233K(3242816K), 0.0335760 secs] [Times: user=0.17 sys=0.03, real=0.03 secs]
Heap
par new generation total 1006400K, used 577717K [0x00000006f3400000, 0x0000000737800000, 0x0000000737800000)
eden space 894592K, 60% used [0x00000006f3400000, 0x00000007143d0fc8, 0x0000000729da0000)
from space 111808K, 33% used [0x0000000730ad0000, 0x0000000732f2c758, 0x0000000737800000)
to space 111808K, 0% used [0x0000000729da0000, 0x0000000729da0000, 0x0000000730ad0000)
concurrent mark-sweep generation total 2236416K, used 0K [0x0000000737800000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 52260K, capacity 52701K, committed 53168K, reserved 1095680K
class space used 5905K, capacity 6041K, committed 6096K, reserved 1048576K
2019-10-10T08:51:24.359-0700: 4.247: [CMS-concurrent-abortable-preclean: 1.100/1.847 secs] [Times: user=4.75 sys=0.27, real=1.85 secs]
Command failed after 1 tries From: yarn-timelineserver-gc.log Total 180427 20331968
, 0.0109813 secs]
25.966: [GC (Allocation Failure) [PSYoungGen: 766976K->26024K(1047040K)] 786831K->45895K(2136576K), 0.0205774 secs] [Times: user=0.18 sys=0.02, real=0.02 secs]
27.877: [GC (Allocation Failure) [PSYoungGen: 987560K->37814K(1176576K)] 1007431K->57702K(2266112K), 0.0452135 secs] [Times: user=0.31 sys=0.03, real=0.05 secs]
29.872: [GC (Allocation Failure) [PSYoungGen: 1128886K->40013K(1176576K)] 1148774K->59908K(2266112K), 0.0376384 secs] [Times: user=0.25 sys=0.02, real=0.04 secs]
31.621: [GC (Allocation Failure) [PSYoungGen: 1131085K->41607K(1708032K)] 1150980K->61510K(2797568K), 0.0426743 secs] [Times: user=0.19 sys=0.02, real=0.04 secs]
34.381: [GC (Allocation Failure) [PSYoungGen: 1702023K->52721K(1713152K)] 1721926K->75113K(2802688K), 0.0671733 secs] [Times: user=0.32 sys=0.06, real=0.07 secs]
544.663: [GC (Allocation Failure) [PSYoungGen: 1713137K->24633K(2550784K)] 1735529K->55561K(3640320K), 0.0502315 secs] [Times: user=0.37 sys=0.08, real=0.05 secs]
1744.725: [GC (Allocation Failure) [PSYoungGen: 2550329K->6583K(2657792K)] 2581257K->37803K(3747328K), 0.0109360 secs] [Times: user=0.06 sys=0.05, real=0.01 secs]
3364.582: [GC (Allocation Failure) [PSYoungGen: 2603959K->7333K(2513408K)] 2635179K->38561K(3602944K), 0.0106033 secs] [Times: user=0.05 sys=0.05, real=0.01 secs]
4564.508: [GC (Allocation Failure) [PSYoungGen: 2513061K->7397K(2425856K)] 2544289K->38633K(3515392K), 0.0098975 secs] [Times: user=0.05 sys=0.05, real=0.01 secs]
5944.468: [GC (Allocation Failure) [PSYoungGen: 2425573K->7432K(2342400K)] 2456809K->38676K(3431936K), 0.0100541 secs] [Times: user=0.05 sys=0.04, real=0.01 secs]
6904.427: [GC (Allocation Failure) [PSYoungGen: 2342152K->7814K(2263040K)] 2373396K->39065K(3352576K), 0.0100246 secs] [Times: user=0.06 sys=0.05, real=0.01 secs]
7624.583: [GC (Allocation Failure) [PSYoungGen: 2262662K->7335K(2186240K)] 2293913K->38595K(3275776K), 0.0126832 secs] [Times: user=0.07 sys=0.03, real=0.01 secs]
8524.740: [GC (Allocation Failure) [PSYoungGen: 2185895K->7238K(2113536K)] 2217155K->38505K(3203072K), 0.0110849 secs] [Times: user=0.06 sys=0.05, real=0.01 secs]
9604.461: [GC (Allocation Failure) [PSYoungGen: 2113094K->7415K(2043904K)] 2144361K->38690K(3133440K), 0.0187939 secs] [Times: user=0.11 sys=0.07, real=0.02 secs]
10864.545: [GC (Allocation Failure) [PSYoungGen: 2043639K->7287K(1977344K)] 2074914K->38570K(3066880K), 0.0131232 secs] [Times: user=0.12 sys=0.04, real=0.01 secs] From: yarn-timelineserver-gc.log 2019-10-28 14:35:06,753 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1572286585508_0006_331 (TEZ_DAG_ID): 6
2019-10-28 14:35:06,753 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1572286585508_0006_332 (TEZ_DAG_ID): 6
2019-10-28 14:35:06,754 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1572286585508_0006_332 (TEZ_DAG_ID): 6
2019-10-28 14:35:06,754 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1572286585508_0006_333 (TEZ_DAG_ID): 6
2019-10-28 14:35:06,754 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1572286585508_0006_333 (TEZ_DAG_ID): 6
2019-10-28 14:35:06,755 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1572286585508_0006_334 (TEZ_DAG_ID): 6
2019-10-28 14:35:06,755 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1572286585508_0006_334 (TEZ_DAG_ID): 6
2019-10-28 14:35:06,755 INFO timeline.LogInfo (LogInfo.java:parseForStore(116)) - Parsed 1338 entities from hdfs://hdpnndev/ats/active/application_1572286585508_0006/appattempt_1572286585508_0006_000001/summarylog-appattempt_1572286585508_0006_000001 in 314 msec From: hadoop-yarn-resourcemanager-server.log TARGET=ClientRMService RESULT=SUCCESS
2019-10-28 14:36:08,043 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1572286585508_0005_000001 container=null queue=batchq1 clusterResource=<memory:411648, vCores:128> type=RACK_LOCAL requestedPartition=
2019-10-28 14:36:08,043 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e247_1572286585508_0005_01_000398 Container Transitioned from NEW to ALLOCATED
2019-10-28 14:36:08,043 INFO fica.FiCaSchedulerNode (FiCaSchedulerNode.java:allocateContainer(169)) - Assigned container container_e247_1572286585508_0005_01_000398 of capacity <memory:3072, vCores:1> on host server:45454, which has 6 containers, <memory:70656, vCores:6> used and <memory:32256, vCores:26> available after allocation
2019-10-28 14:36:08,043 INFO resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(200)) - USER=hive OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1572286585508_0005 CONTAINERID=container_e247_1572286585508_0005_01_000398 RESOURCE=<memory:3072, vCores:1>
2019-10-28 14:36:08,043 INFO capacity.ParentQueue (ParentQueue.java:apply(1336)) - assignedContainer queue=batch usedCapacity=0.14925392 absoluteUsedCapacity=0.11940298 used=<memory:49152, vCores:13> cluster=<memory:411648, vCores:128>
2019-10-28 14:36:08,043 INFO capacity.ParentQueue (ParentQueue.java:apply(1336)) - assignedContainer queue=root usedCapacity=0.40298507 absoluteUsedCapacity=0.40298507 used=<memory:165888, vCores:17> cluster=<memory:411648, vCores:128>
2019-10-28 14:36:08,043 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2900)) - Allocation proposal accepted
2019-10-28 14:36:08,103 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e247_1572286585508_0005_01_000398 Container Transitioned from ALLOCATED to ACQUIRED
2019-10-28 14:36:08,300 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1572286585508_0006_000001 container=null queue=batchq1 clusterResource=<memory:411648, vCores:128> type=OFF_SWITCH requestedPartition=
2019-10-28 14:36:08,300 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e247_1572286585508_0006_01_000647 Container Transitioned from NEW to ALLOCATED
2019-10-28 14:36:08,300 INFO fica.FiCaSchedulerNode (FiCaSchedulerNode.java:allocateContainer(169)) - Assigned container container_e247_1572286585508_0006_01_000647 of capacity <memory:3072, vCores:1> on host server:45454, which has 5 containers, <memory:18432, vCores:5> used and <memory:84480, vCores:27> available after allocation
2019-10-28 14:36:08,300 INFO resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(200)) - USER=hive OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1572286585508_0006 CONTAINERID=container_e247_1572286585508_0006_01_000647 RESOURCE=<memory:3072, vCores:1>
2019-10-28 14:36:08,300 INFO capacity.ParentQueue (ParentQueue.java:apply(1336)) - assignedContainer queue=batch usedCapacity=0.15858229 absoluteUsedCapacity=0.12686567 used=<memory:52224, vCores:14> cluster=<memory:411648, vCores:128>
2019-10-28 14:36:08,300 INFO capacity.ParentQueue (ParentQueue.java:apply(1336)) - assignedContainer queue=root usedCapacity=0.41044775 absoluteUsedCapacity=0.41044775 used=<memory:168960, vCores:18> cluster=<memory:411648, vCores:128>
2019-10-28 14:36:08,300 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2900)) - Allocation proposal accepted
2019-10-28 14:36:08,354 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e247_1572286585508_0005_01_000398 Container Transitioned from ACQUIRED to RELEASED
2019-10-28 14:36:08,354 INFO resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(200)) - USER=hive IP=10.10.81.14 OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1572286585508_0005 CONTAINERID=container_e247_1572286585508_0005_01_000398 RESOURCE=<memory:3072, vCores:1>
2019-10-28 14:36:08,354 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updatePendingResources(367)) - checking for deactivate of application :application_1572286585508_0005
2019-10-28 14:36:08,485 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e247_1572286585508_0006_01_000647 Container Transitioned from ALLOCATED to ACQUIRED
2019-10-28 14:36:08,736 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updatePendingResources(367)) - checking for deactivate of application :application_1572286585508_0006
2019-10-28 14:36:08,987 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updatePendingResources(367)) - checking for deactivate of application :application_1572286585508_0006 This are not the complete logs, just a glimpse. I hope it helps to come up with any idea. It gives me the impression it's heap memory issue. But... AppTimelineServer Java heap size = 8G , therefore any thought is appreciated. Regards!
... View more
10-25-2019
07:45 AM
Ambari 2.7.3 HDP 3.1.0 Hello team. My service went down 15 days ago and I'm still not able to fix this issue. If someone has face it before, please let me know. From last logs: Caused by: java.net.ConnectException: Call to atsserver/ip:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: atsserver/ip:17020 at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:165) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406) at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103) at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118) at org.apache.hadoop.hbase.ipc.BufferCallBeforeInitHandler.userEventTriggered(BufferCallBeforeInitHandler.java:92) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307) at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1377) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329) at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315) at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:929) at org.apache.hadoop.hbase.ipc.NettyRpcConnection.failInit(NettyRpcConnection.java:179) at org.apache.hadoop.hbase.ipc.NettyRpcConnection.access$500(NettyRpcConnection.java:71) at org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:269) at org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:263) at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500) at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479) at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420) at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122) at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327) at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343) at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) ... 1 more Caused by: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: atserver/ip:17020 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) at org.apache.hbase.thirdparty.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ... 7 more Caused by: java.net.ConnectException: Connection refused ... 11 more 2019-10-10 10:03:05,448 ERROR reader.TimelineReaderServer (LogAdapter.java:error(75)) - RECEIVED SIGNAL 15: SIGTERM 2019-10-10 10:03:05,457 INFO handler.ContextHandler (ContextHandler.java:doStop(910)) - Stopped o.e.j.w.WebAppContext@7357a011{/,null,UNAVAILABLE}{/timeline} 2019-10-10 10:03:05,460 INFO server.AbstractConnector (AbstractConnector.java:doStop(318)) - Stopped ServerConnector@1bae316d{HTTP/1.1,[http/1.1]}{0.0.0.0:8198} 2019-10-10 10:03:05,460 INFO handler.ContextHandler (ContextHandler.java:doStop(910)) - Stopped o.e.j.s.ServletContextHandler@4bff7da0{/static,jar:file:/usr/hdp/3.1.0.0-78/hadoop-yarn/hadoop-yarn-common-3.1.1.3.1.0.0-78.jar!/webapps/static,UNAVAILABLE} 2019-10-10 10:03:05,460 INFO handler.ContextHandler (ContextHandler.java:doStop(910)) - Stopped o.e.j.s.ServletContextHandler@3e62d773{/logs,file:///usr/logs/hadoop-yarn/yarn/,UNAVAILABLE} 2019-10-10 10:03:05,462 INFO storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:serviceStop(108)) - closing the hbase Connection 2019-10-10 10:03:05,462 INFO zookeeper.ReadOnlyZKClient (ReadOnlyZKClient.java:close(342)) - Close zookeeper connection 0x783a467b to zookeeper:2181,zookeeper:2181,zookeeper:2181,zookeeper:2181,zookeeper:2181 2019-10-10 10:03:05,465 INFO zookeeper.ReadOnlyZKClient (ReadOnlyZKClient.java:close(342)) - Close zookeeper connection 0x0ba2f4ec to zookeeper quorum 2019-10-10 10:03:05,467 INFO zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x26daa4f60630022 closed 2019-10-10 10:03:05,467 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - EventThread shut down 2019-10-10 10:03:05,468 INFO reader.TimelineReaderServer (LogAdapter.java:info(51)) - SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down TimelineReaderServer at atsserver/ip ************************************************************/ I've followed the steps on this workaround and I am getting a new error message https://community.cloudera.com/t5/Support-Questions/ATS-hbase-does-not-seem-to-start/td-p/235155 - guillaume_roger We use to hace kerberized this cluster but it's not anymore a a few months ago. That's why this message is weird 2019-10-25 08:58:31,650 - HdfsResource['/atsv2/hbase/data'] {'security_enabled': False, 'hadoop_bin_dir': '/usr/hdp/3.1.0.0-78/hadoop/bin', 'keytab': '', 'dfs_type': 'HDFS', 'default_fs': 'hdfs://hdpnndev', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': '', 'user': 'hdfs', 'owner': 'yarnats', 'hadoop_conf_dir': '/usr/hdp/3.1.0.0-78/hadoop/conf', 'type': 'directory', 'action': ['create_on_execute'], 'immutable_paths': [u'/mr-history/done', u'/warehouse/tablespace/managed/hive', u'/warehouse/tablespace/external/hive', u'/app-logs', u'/tmp']}
2019-10-25 08:58:31,652 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://masternode:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpZs_d1F 2>/tmp/tmpNf5VJV''] {'quiet': False} Then , it got stuck several minutes on this part 2019-10-25 08:58:31,791 - call returned (0, '')
2019-10-25 08:58:31,792 - get_user_call_output returned (0, u'{\n "beans" : [ {\n "name" : "Hadoop:service=NameNode,name=FSNamesystem",\n "modelerType" : "FSNamesystem",\n "tag.Context" : "dfs",\n "tag.HAState" : "active",\n "tag.TotalSyncTimes" : "103 16 ",\n "tag.Hostname" : "zookeeper",\n "MissingBlocks" : 0,\n "MissingReplOneBlocks" : 0,\n "ExpiredHeartbeats" : 0,\n "TransactionsSinceLastCheckpoint" : 162564,\n "TransactionsSinceLastLogRoll" : 97,\n "LastWrittenTransactionId" : 136802255,\n "LastCheckpointTime" : 1571986118000,\n "CapacityTotal" : 94452535787520,\n "CapacityTotalGB" : 87966.0,\n "CapacityUsed" : 17431687303220,\n "CapacityUsedGB" : 16235.0,\n "CapacityRemaining" : 72211133116924,\n "ProvidedCapacityTotal" : 0,\n "CapacityRemainingGB" : 67252.0,\n "CapacityUsedNonDFS" : 0,\n "TotalLoad" : 128,\n "SnapshottableDirectories" : 1,\n "Snapshots" : 1,\n "NumEncryptionZones" : 0,\n "LockQueueLength" : 0,\n "BlocksTotal" : 985264,\n "NumFilesUnderConstruction" : 32,\n "NumActiveClients" : 21,\n "FilesTotal" : 1560325,\n "PendingReplicationBlocks" : 0,\n "PendingReconstructionBlocks" : 0,\n "UnderReplicatedBlocks" : 20,\n "LowRedundancyBlocks" : 20,\n "CorruptBlocks" : 0,\n "ScheduledReplicationBlocks" : 0,\n "PendingDeletionBlocks" : 597,\n "LowRedundancyReplicatedBlocks" : 20,\n "CorruptReplicatedBlocks" : 0,\n "MissingReplicatedBlocks" : 0,\n "MissingReplicationOneBlocks" : 0,\n "HighestPriorityLowRedundancyReplicatedBlocks" : 0,\n "HighestPriorityLowRedundancyECBlocks" : 0,\n "BytesInFutureReplicatedBlocks" : 0,\n "PendingDeletionReplicatedBlocks" : 597,\n "TotalReplicatedBlocks" : 985264,\n "LowRedundancyECBlockGroups" : 0,\n "CorruptECBlockGroups" : 0,\n "MissingECBlockGroups" : 0,\n "BytesInFutureECBlockGroups" : 0,\n "PendingDeletionECBlocks" : 0,\n "TotalECBlockGroups" : 0,\n "ExcessBlocks" : 0,\n "NumTimedOutPendingReconstructions" : 0,\n "PostponedMisreplicatedBlocks" : 24,\n "PendingDataNodeMessageCount" : 0,\n "MillisSinceLastLoadedEdits" : 0,\n "BlockCapacity" : 2097152,\n "NumLiveDataNodes" : 4,\n "NumDeadDataNodes" : 0,\n "NumDecomLiveDataNodes" : 0,\n "NumDecomDeadDataNodes" : 0,\n "VolumeFailuresTotal" : 0,\n "EstimatedCapacityLostTotal" : 0,\n "NumDecommissioningDataNodes" : 0,\n "StaleDataNodes" : 0,\n "NumStaleStorages" : 48,\n "TotalSyncCount" : 76,\n "NumInMaintenanceLiveDataNodes" : 0,\n "NumInMaintenanceDeadDataNodes" : 0,\n "NumEnteringMaintenanceDataNodes" : 0\n } ]\n}', u'')
2019-10-25 08:58:31,793 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://yarnatsserver:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpmLa9e3 2>/tmp/tmpcGNRPR''] {'quiet': False} Finally it finishes with this message 2019-10-25 09:17:53,022 - call returned (143, "-bash: line 1: 28392 Terminated curl -s 'http://yarnatsserver:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' > /tmp/tmpmLa9e3 2> /tmp/tmpcGNRPR")
2019-10-25 09:17:53,022 - call['hdfs haadmin -ns hdpnndev -getServiceState nn2'] {'logoutput': True, 'user': 'hdfs'}
... View more
Labels:
10-21-2019
09:30 AM
Hello dnavarro. Thank you for asking... Actually my question was because I wasn't getting complete size of hdfs that I have on the output command hdfs -du against the size I was watching on the ambari UI. There was a significant difference on it. At the end, it turned up that that difference was due to that hdp 2.6.5 version command hdfs dfs -du is not taking in count the snapshots sizes, therefore, there we have the missmatching I was looking since the beginning and why I needed that output. Just keep in mind that for version grater than 2.6.5 you need to take snapshots size separately if you want to check entire hdfs size 😉 😉
... View more
09-13-2019
01:55 PM
I am tyring to get this output. Nevertheless some articles points out this is due to a version matter, which I doubt, since I've seen this behavior on multiple environments with different versions is just that I don't know how to enable it.
If someone can point me to the right path I would highly appreciate it.
Regards!
... View more
Labels:
- Labels:
-
Apache HBase
08-07-2019
03:54 PM
1 Kudo
Is there any sort of formula or how did you came up with this value for users's processes? is it a random value? what can I check within my cluster in order to get a proper value for me?
... View more
03-29-2019
10:25 PM
Any one who can help us to get an official site to get his jar to make it work with tibco? Regards!
... View more
Labels:
- Labels:
-
Apache Hadoop
10-11-2018
09:23 PM
This is a known behavior from Hortonworks, check the discussion https://issues.apache.org/jira/browse/HDFS-8986 Regards!
... View more
10-11-2018
03:34 PM
Hello team.
Current lecture directly to the involved directory is giving us: hadoop fs -du -h -s </path/dir> t
1.4 T </path/dir> of information but this is not accurate because when getting each and one size from the files and directories within above directory, performing a simple addition operation from excel we are getting
386574657917
which translated in GB is 360GB which is the expected size within this directory. Has someone face this before? Regards!
... View more
Labels:
- Labels:
-
Apache Hadoop
09-13-2018
09:17 PM
Also check https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html https://community.hortonworks.com/questions/5780/hive-on-tez-query-map-output-outofmemoryerror-java.html https://community.hortonworks.com/questions/12067/what-is-the-workaround-when-getting-hive-outofmemo.html
... View more
09-13-2018
08:58 PM
What I understand from documentation (I am applying same changes) and from previous experience is that happens that some jobs has too many inputs and outputs to be handle by the system itself (mappers, containers, reducers etc...) this affects from one side your cluster performance since resources are not enough for such jobs to complete or simply they start to present a considerable delay to what is expected. I've seen that's the moment when OOM error shows up. Saying this, I guess explanation from the smartsense is clear since it tells us "Enabling auto-scaling scales down requests, which may otherwise be too large." Hope this helps 😉
... View more