Member since 01-19-2017
3679 Posts
632 Kudos Received
372 Solutions
My Accepted Solutions
| Views | Posted |
|---|---|
| 754 | 06-04-2025 11:36 PM |
| 1334 | 03-23-2025 05:23 AM |
| 660 | 03-17-2025 10:18 AM |
| 2393 | 03-05-2025 01:34 PM |
| 1563 | 03-03-2025 01:09 PM |
10-28-2019 02:39 PM
Hello Shelton. I've followed all of the given steps and the service still does not come up. The outputs are below.

From: /var/lib/ambari-agent/data/errors-31848.txt
resource_management.core.exceptions.ExecuteTimeoutException: Execution of 'ambari-sudo.sh su yarnats -l -s /bin/bash -c 'export PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin:/var/lib/ambari-agent:/var/lib/ambari-agent'"'"' ; sleep 10;export HBASE_CLASSPATH_PREFIX=/usr/hdp/3.1.0.0-78/hadoop-yarn/timelineservice/*; /usr/hdp/3.1.0.0-78/hbase/bin/hbase --config /usr/hdp/3.1.0.0-78/hadoop/conf/embedded-yarn-ats-hbase org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator -Dhbase.client.retries.number=35 -create -s'' was killed due timeout after 300 seconds

From: /var/lib/ambari-agent/data/output-31848.txt
2019-10-22 16:41:17,992 WARN [main-EventThread] coordination.ZKSplitLogManagerCoordination$CreateRescanAsyncCallback: rc=NONODE for /atsv2-hbase-unsecure/splitWAL/RESCAN remaining retries=9223372036854744889
2019-10-22 16:41:17,992 WARN [main-EventThread] coordination.ZKSplitLogManagerCoordination$CreateRescanAsyncCallback: rc=NONODE for /atsv2-hbase-unsecure/splitWAL/RESCAN remaining retries=9223372036854735924
2019-10-22 16:41:17,992 WARN [main-EventThread] coordination.ZKSplitLogManagerCoordination$CreateRescanAsyncCallback: rc=NONODE for /atsv2-hbase-unsecure/splitWAL/RESCAN remaining retries=9223372036854772106
2019-10-22 16:41:17,992 WARN [main-EventThread] coordination.ZKSplitLogManagerCoordination$CreateRescanAsyncCallback: rc=NONODE for /atsv2-hbase-unsecure/splitWAL/RESCAN remaining retries=9223372036854768736
2019-10-22 16:41:17,992 WARN [main-EventThread] coordination.ZKSplitLogManagerCoordination$CreateRescanAsyncCallback: rc=NONODE for /atsv2-hbase-unsecure/splitWAL/RESCAN remaining retries=9223372036854749025
==> /usr/logs/hadoop-yarn/embedded-yarn-ats-hbase/gc.log-201910110639 <==
Java HotSpot(TM) 64-Bit Server VM (25.60-b23) for linux-amd64 JRE (1.8.0_60-b27), built on Aug 4 2015 12:19:40 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 131732324k(11848752k free), swap 8388604k(8279292k free)
CommandLine flags: -XX:ErrorFile=/usr/logs/hadoop-yarn/embedded-yarn-ats-hbase/hs_err_pid%p.log -XX:InitialHeapSize=2107717184 -XX:MaxHeapSize=3435134976 -XX:MaxNewSize=1145044992 -XX:MaxTenuringThreshold=6 -XX:OldPLABSize=16 -XX:OnOutOfMemoryError=kill -9 %p -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Heap
par new generation total 618048K, used 197776K [0x00000006f3400000, 0x000000071d2a0000, 0x0000000737800000)
eden space 549376K, 36% used [0x00000006f3400000, 0x00000006ff5243a8, 0x0000000714c80000)
from space 68672K, 0% used [0x0000000714c80000, 0x0000000714c80000, 0x0000000718f90000)
to space 68672K, 0% used [0x0000000718f90000, 0x0000000718f90000, 0x000000071d2a0000)
concurrent mark-sweep generation total 1373568K, used 0K [0x0000000737800000, 0x000000078b560000, 0x00000007c0000000)
Metaspace used 11629K, capacity 11810K, committed 11904K, reserved 1060864K
class space used 1251K, capacity 1316K, committed 1408K, reserved 1048576K
==> /usr/logs/hadoop-yarn/embedded-yarn-ats-hbase/gc.log-201910100851 <==
Java HotSpot(TM) 64-Bit Server VM (25.60-b23) for linux-amd64 JRE (1.8.0_60-b27), built on Aug 4 2015 12:19:40 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 131732324k(1591264k free), swap 8388604k(8280060k free)
CommandLine flags: -XX:CMSInitiatingOccupancyFraction=70 -XX:ErrorFile=/usr/logs/hadoop-yarn/embedded-yarn-ats-hbase/hs_err_pid%p.log -XX:InitialHeapSize=3435134976 -XX:MaxHeapSize=3435134976 -XX:MaxNewSize=1145044992 -XX:MaxTenuringThreshold=6 -XX:NewSize=1145044992 -XX:OldPLABSize=16 -XX:OldSize=2290089984 -XX:OnOutOfMemoryError=kill -9 %p -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:ReservedCodeCacheSize=268435456 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
2019-10-10T08:51:22.325-0700: 2.213: [GC (CMS Initial Mark) [1 CMS-initial-mark: 0K(2236416K)] 715687K(3242816K), 0.1832180 secs] [Times: user=0.48 sys=0.07, real=0.19 secs]
2019-10-10T08:51:22.508-0700: 2.396: [CMS-concurrent-mark-start]
2019-10-10T08:51:22.509-0700: 2.397: [CMS-concurrent-mark: 0.001/0.001 secs] [Times: user=0.01 sys=0.01, real=0.00 secs]
2019-10-10T08:51:22.509-0700: 2.397: [CMS-concurrent-preclean-start]
2019-10-10T08:51:22.513-0700: 2.400: [CMS-concurrent-preclean: 0.003/0.003 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2019-10-10T08:51:22.513-0700: 2.400: [CMS-concurrent-abortable-preclean-start]
2019-10-10T08:51:22.827-0700: 2.715: [GC (Allocation Failure) 2019-10-10T08:51:22.827-0700: 2.715: [ParNew: 894592K->37233K(1006400K), 0.0334809 secs] 894592K->37233K(3242816K), 0.0335760 secs] [Times: user=0.17 sys=0.03, real=0.03 secs]
Heap
par new generation total 1006400K, used 577717K [0x00000006f3400000, 0x0000000737800000, 0x0000000737800000)
eden space 894592K, 60% used [0x00000006f3400000, 0x00000007143d0fc8, 0x0000000729da0000)
from space 111808K, 33% used [0x0000000730ad0000, 0x0000000732f2c758, 0x0000000737800000)
to space 111808K, 0% used [0x0000000729da0000, 0x0000000729da0000, 0x0000000730ad0000)
concurrent mark-sweep generation total 2236416K, used 0K [0x0000000737800000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 52260K, capacity 52701K, committed 53168K, reserved 1095680K
class space used 5905K, capacity 6041K, committed 6096K, reserved 1048576K
2019-10-10T08:51:24.359-0700: 4.247: [CMS-concurrent-abortable-preclean: 1.100/1.847 secs] [Times: user=4.75 sys=0.27, real=1.85 secs]
Command failed after 1 tries

From: yarn-timelineserver-gc.log
Total 180427 20331968
, 0.0109813 secs]
25.966: [GC (Allocation Failure) [PSYoungGen: 766976K->26024K(1047040K)] 786831K->45895K(2136576K), 0.0205774 secs] [Times: user=0.18 sys=0.02, real=0.02 secs]
27.877: [GC (Allocation Failure) [PSYoungGen: 987560K->37814K(1176576K)] 1007431K->57702K(2266112K), 0.0452135 secs] [Times: user=0.31 sys=0.03, real=0.05 secs]
29.872: [GC (Allocation Failure) [PSYoungGen: 1128886K->40013K(1176576K)] 1148774K->59908K(2266112K), 0.0376384 secs] [Times: user=0.25 sys=0.02, real=0.04 secs]
31.621: [GC (Allocation Failure) [PSYoungGen: 1131085K->41607K(1708032K)] 1150980K->61510K(2797568K), 0.0426743 secs] [Times: user=0.19 sys=0.02, real=0.04 secs]
34.381: [GC (Allocation Failure) [PSYoungGen: 1702023K->52721K(1713152K)] 1721926K->75113K(2802688K), 0.0671733 secs] [Times: user=0.32 sys=0.06, real=0.07 secs]
544.663: [GC (Allocation Failure) [PSYoungGen: 1713137K->24633K(2550784K)] 1735529K->55561K(3640320K), 0.0502315 secs] [Times: user=0.37 sys=0.08, real=0.05 secs]
1744.725: [GC (Allocation Failure) [PSYoungGen: 2550329K->6583K(2657792K)] 2581257K->37803K(3747328K), 0.0109360 secs] [Times: user=0.06 sys=0.05, real=0.01 secs]
3364.582: [GC (Allocation Failure) [PSYoungGen: 2603959K->7333K(2513408K)] 2635179K->38561K(3602944K), 0.0106033 secs] [Times: user=0.05 sys=0.05, real=0.01 secs]
4564.508: [GC (Allocation Failure) [PSYoungGen: 2513061K->7397K(2425856K)] 2544289K->38633K(3515392K), 0.0098975 secs] [Times: user=0.05 sys=0.05, real=0.01 secs]
5944.468: [GC (Allocation Failure) [PSYoungGen: 2425573K->7432K(2342400K)] 2456809K->38676K(3431936K), 0.0100541 secs] [Times: user=0.05 sys=0.04, real=0.01 secs]
6904.427: [GC (Allocation Failure) [PSYoungGen: 2342152K->7814K(2263040K)] 2373396K->39065K(3352576K), 0.0100246 secs] [Times: user=0.06 sys=0.05, real=0.01 secs]
7624.583: [GC (Allocation Failure) [PSYoungGen: 2262662K->7335K(2186240K)] 2293913K->38595K(3275776K), 0.0126832 secs] [Times: user=0.07 sys=0.03, real=0.01 secs]
8524.740: [GC (Allocation Failure) [PSYoungGen: 2185895K->7238K(2113536K)] 2217155K->38505K(3203072K), 0.0110849 secs] [Times: user=0.06 sys=0.05, real=0.01 secs]
9604.461: [GC (Allocation Failure) [PSYoungGen: 2113094K->7415K(2043904K)] 2144361K->38690K(3133440K), 0.0187939 secs] [Times: user=0.11 sys=0.07, real=0.02 secs]
10864.545: [GC (Allocation Failure) [PSYoungGen: 2043639K->7287K(1977344K)] 2074914K->38570K(3066880K), 0.0131232 secs] [Times: user=0.12 sys=0.04, real=0.01 secs]

From: yarn-timelineserver-gc.log
2019-10-28 14:35:06,753 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1572286585508_0006_331 (TEZ_DAG_ID): 6
2019-10-28 14:35:06,753 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1572286585508_0006_332 (TEZ_DAG_ID): 6
2019-10-28 14:35:06,754 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1572286585508_0006_332 (TEZ_DAG_ID): 6
2019-10-28 14:35:06,754 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1572286585508_0006_333 (TEZ_DAG_ID): 6
2019-10-28 14:35:06,754 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1572286585508_0006_333 (TEZ_DAG_ID): 6
2019-10-28 14:35:06,755 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1572286585508_0006_334 (TEZ_DAG_ID): 6
2019-10-28 14:35:06,755 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(208)) - Error putting entity: dag_1572286585508_0006_334 (TEZ_DAG_ID): 6
2019-10-28 14:35:06,755 INFO timeline.LogInfo (LogInfo.java:parseForStore(116)) - Parsed 1338 entities from hdfs://hdpnndev/ats/active/application_1572286585508_0006/appattempt_1572286585508_0006_000001/summarylog-appattempt_1572286585508_0006_000001 in 314 msec

From: hadoop-yarn-resourcemanager-server.log
TARGET=ClientRMService RESULT=SUCCESS
2019-10-28 14:36:08,043 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1572286585508_0005_000001 container=null queue=batchq1 clusterResource=<memory:411648, vCores:128> type=RACK_LOCAL requestedPartition=
2019-10-28 14:36:08,043 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e247_1572286585508_0005_01_000398 Container Transitioned from NEW to ALLOCATED
2019-10-28 14:36:08,043 INFO fica.FiCaSchedulerNode (FiCaSchedulerNode.java:allocateContainer(169)) - Assigned container container_e247_1572286585508_0005_01_000398 of capacity <memory:3072, vCores:1> on host server:45454, which has 6 containers, <memory:70656, vCores:6> used and <memory:32256, vCores:26> available after allocation
2019-10-28 14:36:08,043 INFO resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(200)) - USER=hive OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1572286585508_0005 CONTAINERID=container_e247_1572286585508_0005_01_000398 RESOURCE=<memory:3072, vCores:1>
2019-10-28 14:36:08,043 INFO capacity.ParentQueue (ParentQueue.java:apply(1336)) - assignedContainer queue=batch usedCapacity=0.14925392 absoluteUsedCapacity=0.11940298 used=<memory:49152, vCores:13> cluster=<memory:411648, vCores:128>
2019-10-28 14:36:08,043 INFO capacity.ParentQueue (ParentQueue.java:apply(1336)) - assignedContainer queue=root usedCapacity=0.40298507 absoluteUsedCapacity=0.40298507 used=<memory:165888, vCores:17> cluster=<memory:411648, vCores:128>
2019-10-28 14:36:08,043 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2900)) - Allocation proposal accepted
2019-10-28 14:36:08,103 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e247_1572286585508_0005_01_000398 Container Transitioned from ALLOCATED to ACQUIRED
2019-10-28 14:36:08,300 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1572286585508_0006_000001 container=null queue=batchq1 clusterResource=<memory:411648, vCores:128> type=OFF_SWITCH requestedPartition=
2019-10-28 14:36:08,300 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e247_1572286585508_0006_01_000647 Container Transitioned from NEW to ALLOCATED
2019-10-28 14:36:08,300 INFO fica.FiCaSchedulerNode (FiCaSchedulerNode.java:allocateContainer(169)) - Assigned container container_e247_1572286585508_0006_01_000647 of capacity <memory:3072, vCores:1> on host server:45454, which has 5 containers, <memory:18432, vCores:5> used and <memory:84480, vCores:27> available after allocation
2019-10-28 14:36:08,300 INFO resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(200)) - USER=hive OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1572286585508_0006 CONTAINERID=container_e247_1572286585508_0006_01_000647 RESOURCE=<memory:3072, vCores:1>
2019-10-28 14:36:08,300 INFO capacity.ParentQueue (ParentQueue.java:apply(1336)) - assignedContainer queue=batch usedCapacity=0.15858229 absoluteUsedCapacity=0.12686567 used=<memory:52224, vCores:14> cluster=<memory:411648, vCores:128>
2019-10-28 14:36:08,300 INFO capacity.ParentQueue (ParentQueue.java:apply(1336)) - assignedContainer queue=root usedCapacity=0.41044775 absoluteUsedCapacity=0.41044775 used=<memory:168960, vCores:18> cluster=<memory:411648, vCores:128>
2019-10-28 14:36:08,300 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2900)) - Allocation proposal accepted
2019-10-28 14:36:08,354 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e247_1572286585508_0005_01_000398 Container Transitioned from ACQUIRED to RELEASED
2019-10-28 14:36:08,354 INFO resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(200)) - USER=hive IP=10.10.81.14 OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1572286585508_0005 CONTAINERID=container_e247_1572286585508_0005_01_000398 RESOURCE=<memory:3072, vCores:1>
2019-10-28 14:36:08,354 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updatePendingResources(367)) - checking for deactivate of application :application_1572286585508_0005
2019-10-28 14:36:08,485 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e247_1572286585508_0006_01_000647 Container Transitioned from ALLOCATED to ACQUIRED
2019-10-28 14:36:08,736 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updatePendingResources(367)) - checking for deactivate of application :application_1572286585508_0006
2019-10-28 14:36:08,987 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updatePendingResources(367)) - checking for deactivate of application :application_1572286585508_0006

These are not the complete logs, just a glimpse. I hope they help someone come up with an idea. My impression is that it's a heap memory issue, but AppTimelineServer Java heap size = 8G, so any thoughts are appreciated. Regards!
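One way to test the heap-memory theory is to summarize the yarn-timelineserver-gc.log excerpt. The following is a minimal Python sketch, assuming the ParallelGC line format shown above (`[PSYoungGen: ...] before->after(capacity)`); it does not parse the ParNew/CMS format of the embedded-yarn-ats-hbase GC logs. Applied to the excerpt, the live heap left after each collection peaks at about 73 MB, which suggests the Timeline Server JVM is not short of heap. Note also that the embedded ATS HBase GC log shows `-XX:MaxHeapSize=3435134976` (3276 MB), which appears to be a separate JVM from the one the 8G AppTimelineServer setting applies to.

```python
import re
from dataclasses import dataclass

# Matches ParallelGC young-collection lines such as:
# 25.966: [GC (Allocation Failure) [PSYoungGen: 766976K->26024K(1047040K)] 786831K->45895K(2136576K), 0.0205774 secs] ...
GC_LINE = re.compile(
    r"^(?P<uptime>\d+\.\d+): \[GC \((?P<cause>[^)]*)\) "
    r"\[PSYoungGen: (?P<yb>\d+)K->(?P<ya>\d+)K\((?P<ycap>\d+)K\)\] "
    r"(?P<hb>\d+)K->(?P<ha>\d+)K\((?P<hcap>\d+)K\)"
    r", (?P<pause>\d+\.\d+) secs\]"
)


@dataclass
class YoungGC:
    uptime_s: float
    heap_before_kb: int
    heap_after_kb: int      # whole heap (young + old) still in use after the collection
    heap_capacity_kb: int   # committed heap at that moment
    pause_s: float


def parse_young_gcs(lines):
    """Extract PSYoungGen events; lines in any other format are skipped."""
    events = []
    for line in lines:
        m = GC_LINE.match(line.strip())
        if m:
            events.append(YoungGC(
                uptime_s=float(m["uptime"]),
                heap_before_kb=int(m["hb"]),
                heap_after_kb=int(m["ha"]),
                heap_capacity_kb=int(m["hcap"]),
                pause_s=float(m["pause"]),
            ))
    return events


def summarize(events):
    """Peak heap still in use after GC (MB and % of committed heap) plus the longest pause."""
    peak = max(events, key=lambda e: e.heap_after_kb)
    return {
        "gc_count": len(events),
        "peak_after_gc_mb": round(peak.heap_after_kb / 1024, 1),
        "peak_after_gc_pct": round(100 * peak.heap_after_kb / peak.heap_capacity_kb, 1),
        "max_pause_s": max(e.pause_s for e in events),
    }
```

Usage: `with open("yarn-timelineserver-gc.log") as f: print(summarize(parse_young_gcs(f)))`. A steadily rising "after GC" figure, or one close to the committed heap, would point to memory pressure; a flat, low figure like the one in this excerpt points elsewhere.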
10-28-2019 12:32 PM
@MIkeL Before you embark on deploying your cluster, the best technical reference is the compatibility of the different moving parts of the HDP/Cloudera binaries against the operating system of your choice. The first source of truth is the Cloudera/Hortonworks support matrix tool: filter it for all the possible valid options. Hortonworks and Cloudera run exhaustive tests on a particular operating system before certifying it as production-ready, and from what I can see RHEL/CentOS 7.7 is not yet certified, so I highly doubt that RHEL/CentOS 8 is certified. That would explain the Python errors you are encountering. HTH
10-28-2019 12:19 PM
I'm getting the same error. This is the response I received from Cloudera support: "Only the dfs commands such as ls/put/mv etc works on wasb using the wasb connector. Admin commands such as dfsadmin as well fsck works only with native hadoop/hdfs implementation"
10-27-2019 04:02 AM
May I return to my first question? Up to Red Hat 7.2 everything was OK: after each from-scratch installation we never saw this. But when we moved to Red Hat 7.5, every cluster we created had corrupted files. Any hint why?
10-26-2019 09:19 AM
@Atena-Dev-Team Any updates on this thread?
10-25-2019 02:03 PM
@Anuj Here are the official steps from Ambari.org. Read through and follow them, then look at my additional steps for checking the ZooKeeper entries.

Step-by-step guide using Ambari:
1. Set AMS to maintenance mode.
2. Stop AMS from Ambari.
3. Identify the following from the AMS Configs screen:
   - 'Metrics Service operation mode' (embedded or distributed)
   - hbase.rootdir
   - hbase.zookeeper.property.dataDir
4. AMS data is stored in the 'hbase.rootdir' identified above. Back up and remove the AMS data:
   - If the Metrics Service operation mode is 'embedded', the data is stored in OS files. Use regular OS commands to back up and remove the files in hbase.rootdir.
   - If the Metrics Service operation mode is 'distributed', the data is stored in HDFS. Use 'hdfs dfs' commands to back up and remove the files in hbase.rootdir.
5. Remove the AMS ZooKeeper data by backing up and removing the contents of 'hbase.tmp.dir'/zookeeper.
6. Remove any Phoenix spool files from the 'hbase.tmp.dir'/phoenix-spool folder.
7. Restart AMS using Ambari.

I take the above a step further. Locate the ZooKeeper executables, usually in /usr/hdp/{hdp_version}/zookeeper/bin/, and log into ZooKeeper:
[zookeeper@osaka bin]$ ./zkCli.sh

List the root structure; you should see ambari-metrics-cluster, as below:
[zk: localhost:2181(CONNECTED) 0] ls /
[cluster, registry, controller, brokers, storm, zookeeper, infra-solr, hbase-unsecure, admin, isr_change_notification, log_dir_event_notification, controller_epoch, hiveserver2, hiveserver2-leader, rmstore, atsv2-hbase-unsecure, consumers, ambari-metrics-cluster, latest_producer_id_block, config]

Now check the entries under ambari-metrics-cluster; you should find something like this:
ls /ambari-metrics-cluster/INSTANCES/
FQDN_12001

Delete the entry that corresponds to your cluster:
[zk: localhost:2181(CONNECTED) 25] rmr /ambari-metrics-cluster/INSTANCES/FQDN_12001

Restart AMS; this should recreate a new entry in ZooKeeper.
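The embedded-mode backup step and the ZooKeeper clean-up above could be scripted. Below is a minimal Python sketch under stated assumptions: local filesystem paths only (embedded mode; distributed mode would need `hdfs dfs` instead), and the third-party kazoo ZooKeeper client, which is not part of the original steps. The `<FQDN>_<port>` znode naming is inferred from the `FQDN_12001` example. Stop AMS before running anything like this.

```python
import shutil
import time
from pathlib import Path

INSTANCES = "/ambari-metrics-cluster/INSTANCES"


def backup_and_clear(data_dir, backup_root):
    """Copy data_dir into a timestamped folder under backup_root, then empty data_dir.

    Intended for the embedded-mode directories: hbase.rootdir,
    'hbase.tmp.dir'/zookeeper and 'hbase.tmp.dir'/phoenix-spool.
    """
    src = Path(data_dir)
    dest = Path(backup_root) / f"{src.name}-{time.strftime('%Y%m%d%H%M%S')}"
    shutil.copytree(src, dest)  # raises if dest exists, so an old backup is never overwritten
    for child in src.iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)
        else:
            child.unlink()
    return dest


def remove_ams_instances(zk, fqdn):
    """Delete znodes under /ambari-metrics-cluster/INSTANCES named '<fqdn>_<port>'.

    zk is assumed to be a started kazoo.client.KazooClient, or any object
    offering get_children(path) and delete(path, recursive=...).
    """
    removed = []
    for name in zk.get_children(INSTANCES):
        if name.startswith(fqdn + "_"):
            zk.delete(f"{INSTANCES}/{name}", recursive=True)
            removed.append(name)
    return removed
```

With kazoo, usage would look like `zk = KazooClient(hosts="localhost:2181"); zk.start(); remove_ams_instances(zk, "your.host.fqdn"); zk.stop()`, where the FQDN is a placeholder for your AMS collector host.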
10-24-2019 12:55 AM
Actually, in the Oracle database I had not previously entered my user ID and password; after entering them, the connection was established. It was a minor issue. Thanks for your support.
10-23-2019 11:30 PM
@paras Please check the curl output above.
10-22-2019 08:11 PM
I would suggest going through the docs below and verifying the outbound rules on port 7180:
https://docs.aws.amazon.com/vpc/latest/userguide/vpc-network-acls.html
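After changing the rules, a plain TCP connect from the client machine is a quick way to confirm that port 7180 is reachable. A minimal Python sketch; the host name you pass is a placeholder. Network ACLs are stateless, so this only succeeds if both the rule for 7180 and the rule for the return traffic on ephemeral ports allow the connection.

```python
import socket


def port_reachable(host: str, port: int = 7180, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout seconds."""
    try:
        # create_connection resolves the host and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False
```

A `False` caused by a timeout usually means packets are being dropped (for example by an ACL or security group), while a refused connection means the host answered but nothing is listening on that port.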
10-18-2019 07:43 PM
Perfect <3. I had misunderstood the hostname property in the Ambari Agent ini: I had modified it so that each node pointed to itself as its own Ambari server. There was also one error about a missing MySQL connector jar (even though I am using PostgreSQL), so I also added the MySQL connector jar to the resources folder of the Ambari server on the uvmu01 host. Thank you so much.
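The hostname mistake described above (each agent pointing at itself) is easy to check across nodes. A minimal Python sketch, assuming each agent's config has been read from the default /etc/ambari-agent/conf/ambari-agent.ini, where the `hostname` key of the `[server]` section names the Ambari Server host; how the files are collected from each node is left open.

```python
import configparser


def agent_server_host(ini_text: str) -> str:
    """Return the Ambari Server host an agent points to ([server] hostname=...)."""
    # interpolation=None so stray '%' characters elsewhere in the file are harmless
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    return parser.get("server", "hostname").strip()


def misconfigured_agents(ini_by_node: dict, ambari_server: str) -> list:
    """List (sorted) the nodes whose agent does not point at the single Ambari Server."""
    return sorted(
        node for node, text in ini_by_node.items()
        if agent_server_host(text) != ambari_server
    )
```

For example, `misconfigured_agents(configs, "uvmu01")` returns every node whose agent still names some other host, which is exactly the situation where each node behaves as its own Ambari server.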