Member since: 02-04-2016
Posts: 132
Kudos Received: 52
Solutions: 7

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6231 | 07-25-2018 10:53 AM
 | 1764 | 07-25-2018 05:15 AM
 | 1828 | 10-03-2017 12:08 PM
 | 3096 | 04-04-2017 05:36 AM
 | 3335 | 11-29-2016 05:40 PM
07-08-2018
09:19 AM
@Ilia K In your case I would suggest the following configuration:
Dev queue: Capacity 30%, Max Capacity 70%, User Limit Factor 4, Ordering policy: Fair
Prod queue: Capacity 70%, Max Capacity 30%, User Limit Factor 2, Ordering policy: Fair
Make sure preemption is enabled in the Hive config. These settings should give you the desired result - each queue can use 100% of the cluster while the other queue is idle. The trick is the "User Limit Factor", which lets the dev queue "steal" resources from the Prod queue, up to 4 times its configured capacity (so DEV can reach 100% while Prod is idle).
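For reference, a minimal sketch of how those settings could look as Capacity Scheduler properties. The leaf queue names dev and prod under root are my assumption, and maximum-capacity is set to 100 here as my own reading of the goal (a queue can only grow up to its maximum-capacity, so letting either queue take the whole cluster when the other is idle needs it at 100):

```
# capacity-scheduler settings - illustrative sketch, not a verified config
yarn.scheduler.capacity.root.queues=dev,prod

yarn.scheduler.capacity.root.dev.capacity=30
yarn.scheduler.capacity.root.dev.maximum-capacity=100
yarn.scheduler.capacity.root.dev.user-limit-factor=4
yarn.scheduler.capacity.root.dev.ordering-policy=fair

yarn.scheduler.capacity.root.prod.capacity=70
yarn.scheduler.capacity.root.prod.maximum-capacity=100
yarn.scheduler.capacity.root.prod.user-limit-factor=2
yarn.scheduler.capacity.root.prod.ordering-policy=fair
```

Preemption itself is normally the yarn.resourcemanager.scheduler.monitor.enable switch on the YARN side; verify the exact property against your version.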
06-26-2018
06:24 AM
@Jay Kumar SenSharma The old hosts are still in the AMS metrics. I will take care of the TTL in AMS as per your recommendations. Thank you!
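For anyone landing here later, a rough sketch of the kind of ams-site retention (TTL) properties usually involved; the property names and values below are from memory and should be verified against your AMS version:

```
# Ambari Metrics (ams-site) retention, in seconds - values are illustrative only
timeline.metrics.host.aggregator.ttl=86400
timeline.metrics.host.aggregator.minute.ttl=604800
timeline.metrics.host.aggregator.hourly.ttl=2592000
timeline.metrics.host.aggregator.daily.ttl=31536000
timeline.metrics.cluster.aggregator.minute.ttl=7776000
timeline.metrics.cluster.aggregator.hourly.ttl=31536000
timeline.metrics.cluster.aggregator.daily.ttl=63072000
```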
06-26-2018
05:16 AM
@Jay Kumar SenSharma Thank you for your prompt response! I was referring to hosts that were deleted months ago and still show up... So it seems that Grafana isn't aware of the change...
06-26-2018
05:09 AM
Hello all, I've noticed that in the Grafana UI, when I filter for specific servers, the list still shows hosts that are no longer part of the cluster (DataNodes that were deleted completely from the cluster). How can I remove them from the Grafana UI? Adi
06-25-2018
10:16 AM
Hi @Eric Leme Thank you for your response. I'm familiar with the information you provided (yet it is appreciated... points rewarded...), but I was hoping to hear from people who actually did the switch from 1500 to jumbo, to learn from their first-hand experience. Thanks!
06-05-2018
08:46 AM
Hello all
I know that the official recommendation is to use MTU 9000 (jumbo frames) on all network interfaces in an HDP cluster.
My question is for those here who switched their cluster from 1500 to 9000 - did you see a performance increase due to the change? Thanks in advance!
Adi
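For anyone planning the same switch, a quick sketch of how to verify jumbo frames end-to-end on a pair of test nodes before committing the whole cluster; the interface name and target host below are placeholders:

```
# current MTU on the interface (eth0 is just an example name)
ip link show eth0

# temporarily raise it for testing
ip link set dev eth0 mtu 9000

# confirm a 9000-byte frame passes without fragmentation:
# 8972 bytes of payload + 28 bytes of IP/ICMP headers = 9000; -M do forbids fragmentation
ping -M do -s 8972 <other-datanode>
```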
Labels:
- Hortonworks Data Platform (HDP)
06-04-2018
05:34 AM
Thank you @Geoffrey Shelton Okot However, the fsck shows no corrupt blocks; the problem is with corrupt replicas. That said, the alert has since disappeared... Not sure whether to be glad or suspicious 🙂 Adi
06-03-2018
03:36 PM
Hello, I've noticed in Ambari, under the HDFS metrics, that we have 2 blocks with corrupt replicas. Running "hdfs fsck /" shows no corrupt blocks and the system is healthy. Running "hdfs dfsadmin -report" shows 2 corrupt replicas (same as the Ambari dashboard). I've restarted Ambari Metrics and the Ambari agents on all nodes, plus the Ambari server, as noted in one of the threads I came across, but the problem remains. Ambari is 2.5.2. Any ideas how to fix this issue? Thanks Adi
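In case it helps anyone hitting the same discrepancy, a sketch of the commands I'd use to dig further (the path in the last command is just a placeholder):

```
# summary counters - should match the Ambari "corrupt replicas" figure
hdfs dfsadmin -report | grep -i corrupt

# list files with corrupt blocks, if any
hdfs fsck / -list-corruptfileblocks

# drill into a suspect path with block and replica locations
hdfs fsck /some/suspect/path -files -blocks -locations
```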
Labels:
- Apache Hadoop
10-03-2017
12:08 PM
Problem solved! In case anyone else encounters the following:
1. Servers losing heartbeat for no reason
2. The Ambari agent constantly hogs 100% CPU
3. Running "yarn application -list" produces results, but more slowly than on other servers
4. In general, the server is slow
The fix was to set the CPU profile in the BIOS to "maximum performance". For some reason the server (a Dell in our case, but every BIOS out there has an equivalent) was set to the default CPU profile, which throttles the CPU for low-voltage use.
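A quick way to spot this from the OS side is to compare the current clock speed with the advertised maximum; this is only a sketch, and the cpufreq path depends on the driver in use:

```
# current vs. advertised clock speed; a power-save profile shows cores stuck well below max
lscpu | grep -i mhz
grep -i "cpu mhz" /proc/cpuinfo | sort | uniq -c

# current frequency governor per core, when the cpufreq interface is exposed
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```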
10-01-2017
03:27 PM
More info: In ambari-server.log I can see that it acknowledges the loss of heartbeat, but it seems they do communicate: 01 Oct 2017 18:22:39,870 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:159 - Heartbeat lost from host hdp-dn01-drp.hadoop.local
01 Oct 2017 18:22:39,872 INFO [ambari-hearbeat-monitor] TopologyManager:671 - Hearbeat for host hdp-dn01-drp.hadoop.local lost thus removing it from available hosts.
01 Oct 2017 18:22:39,872 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:174 - Setting component state to UNKNOWN for component METRICS_MONITOR on hdp-dn01-drp.hadoop.local
01 Oct 2017 18:22:39,872 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:174 - Setting component state to UNKNOWN for component DRUID_MIDDLEMANAGER on hdp-dn01-drp.hadoop.local
01 Oct 2017 18:22:39,872 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:174 - Setting component state to UNKNOWN for component DRUID_HISTORICAL on hdp-dn01-drp.hadoop.local
01 Oct 2017 18:22:39,872 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:174 - Setting component state to UNKNOWN for component DATANODE on hdp-dn01-drp.hadoop.local
01 Oct 2017 18:22:39,873 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:174 - Setting component state to UNKNOWN for component NODEMANAGER on hdp-dn01-drp.hadoop.local
01 Oct 2017 18:23:23,042 WARN [qtp-ambari-agent-1563184] HeartBeatHandler:235 - Host is in HEARTBEAT_LOST state - sending register command
01 Oct 2017 18:23:30,029 INFO [qtp-ambari-agent-1563184] HeartBeatHandler:425 - agentOsType = centos6
01 Oct 2017 18:23:30,045 INFO [qtp-ambari-agent-1563184] HostImpl:329 - Received host registration, host=[hostname=hdp-dn01-drp,fqdn=hdp-dn01-drp.hadoop.local,domain=hadoop.local,architecture=x86_64,processorcount=12,physicalprocessorcount=12,osname=centos,osversion=6.8,osfamily=redhat,memory=65901056,uptime_hours=0,mounts=(available=38553776,mountpoint=/,used=10299852,percent=22%,size=51475068,device=/dev/mapper/vg_system-LogVol00,type=ext4)(available=390059,mountpoint=/boot,used=71993,percent=16%,size=487652,device=/dev/sda2,type=ext4)(available=204304,mountpoint=/boot/efi,used=276,percent=1%,size=204580,device=/dev/sda1,type=vfat)(available=1654503504,mountpoint=/grid/0,used=110677756,percent=7%,size=1859652852,device=/dev/sda5,type=ext4)(available=1713507724,mountpoint=/grid/1,used=111041964,percent=7%,size=1922198324,device=/dev/sdb1,type=ext4)(available=1716047400,mountpoint=/grid/2,used=108502288,percent=6%,size=1922198324,device=/dev/sdc1,type=ext4)(available=1716667196,mountpoint=/grid/3,used=107882492,percent=6%,size=1922198324,device=/dev/sdd1,type=ext4)(available=1709492804,mountpoint=/grid/4,used=115056884,percent=7%,size=1922198324,device=/dev/sde1,type=ext4)(available=1710666700,mountpoint=/grid/5,used=113882988,percent=7%,size=1922198324,device=/dev/sdf1,type=ext4)(available=1709508880,mountpoint=/grid/6,used=115040808,percent=7%,size=1922198324,device=/dev/sdg1,type=ext4)(available=1705253584,mountpoint=/grid/7,used=119296104,percent=7%,size=1922198324,device=/dev/sdh1,type=ext4)(available=1708647680,mountpoint=/grid/8,used=115902008,percent=7%,size=1922198324,device=/dev/sdi1,type=ext4)(available=1713886116,mountpoint=/grid/9,used=110663572,percent=7%,size=1922198324,device=/dev/sdj1,type=ext4)(available=1711301604,mountpoint=/grid/10,used=113248084,percent=7%,size=1922198324,device=/dev/sdk1,type=ext4)(available=1712490508,mountpoint=/grid/11,used=112059180,percent=7%,size=1922198324,device=/dev/sdl1,type=ext4)]
, registrationTime=1506871410029, agentVersion=2.5.1.0
01 Oct 2017 18:23:30,045 INFO [qtp-ambari-agent-1563184] TopologyManager:592 - TopologyManager.onHostRegistered: Entering
01 Oct 2017 18:23:30,045 INFO [qtp-ambari-agent-1563184] TopologyManager:594 - TopologyManager.onHostRegistered: host = hdp-dn01-drp.hadoop.local is already associated with the cluster or is currently being processed
01 Oct 2017 18:23:30,052 INFO [qtp-ambari-agent-1563184] HeartBeatHandler:504 - Recovery configuration set to RecoveryConfig{, type=AUTO_START, maxCount=6, windowInMinutes=60, retryGap=5, maxLifetimeCount=1024, components=null, recoveryTimestamp=1506871410051}