Member since: 02-04-2016
Posts: 132
Kudos Received: 52
Solutions: 7

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6231 | 07-25-2018 10:53 AM
 | 1764 | 07-25-2018 05:15 AM
 | 1828 | 10-03-2017 12:08 PM
 | 3096 | 04-04-2017 05:36 AM
 | 3335 | 11-29-2016 05:40 PM
07-08-2018
09:19 AM
@Ilia K In your case I would suggest the following configuration:
Dev queue: Capacity 30%, Max Capacity 70%, User Limit Factor 4, Ordering policy: Fair
Prod queue: Capacity 70%, Max Capacity 30%, User Limit Factor 2, Ordering policy: Fair
Make sure preemption is enabled in the Hive config. These settings should give you the desired result - each queue can use 100% of the cluster while the other queue is idle. The trick is the "User Limit Factor", which lets the dev queue "steal" resources from the Prod queue, up to 4 times its configured capacity (so DEV can reach 100% while Prod is idle).
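For reference, a minimal sketch of how those settings could look as Capacity Scheduler properties. The leaf queue names dev and prod under root are my assumption, and maximum-capacity is set to 100 here as my own reading of the goal (a queue can only grow up to its maximum-capacity, so letting either queue take the whole cluster when the other is idle needs it at 100):

```
# capacity-scheduler settings - illustrative sketch, not a verified config
yarn.scheduler.capacity.root.queues=dev,prod

yarn.scheduler.capacity.root.dev.capacity=30
yarn.scheduler.capacity.root.dev.maximum-capacity=100
yarn.scheduler.capacity.root.dev.user-limit-factor=4
yarn.scheduler.capacity.root.dev.ordering-policy=fair

yarn.scheduler.capacity.root.prod.capacity=70
yarn.scheduler.capacity.root.prod.maximum-capacity=100
yarn.scheduler.capacity.root.prod.user-limit-factor=2
yarn.scheduler.capacity.root.prod.ordering-policy=fair
```

Preemption itself is normally the yarn.resourcemanager.scheduler.monitor.enable switch on the YARN side; verify the exact property against your version.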
06-26-2018
06:24 AM
@Jay Kumar SenSharma The old hosts are still in the AMS metrics. I will take care of the TTL in AMS as per your recommendations. Thank you!
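For anyone landing here later, a rough sketch of the kind of ams-site retention (TTL) properties usually involved; the property names and values below are from memory and should be verified against your AMS version:

```
# Ambari Metrics (ams-site) retention, in seconds - values are illustrative only
timeline.metrics.host.aggregator.ttl=86400
timeline.metrics.host.aggregator.minute.ttl=604800
timeline.metrics.host.aggregator.hourly.ttl=2592000
timeline.metrics.host.aggregator.daily.ttl=31536000
timeline.metrics.cluster.aggregator.minute.ttl=7776000
timeline.metrics.cluster.aggregator.hourly.ttl=31536000
timeline.metrics.cluster.aggregator.daily.ttl=63072000
```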
06-26-2018
05:16 AM
@Jay Kumar SenSharma Thank you for your prompt response! I was referring to hosts that were deleted months ago and still show up... So it seems that Grafana isn't aware of the change...
06-26-2018
05:09 AM
Hello all, I've noticed that in the Grafana UI, when I filter for specific servers, the list still shows hosts that are no longer part of the cluster (DataNodes that were deleted completely from the cluster). How can I remove them from the Grafana UI? Adi
06-25-2018
10:16 AM
Hi @Eric Leme Thank you for your response. I'm familiar with the information you provided (yet it is appreciated... points rewarded...), but I was hoping to hear from people who actually did the switch from 1500 to jumbo, to learn from their first-hand experience. Thanks!
06-05-2018
08:46 AM
Hello all
I know that the official recommendation is to use MTU 9000 (jumbo frames) on all network interfaces in an HDP cluster.
My question is for those here who switched their cluster from 1500 to 9000 - did you see a performance increase due to the change? Thanks in advance!
Adi
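For anyone planning the same switch, a quick sketch of how to verify jumbo frames end-to-end on a pair of test nodes before committing the whole cluster; the interface name and target host below are placeholders:

```
# current MTU on the interface (eth0 is just an example name)
ip link show eth0

# temporarily raise it for testing
ip link set dev eth0 mtu 9000

# confirm a 9000-byte frame passes without fragmentation:
# 8972 bytes of payload + 28 bytes of IP/ICMP headers = 9000; -M do forbids fragmentation
ping -M do -s 8972 <other-datanode>
```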
Labels:
- Hortonworks Data Platform (HDP)
06-04-2018
05:34 AM
Thank you @Geoffrey Shelton Okot However, the fsck shows no corrupt blocks; the problem is with corrupt replicas. That said, the alert has since disappeared... Not sure whether to be glad or suspicious 🙂 Adi
06-03-2018
03:36 PM
Hello, I've noticed in Ambari, under the HDFS metrics, that we have 2 blocks with corrupt replicas. Running "hdfs fsck /" shows no corrupt blocks and the system is healthy. Running "hdfs dfsadmin -report" shows 2 corrupt replicas (same as the Ambari dashboard). I've restarted Ambari Metrics and the Ambari agents on all nodes, plus the Ambari server, as noted in one of the threads I came across, but the problem remains. Ambari is 2.5.2. Any ideas how to fix this issue? Thanks Adi
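In case it helps anyone hitting the same discrepancy, a sketch of the commands I'd use to dig further (the path in the last command is just a placeholder):

```
# summary counters - should match the Ambari "corrupt replicas" figure
hdfs dfsadmin -report | grep -i corrupt

# list files with corrupt blocks, if any
hdfs fsck / -list-corruptfileblocks

# drill into a suspect path with block and replica locations
hdfs fsck /some/suspect/path -files -blocks -locations
```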
Labels:
- Apache Hadoop
10-03-2017
12:08 PM
Problem solved! In case anyone else encounters the following:
1. Servers losing heartbeat for no reason
2. The Ambari agent constantly hogs 100% CPU
3. Running "yarn application -list" produces results, but more slowly than on other servers
4. In general, the server is slow
The fix was to set the CPU profile in the BIOS to "maximum performance". For some reason the server (a Dell in our case, but every BIOS out there has an equivalent) was set to the default CPU profile, which throttles the CPU for low-voltage use.
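A quick way to spot this from the OS side is to compare the current clock speed with the advertised maximum; this is only a sketch, and the cpufreq path depends on the driver in use:

```
# current vs. advertised clock speed; a power-save profile shows cores stuck well below max
lscpu | grep -i mhz
grep -i "cpu mhz" /proc/cpuinfo | sort | uniq -c

# current frequency governor per core, when the cpufreq interface is exposed
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```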
10-01-2017
03:27 PM
More info: In ambari-server.log I can see that it acknowledges the loss of heartbeat, but it seems they do communicate: 01 Oct 2017 18:22:39,870 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:159 - Heartbeat lost from host hdp-dn01-drp.hadoop.local
01 Oct 2017 18:22:39,872 INFO [ambari-hearbeat-monitor] TopologyManager:671 - Hearbeat for host hdp-dn01-drp.hadoop.local lost thus removing it from available hosts.
01 Oct 2017 18:22:39,872 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:174 - Setting component state to UNKNOWN for component METRICS_MONITOR on hdp-dn01-drp.hadoop.local
01 Oct 2017 18:22:39,872 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:174 - Setting component state to UNKNOWN for component DRUID_MIDDLEMANAGER on hdp-dn01-drp.hadoop.local
01 Oct 2017 18:22:39,872 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:174 - Setting component state to UNKNOWN for component DRUID_HISTORICAL on hdp-dn01-drp.hadoop.local
01 Oct 2017 18:22:39,872 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:174 - Setting component state to UNKNOWN for component DATANODE on hdp-dn01-drp.hadoop.local
01 Oct 2017 18:22:39,873 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:174 - Setting component state to UNKNOWN for component NODEMANAGER on hdp-dn01-drp.hadoop.local
01 Oct 2017 18:23:23,042 WARN [qtp-ambari-agent-1563184] HeartBeatHandler:235 - Host is in HEARTBEAT_LOST state - sending register command
01 Oct 2017 18:23:30,029 INFO [qtp-ambari-agent-1563184] HeartBeatHandler:425 - agentOsType = centos6
01 Oct 2017 18:23:30,045 INFO [qtp-ambari-agent-1563184] HostImpl:329 - Received host registration, host=[hostname=hdp-dn01-drp,fqdn=hdp-dn01-drp.hadoop.local,domain=hadoop.local,architecture=x86_64,processorcount=12,physicalprocessorcount=12,osname=centos,osversion=6.8,osfamily=redhat,memory=65901056,uptime_hours=0,mounts=(available=38553776,mountpoint=/,used=10299852,percent=22%,size=51475068,device=/dev/mapper/vg_system-LogVol00,type=ext4)(available=390059,mountpoint=/boot,used=71993,percent=16%,size=487652,device=/dev/sda2,type=ext4)(available=204304,mountpoint=/boot/efi,used=276,percent=1%,size=204580,device=/dev/sda1,type=vfat)(available=1654503504,mountpoint=/grid/0,used=110677756,percent=7%,size=1859652852,device=/dev/sda5,type=ext4)(available=1713507724,mountpoint=/grid/1,used=111041964,percent=7%,size=1922198324,device=/dev/sdb1,type=ext4)(available=1716047400,mountpoint=/grid/2,used=108502288,percent=6%,size=1922198324,device=/dev/sdc1,type=ext4)(available=1716667196,mountpoint=/grid/3,used=107882492,percent=6%,size=1922198324,device=/dev/sdd1,type=ext4)(available=1709492804,mountpoint=/grid/4,used=115056884,percent=7%,size=1922198324,device=/dev/sde1,type=ext4)(available=1710666700,mountpoint=/grid/5,used=113882988,percent=7%,size=1922198324,device=/dev/sdf1,type=ext4)(available=1709508880,mountpoint=/grid/6,used=115040808,percent=7%,size=1922198324,device=/dev/sdg1,type=ext4)(available=1705253584,mountpoint=/grid/7,used=119296104,percent=7%,size=1922198324,device=/dev/sdh1,type=ext4)(available=1708647680,mountpoint=/grid/8,used=115902008,percent=7%,size=1922198324,device=/dev/sdi1,type=ext4)(available=1713886116,mountpoint=/grid/9,used=110663572,percent=7%,size=1922198324,device=/dev/sdj1,type=ext4)(available=1711301604,mountpoint=/grid/10,used=113248084,percent=7%,size=1922198324,device=/dev/sdk1,type=ext4)(available=1712490508,mountpoint=/grid/11,used=112059180,percent=7%,size=1922198324,device=/dev/sdl1,type=ext4)]
, registrationTime=1506871410029, agentVersion=2.5.1.0
01 Oct 2017 18:23:30,045 INFO [qtp-ambari-agent-1563184] TopologyManager:592 - TopologyManager.onHostRegistered: Entering
01 Oct 2017 18:23:30,045 INFO [qtp-ambari-agent-1563184] TopologyManager:594 - TopologyManager.onHostRegistered: host = hdp-dn01-drp.hadoop.local is already associated with the cluster or is currently being processed
01 Oct 2017 18:23:30,052 INFO [qtp-ambari-agent-1563184] HeartBeatHandler:504 - Recovery configuration set to RecoveryConfig{, type=AUTO_START, maxCount=6, windowInMinutes=60, retryGap=5, maxLifetimeCount=1024, components=null, recoveryTimestamp=1506871410051}