Member since: 01-25-2017
Posts: 25
Kudos Received: 4
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 2517 | 03-27-2017 07:57 AM |
05-10-2017
07:32 PM
Thank you very much for the answer 🙂
... View more
05-10-2017
06:42 AM
Thanks for the reply. But, as per my understanding, a one-way trust means the local KDC will trust the tickets generated by the central AD. Correct? If the Linux servers where Hadoop runs cannot contact AD for Kerberos, or if we have just an LDAP server in place of AD, how does the Kerberos auth happen? Will I have to create user principals manually in the local KDC?
... View more
05-08-2017
06:43 PM
@slachterman thanks for your reply. By "doesn't support" I mean that the authentication agent used on the Linux systems is not able to connect to the AD Kerberos ports, and there are policy restrictions on using Kerberos on the Linux boxes; I know it's a bit weird. So I was thinking about a solution that doesn't depend on the KDC services of AD but still keeps Kerberos enabled on the cluster. For example, suppose we have an AD and a local MIT KDC. The local KDC hosts the service principals, and AD manages the cluster users. If a user A exists in AD, I create a user principal A@MIT-KDC-Realm, and a one-way trust is established between AD and the KDC, will user A be able to successfully launch jobs and use services on the cluster?
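For what it's worth, a minimal sketch of what the one-way trust usually involves on the MIT KDC / cluster side. The realm names HADOOP.LOCAL and AD.EXAMPLE.COM are made up for illustration, and the matching trust object has to be created on the AD side with the same password:

```
# On the MIT KDC: add the cross-realm TGT principal so that users holding an
# AD TGT can obtain service tickets in the local realm. The password must
# match the trust password configured on the AD side. (Realm names are
# hypothetical.)
kadmin.local -q "addprinc krbtgt/HADOOP.LOCAL@AD.EXAMPLE.COM"

# In /etc/krb5.conf on the cluster nodes: declare both realms and the direct
# trust path from the AD realm to the local realm.
[realms]
  HADOOP.LOCAL = {
    kdc = kdc.hadoop.local
  }
  AD.EXAMPLE.COM = {
    kdc = ad.example.com
  }

[capaths]
  AD.EXAMPLE.COM = {
    HADOOP.LOCAL = .
  }

# Hadoop also needs hadoop.security.auth_to_local rules so that
# user@AD.EXAMPLE.COM maps to the short local user name.
```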
... View more
05-08-2017
02:39 PM
1 Kudo
Hi, in my environment the Hadoop nodes are integrated with AD for authentication, but AD doesn't support Kerberos. I understand that it's possible to have users and user principals serviced by AD and only the service principals serviced by a local KDC. The question is: is it possible to set up a local KDC server for both service and user principals while the actual users reside in AD? So I would need to host the Kerberos principals and manage the tickets of AD users in the local KDC. The AD user realm and the KDC realm will also be different. Any help would be appreciated 🙂
... View more
04-10-2017
11:53 AM
Looks like this is a better approach. I got some clear info from http://theckang.com/2015/remote-spark-jobs-on-yarn/ that matches your solution. Thanks much !
... View more
04-05-2017
08:22 PM
@Michael M That's cool! So, this setup needs Spark version > 2? Also, what would be the master IP and port when using Spark on YARN? I am not a dev, so please excuse me if these sound stupid 😄
... View more
04-05-2017
08:18 PM
1 Kudo
Livy is a nice option; it's just that we will have to make curl calls to the API outside the script(?). But something like what @Michael M suggested sounds more interesting.
... View more
04-03-2017
07:00 PM
@Kshitij Badani thanks for the reply. I forgot to mention, I am using Zeppelin and Jupyter right now. But an IDE is more full-featured and better suited to scenarios like module building. I have seen people using Spyder, PyCharm, Eclipse etc. locally, but I was looking to see if they could be integrated with a remote multi-node Hadoop cluster.
... View more
04-03-2017
02:30 PM
Has anyone ever used a Python IDE against a Spark cluster? Is there any way to install a Python IDE like Eclipse, Spyder etc. on a local Windows machine and submit Spark jobs to a remote cluster via pyspark? I can see that Spyder is available with Anaconda, but the Hadoop nodes where Anaconda is installed don't have GUI tools, and it's not possible to see the Spyder UI that is initialized on the remote Linux edge node. What is the best way to go about this?
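The approach the thread converges on later (client-mode PySpark against YARN from a local machine) roughly looks like the sketch below. It assumes Spark 2.x, a local Spark install matching the cluster version, and a local copy of the cluster's Hadoop/YARN client configs; all paths are hypothetical:

```python
import os

# Hypothetical local paths: a copy of the cluster's *-site.xml files and a
# local Spark install that matches the cluster's Spark version.
os.environ["HADOOP_CONF_DIR"] = r"C:\hadoop-conf"
os.environ["SPARK_HOME"] = r"C:\spark"

from pyspark.sql import SparkSession

# With YARN there is no host:port master URL; "yarn" is resolved from the
# yarn-site.xml found via HADOOP_CONF_DIR.
spark = (SparkSession.builder
         .master("yarn")
         .appName("ide-remote-test")
         .config("spark.submit.deployMode", "client")
         .getOrCreate())

# Quick sanity check that executors actually start on the remote cluster.
print(spark.range(100).count())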
... View more
Labels:
- Apache Spark
03-27-2017
09:01 AM
@ccasano I set up the queues like above. Say I have 4 queues, Q1 to Q4, each with min 25% and max 100%. If I start a job on Q1 and it grows up to 100% utilization, and later I launch the same task on Q2, the new task will grow only up to 25% (absolute configured capacity) and the old one will come back to 75%. Is there a way I can equally distribute the resources here? I.e., the second job should grow beyond its minimum capacity until the queues are balanced equally. Thanks in advance!
... View more
03-27-2017
07:57 AM
After setting the two parameters below in the custom yarn-site.xml, things started working.
yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity
yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor
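The post doesn't include the values that were set; purely for illustration (these are assumed values, not the ones actually used), a more aggressive preemption tuning typically looks something like:
yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity = 0.1
yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor = 1.0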
... View more
03-21-2017
10:36 PM
Thanks @Michael Young for your answer. But I don't think that's how it works. As per my understanding, the case you talked about is when preemption is disabled, i.e., new tasks have to wait until the existing ones finish, and new ones cannot start if the available resources are less than the minimum requirement. I think the whole point of preemption is to avoid this scenario by forcefully killing containers held by existing jobs in over-utilized queues if they're not willing to release resources within 'x' amount of time. Please see here, STEP #3 reads "such containers will be forcefully killed by the ResourceManager to ensure that SLAs of applications in under-satisfied queues are met". To answer your other question, I have 4 queues, Q1 to Q4, each with 25% min capacity and 100% max capacity. Q2 is divided into Q21 and Q22 with 50% (min) each. All of them use FIFO.
... View more
03-21-2017
12:41 PM
1 Kudo
Hi there,
I have enabled preemption for YARN as per : https://hortonworks.com/blog/better-slas-via-resource-preemption-in-yarns-capacityscheduler/
I observed that if the queues are already 100% occupied by Hive (Tez with container reuse enabled) or Spark jobs and a new job is submitted to any queue, it will not start until one of the existing tasks finishes. At the same time, if I try to launch the Hive CLI, it will also hang forever until some tasks finish and resources are deallocated.
If Tez container reuse is disabled, new jobs will start getting resources. This is not because of preemption, but because each container lasts only a few seconds and the new containers go to the new jobs. Spark is anyway not touched; it will not release any resources.
Does anyone have a hint as to why preemption is not happening? Also, how do I preempt Spark jobs?
The values are as follows:
yarn.resourcemanager.scheduler.monitor.enable = true
yarn.resourcemanager.scheduler.monitor.policies = org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval = 3000
yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill = 15000
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round = 0.1
... View more
Labels:
- Apache Spark
- Apache YARN
02-27-2017
01:42 PM
This fixed my problem. I am on HDP 2.5.0 🙂
... View more
02-03-2017
09:09 AM
>> In the YARN page, the cpu_wio._avg is the average metric value for all nodes in the YARN cluster (NodeManagers). The cpu_wio._max is the maximum of all the cpu_wio values from the YARN cluster.
This was my understanding too. But see below:
>> You can use the "System Servers" Grafana dashboard to delve deeper to check why higher values are seen in the graph. This metric is captured in the "CPU - IOWAIT/INTR" section in that dashboard.
The value shown here doesn't match the one I see on the Ambari dashboard. If the max I/O wait among all the NodeManagers is 5% in Grafana, then the max value shown on the Ambari dashboard is different, maybe 10 or 20 or even 40. Another crazy thing I noticed is that when we zoom in on a portion of the graph in Grafana, the % value changes. Not sure if it's the same for everyone.
... View more
02-02-2017
09:46 AM
@Aravindan Vijayan Thanks for your reply. I am attaching the metrics details below. cpu_wio._avg and cpu_wio._max are the metrics here. Also, I don't think the values shown there are the actual CPU I/O wait; rather, they are multiplied/added with some x factor. You can see this graph in Ambari Dashboard > YARN. Under the metrics graphs, you will see one like in the preview image above.
... View more
01-31-2017
02:37 PM
Yes, I understand the concept. I'm just confused about how it's calculated.
>> Which screen are you reporting these numbers from? Is this from a specific host page, or an overall metric?
Actually, there's a "CPU Wait Access" widget available on the "YARN" page in Ambari. I have 8 worker (data) nodes and 6 master nodes. What I can see is, if the maximum I/O wait on any of the worker nodes is 2% (shown in host-specific metrics/Grafana/Nagios etc.), then CPU Wait Access reports a peak of 40%. The same applies for any number.
... View more
01-31-2017
01:31 PM
Hey, does anyone know what "CPU Wait Access" means in the Ambari YARN metrics and how it's calculated? I understand that it's related to CPU I/O. But it looks like the actual (max) CPU I/O wait on the workers is multiplied by ~20 and the resulting value is shown as CPU Wait Access in my case. Any better idea? Thanks.
... View more
Tags:
- Ambari
- ambari-metrics
- Data Processing
- Upgrade to HDP 2.5.3 : ConcurrentModificationException When Executing Insert Overwrite : Hive
Labels:
- Apache Ambari
01-27-2017
08:59 AM
1 Kudo
@gnovak I completely figured out the issue (not sure if I can call it an issue!). It was the "user-limit-factor". In my case, each queue is used by only one user. My assumption was that if the min capacity of a sub-leaf (Q41) is 25% and it can grow up to 100% of its parent queue Q4, then the max user-limit-factor value Q41 can have would be 4 (4*25 = 100%). But this is not true! It can grow beyond that, up to the absolute maximum configured capacity. So the math is: max user-limit-factor = absolute maximum configured capacity / absolute configured capacity. The absolute values can be found in the Scheduler section of the ResourceManager UI. Once I adjusted the user-limit-factor so a single user can take advantage of the whole capacity, problem solved! Thanks for your spark though!
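To make that formula concrete with the numbers from this thread (Q4 at 40% of the cluster, Q41 at 25% of Q4 with a 100% maximum):
absolute configured capacity of Q41 = 40% x 25% = 10%
absolute maximum configured capacity of Q41 = 100%
max user-limit-factor for Q41 = 100 / 10 = 10 (not 4)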
... View more
01-26-2017
07:11 PM
@gnovak Perfect illustration; this kind of doc is not available on the internet, I wish Hortonworks would pin it somewhere 🙂 In your case, was the user limit factor set to 1? I also suspect the apps, as to why they were not requesting more capacity. In my case, the workload was different. Q1 and Q2 had 1 app each with a small number of containers and a large amount of resources. Meanwhile, Q41 had one app with a larger number of containers but with minimal resources (containers with the minimum configured memory and vcores in YARN). Anyway, I'll investigate more by pushing the same load to all queues simultaneously and see. Thank you for your time, much appreciated 🙂!
... View more
01-25-2017
08:11 PM
http://hortonworks.com/blog/better-slas-via-resource-preemption-in-yarns-capacityscheduler/ This doc says - "preemption works in conjunction with the scheduling flow to make sure that resources freed up at any level in the hierarchy are given back to the right queues in the right level".
... View more
01-25-2017
07:48 PM
This could partially explain the reason, thanks for the spark. But I would still expect that, in a FIFO queue, resources are given out in a round-robin manner according to demand. Even then, there should be a more civilized/balanced distribution of resources across queues at the same level, and thereby the sub-leafs would get a fair portion. Confusing! 😞
... View more
01-25-2017
05:37 PM
Thanks @Jasper for your reply. But preemption is enabled; I can confirm that because the YARN jobs spawned in those queues say "Preemption enabled" in the ResourceManager. "I don't get why Q41 is only getting 10% and not 20%." ^ Actually I was talking about the absolute capacity, so it's calculated as 25% * 40% = 10% absolute. So the minimum is satisfied. Excess resources are then moved to the queues one level above (Q1, Q2 & Q3). So it seems to me like queues at a certain level have more priority than their underlying sub-leafs. Meaning, if the minimum capacity is satisfied for the sub-leafs, the ResourceManager puts their parent in a wait list and allocates more resources to other queues at the same level as the parent. This is what I observed; it doesn't make sense though!
... View more
01-25-2017
02:22 PM
Hi, I'm stuck with a problem and it would be really great if someone could help me! I'm running an HDP 2.5.0.0 cluster with the Capacity Scheduler. Let's say I have 4 queues, Q1, Q2, Q3 and Q4, defined under root. Q1, Q2 and Q3 are leaf queues and have minimum and maximum capacities of 20% and 40% respectively (the queues are similar). Q4 is a parent queue (minimum capacity 40%, max 100%) and has 4 leaf queues under it, let's say Q41, Q42, Q43 and Q44 (minimum 25, maximum 100 for all 4 sub-queues). All queues have the minimum user limit set to 100% and the user limit factor set to 1. Issue: when users submit jobs to Q1, Q2 and Q41 and the other queues are empty, I would expect Q1 and Q2 to be at 20%+ absolute capacity and Q4 at 40%+, roughly 25 (Q1), 25 (Q2) and 50 (Q41). But this is not happening. Q1 and Q2 always stay at 40%, and Q41 or Q4 is getting only 10% absolute capacity. Any idea why this is happening? Thanks.
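For reference, here is roughly what that layout looks like in capacity-scheduler.xml terms. The property names are the standard Capacity Scheduler ones; the queue names and percentages are the ones described above. Only Q1 and Q41 are spelled out; Q2/Q3 and Q42 to Q44 follow the same pattern:
yarn.scheduler.capacity.root.queues = Q1,Q2,Q3,Q4
yarn.scheduler.capacity.root.Q1.capacity = 20
yarn.scheduler.capacity.root.Q1.maximum-capacity = 40
yarn.scheduler.capacity.root.Q4.capacity = 40
yarn.scheduler.capacity.root.Q4.maximum-capacity = 100
yarn.scheduler.capacity.root.Q4.queues = Q41,Q42,Q43,Q44
yarn.scheduler.capacity.root.Q4.Q41.capacity = 25
yarn.scheduler.capacity.root.Q4.Q41.maximum-capacity = 100
yarn.scheduler.capacity.root.Q4.Q41.minimum-user-limit-percent = 100
yarn.scheduler.capacity.root.Q4.Q41.user-limit-factor = 1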
... View more
Labels:
- Apache YARN