Member since: 07-27-2015
Posts: 35
Kudos Received: 2
Solutions: 1

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 24751 | 04-06-2018 01:05 AM
08-12-2015
10:40 PM
I have a 2-node cluster, with each node having 8GB RAM and 4 cores. On both nodes there are apps already running that have consumed 2 cores each, which leaves me with 2 free cores per node (4 total). Memory used is 4GB of the total 16GB available to YARN containers.

Some important properties:
- yarn.nodemanager.resource.memory-mb = 20GB (overcommitted, as I see)
- yarn.scheduler.minimum-allocation-mb = 1GB
- yarn.scheduler.maximum-allocation-mb = 5.47GB
- yarn.nodemanager.resource.cpu-vcores = 12
- yarn.scheduler.minimum-allocation-vcores = 1
- yarn.scheduler.maximum-allocation-vcores = 12

I am using the Fair Scheduler. With the above settings, when I spark-submit, the app remains in the ACCEPTED state. Here is what I am requesting:
- spark.driver.memory = 2G
- spark.master = yarn-client
- spark.executor.memory = 1G
- num-executors = 2
- executor-memory = 1G
- executor-cores = 1

As I see it, I am requesting a total of 3 cores (1 for the driver, by default, and 1 x 2 for the executors). A single node does not have 3 free cores, but it does have 2, so ideally the containers should be distributed across the 2 nodes. I am not sure why the Spark job remains in the ACCEPTED state; my default queue shows only 25% usage. I also notice the following settings for my root.default queue:
- Used Capacity: 25.0%
- Used Resources: <memory:4096, vCores:4>
- Num Schedulable Applications: 2
- Num Non-Schedulable Applications: 1
- Num Containers: 4
- Max Schedulable Applications: 2
- Max Schedulable Applications Per User: 2

Why do I only get 4 containers in total? Or does this indicate the currently used containers (which in my case is 4)? Also, why is Max Schedulable Applications only 2? I have not set any user-level or queue-level limits under the Dynamic Resource Pool settings.
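For reference, a spark-submit invocation matching the resources above would look roughly like the following sketch (the main class and jar path are placeholders, not part of the original submission):

```bash
# A minimal sketch; the jar path and main class are placeholders,
# while the resource flags mirror the values described in the post.
spark-submit \
  --master yarn-client \
  --driver-memory 2g \
  --num-executors 2 \
  --executor-memory 1g \
  --executor-cores 1 \
  --class com.example.MyApp \
  /path/to/app.jar
```

If I understand yarn-client mode correctly, the driver runs on the submitting machine, so on the YARN side the request amounts to the ApplicationMaster container plus the two executor containers.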
Labels:
- Apache Spark
- Apache YARN
08-10-2015
09:35 AM
I also think we could probably compress the binaries before they are copied to HDFS and have YARN uncompress them somehow?
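A rough sketch of that idea, assuming the binaries live under ./app-bin locally and /apps/myapp is a placeholder HDFS directory:

```bash
# Compress the binaries once on the client side before uploading.
tar czf myapp-bin.tar.gz -C ./app-bin .

# Placeholder HDFS location for the application binaries.
hdfs dfs -mkdir -p /apps/myapp
hdfs dfs -put -f myapp-bin.tar.gz /apps/myapp/
```

If the archive is then registered in the application submission context as a LocalResource of type ARCHIVE, YARN's resource localization should unpack it on each NodeManager, so nothing has to be decompressed by hand.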
08-07-2015
08:16 PM
Are there any recommendations for speeding up the deployment of app binaries to YARN? I've been using the RM REST APIs to submit apps, with the binaries located on HDFS. This tends to take a lot of time when the binaries to be deployed as a YARN app are large (say, >500MB or more), and also when the number of containers I need is high. I could probably speed this up by:
1. Turning off the default 3 replicas on HDFS
2. Using the HDFS cluster-wide cache, which can help avoid disk block reads
3. Using YARN resource localization
Do you have any recommendations that are definitely known to speed this up? Thanks, Sumit
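For what it's worth, here is how I picture options 1 and 2 with standard HDFS commands (the /apps/myapp path, file name, and cache pool name are placeholders):

```bash
# Option 1: write the binaries with a single replica so the upload
# does not wait on the full 3-way replication pipeline.
hdfs dfs -D dfs.replication=1 -put -f myapp-bin.tar.gz /apps/myapp/

# Option 2: pin the binaries into the HDFS centralized cache so the
# DataNodes can serve the blocks from memory during localization.
hdfs cacheadmin -addPool myapp-pool
hdfs cacheadmin -addDirective -path /apps/myapp -pool myapp-pool
```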
Labels:
- Apache YARN
- HDFS
07-29-2015
09:41 PM
Does the FairScheduler take only memory into consideration when making a decision, or does it also use vcores? If the decision can depend on multiple factors, then this may be another CR wherein the user can find out the exact reason (possibly through an API call) why an app is in the ACCEPTED state (such as memory, cores, disk space, queue limits, etc.).
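As a sketch of what can be queried today (the RM host and application id below are placeholders): the ResourceManager's per-application REST resource exposes a diagnostics field, which sometimes hints at why an app is still sitting in ACCEPTED.

```bash
# rm-host and the application id are placeholders; look at the
# "diagnostics" field in the returned JSON.
curl -s "http://rm-host:8088/ws/v1/cluster/apps/application_1234567890123_0001"
```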
07-29-2015
08:48 PM
You beat me to the answer 🙂 Yes, I figured this has to be set in the NodeManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml. Thanks!
07-29-2015
08:43 PM
Thanks, Wilfred. I'd agree about not setting it to false; that's my thinking too. The main reason to use that setting is to be able to do some functional testing without getting into tuning just yet. So, is there a way I can set this property through the UI?
07-28-2015
10:32 PM
On point 1, I think I am getting hit by https://issues.apache.org/jira/browse/YARN-3103
07-28-2015
06:08 AM
HBase on YARN. On a side note:
1. Is there a reason for the security token to just fail like that after 15 minutes of trying, or do I have a setup problem? That seems to be what caused the first attempt to be killed.
2. The last line about the null container - I see it often. Is that a bug, and can it be ignored?
Thanks, Sumit
07-28-2015
03:55 AM
1 Kudo
Hi Harsh - You are right, there is a prior attempt which got killed. Here are some log snippets, as you asked.

Attempt 1 - app becomes RUNNING:

2015-07-24 14:20:40,980 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1437726395811_0010_000001 State change from LAUNCHED to RUNNING
2015-07-24 14:20:40,981 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1437726395811_0010 State change from ACCEPTED to RUNNING

Some hrs later the tokens are renewed (900000ms):

2015-07-25 13:56:35,841 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: Rolling master-key for container-tokens
2015-07-25 13:56:35,841 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: Going to activate master-key with key-id 1834122077 in 900000ms
2015-07-25 13:56:35,841 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Rolling master-key for nm-tokens
2015-07-25 13:56:35,842 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Going to activate master-key with key-id 516071750 in 900000ms

The following 2 log lines keep repeating for the next 900000ms, filling up the logs:

2015-07-25 13:56:35,920 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1437726395811_0010_000001
2015-07-25 13:56:35,920 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1437726395811_0010_000001
...
2015-07-25 14:11:35,772 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1437726395811_0010_000001
2015-07-25 14:11:35,772 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1437726395811_0010_000001

This fails (not sure why) and leads to app termination:

2015-07-25 14:11:35,877 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8030: readAndProcess from client 10.65.144.85 threw exception [org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1437726395811_0010_000001]
2015-07-25 14:11:36,888 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1437726395811_0010_01_000001 Container Transitioned from RUNNING to COMPLETED

1st attempt done (RUNNING --> ACCEPTED):

2015-07-25 14:11:36,888 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1437726395811_0010_000001 State change from RUNNING to FINAL_SAVING
2015-07-25 14:11:36,889 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1437726395811_0010_000001 State change from FINAL_SAVING to FAILED
2015-07-25 14:11:36,890 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1437726395811_0010 State change from RUNNING to ACCEPTED

2nd attempt starts:

2015-07-25 14:11:36,890 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1437726395811_0010_000002 to scheduler from user: root
2015-07-25 14:11:36,891 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1437726395811_0010_000002 State change from SUBMITTED to SCHEDULED

Not sure what this Null container indicates:

2015-07-25 14:11:36,910 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
07-28-2015
12:52 AM
1 Kudo
I notice in the RM logs that my application sometimes transitions back from the RUNNING state to the ACCEPTED state. Under what conditions would this happen? I thought this usually happens when the RM or AM dies and the applications are recovered; such apps would transition from RUNNING --> ACCEPTED. Is that correct? However, in my case both RM and NM recovery are disabled:
yarn.resourcemanager.recovery.enabled = false
yarn.nodemanager.recovery.enabled = false
Thanks, Sumit
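One way I can think of to check this, sketched with a placeholder RM host: the appattempts REST resource lists every attempt the RM has created for an application, so a second attempt appearing next to a failed first one would explain the RUNNING --> ACCEPTED transition without any RM/NM recovery being involved.

```bash
# rm-host is a placeholder; the app id is taken from the RM log
# snippets elsewhere in this thread. Each attempt the RM started is
# listed, with its start time and AM container id.
curl -s "http://rm-host:8088/ws/v1/cluster/apps/application_1437726395811_0010/appattempts"
```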
Labels:
- Apache YARN