Member since: 12-12-2019
Posts: 34
Kudos Received: 1
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3089 | 01-07-2020 01:29 AM |
| | 15316 | 12-26-2019 11:05 PM |
01-02-2020 09:20 PM
Hi @EricL, Thanks for your inputs. The value of yarn.app.mapreduce.am.resource.mb is set to 1024 in the "mapred-site.xml" file. I was not able to find a value for "yarn.app.mapreduce.am.resource.cpu-vcores" in any of the XML files (i.e., core-site.xml, mapred-site.xml, yarn-site.xml, capacity-scheduler.xml, etc.):

```xml
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>
```

Here is the progress I have made so far:

- After switching YARN to the Fair Scheduler, I also set the Spark program to use a fair scheduling pool (from the default Spark fair scheduler XML template).
- The minimum and maximum allocations (in MB) for the Fair Scheduler in YARN are set to 1024 MB and 3072 MB respectively.
- After running a single Spark job (with both driver and executor memory set to 512 MB), I was able to verify that the job runs, but it consumes the entire 3 GB of memory.
- So the next Spark job does not run at all, as it is left waiting for memory.
- However, if I revert YARN scheduling to the Capacity Scheduler, then with the same memory settings both jobs run fine without any issues.

So, what additional memory-related parameters need to be set for Fair Scheduling so that the jobs run properly? Please help me fix this issue.
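To double-check my own numbers, here is a back-of-envelope calculation of where the 3 GB goes, assuming Spark-on-YARN defaults (a per-JVM overhead of max(384 MB, 10%), two executor instances, and containers rounded up to the 1024 MB minimum allocation); please correct me if these assumptions are wrong for HDP 3.1:

```python
import math

# Each container request is padded with JVM overhead and then rounded up to a
# multiple of yarn.scheduler.minimum-allocation-mb (1024 MB in my setup).
def container_mb(requested_mb, min_alloc_mb=1024):
    overhead = max(384, int(requested_mb * 0.10))  # Spark-on-YARN default
    return math.ceil((requested_mb + overhead) / min_alloc_mb) * min_alloc_mb

driver = container_mb(512)           # AM/driver container for a 512 MB driver
executors = 2 * container_mb(512)    # default of 2 executor instances
print(driver + executors)            # 3072 MB -> the entire 3 GB maximum
```

If this arithmetic holds, a single job with 512 MB settings can plausibly pin the whole 3072 MB queue on its own.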
01-01-2020 08:19 PM
Hi @Shelton, I went through your email again and tried out all the options you mentioned, but I am still facing the same issue when running the second job. Please let me know if anything else needs to be set, or whether this is purely a memory-related issue, and kindly suggest a fix. Here are the relevant screenshots:

Screenshot 1: YARN Fair Scheduler XML file (I tried setting maxAMShare to 0.1, but then the first Spark job didn't start at all, so I had to bump it to 0.5).

Screenshot 2: Spark Fair Scheduler XML file (placed in the $SPARK_HOME/conf directory, i.e. /usr/hdp/3.1.0.0-78/spark2/conf).

Screenshot 3: Spark configuration parameters set through the pyspark program.

Screenshot 4: YARN cluster information (the total number of VCORES is 6 and the total memory in the 2-node cluster is 15.3 GB). Note: since this is a Flask application, I believe it launches 2 jobs, one to open the port at 5000 and another to accept the inputs. The whole idea behind this exercise is to test how many Spark sessions can run in parallel in a single Spark context.

Screenshot 5: The percentage usage of the queue as well as the cluster by the first job. As we can see, there is sufficient space in both the cluster and the queue, but for some reason the second job never gets the required amount of resources. I suspect this is because the fair scheduler's maximum allocation is set to 3 GB. Can you please let me know how to bump up this value? I am also curious: even though maxResources in fair-scheduler.xml is set to 8 GB, the fair scheduler's maximum allocation shows as only 3 GB. Is that because of the value of maxAMShare? Also, I am setting both driver and executor memory to only 512 MB, so how is my job occupying 3 GB of space?

Screenshot 6: Shows that job 2 never gets the required amount of resources.
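In case it helps narrow this down: my understanding is that the 3 GB per-container ceiling comes from yarn-site.xml rather than from the allocation file, along these lines (the values are the ones I described above; please correct me if the property names differ on HDP 3.1):

```xml
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>3072</value>
</property>
```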
01-01-2020 09:13 AM
Hi @Shelton, Thanks for your inputs. If I understand correctly, there will be 2 fair-scheduler.xml files: one for YARN kept in $HADOOP_CONF_DIR, and one more in $SPARK_HOME? For the fair-scheduler.xml belonging to Spark, how do I configure the parameter in Ambari? Also, regarding the queueMaxAMShareDefault or maxAMShare value: earlier it was 0.5, but since the jobs were not launching due to the AM-resource-exceeded error, I set it to 0.8. I will try setting it to 0.1 and check. Please let me know your inputs. Thanks and Regards, Sudhindra
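For reference, my understanding is that the Spark-side file uses its own schema, different from the YARN allocation file; the stock template shipped with Spark looks roughly like this (the pool name comes from the template, not from my configuration):

```xml
<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
```

The program then points at this file through the spark.scheduler.allocation.file property.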
01-01-2020 04:29 AM
Hi @Shelton, I have made some progress on this issue. I have modified fair-scheduler.xml, set both "maxAMShare" and "queueMaxAMShareDefault" to 0.8, and left the weight at its default value (1.0). The result: one Spark job is running fine. However, when I try to run the next job, I get the same error as before about exceeding the maximum AM resource limit. The modified fair-scheduler.xml is given below. Please provide your inputs on how to fix this particular issue. Also, one interesting observation: even though the YARN scheduling mode shows as "Fair", the Spark scheduling mode still shows as "FIFO". Can I set it to "Fair" as well through the program? Since I am setting spark.master to "yarn", I believe the fair scheduling mode will take precedence over the Spark scheduling mode. Please correct me if I am wrong.
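On the FIFO observation, this is what I plan to try next from the pyspark program itself; a minimal sketch, assuming the allocation file sits in my $SPARK_HOME/conf and using the pool name from the stock template:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("yarn")
    .appName("fair-pool-test")
    # switch Spark's internal job scheduling from FIFO to FAIR
    .config("spark.scheduler.mode", "FAIR")
    # the Spark-side fairscheduler.xml, not the YARN one
    .config("spark.scheduler.allocation.file",
            "/usr/hdp/3.1.0.0-78/spark2/conf/fairscheduler.xml")
    .getOrCreate()
)

# jobs submitted from this thread go into the named pool
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "production")
```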
01-01-2020 01:42 AM
Hi @Shelton, Please help me on this; I am stuck on the issue again. Did I misconfigure anything? Also, even after setting preemption to false, the YARN Resource Manager UI still shows preemption as enabled. Is this causing the problem? Thanks and Regards, Sudhindra
12-30-2019 08:54 PM
Hi,
I have set up the YARN Fair Scheduler in Ambari (HDP 3.1.0.0-78) for the "Default" queue itself; so far, I haven't added any new queues.
Now, when I submit a simple job against the queue, the application stays in the "ACCEPTED" state forever, and I get the below message in the YARN logs:
The additional information is given below. Please help me fix this issue at the earliest.
For "default" queue, the below parameters are set through "fair-scheduler.xml".
Also, no jobs are currently running, apart from the one that I have launched.
Given below is a screenshot confirming that the maximum AM resource percent is greater than 0.1.
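Roughly, the allocation file follows this shape; a minimal sketch with illustrative values rather than my exact file:

```xml
<?xml version="1.0"?>
<allocations>
  <queue name="default">
    <minResources>1024 mb,1 vcores</minResources>
    <maxResources>8192 mb,6 vcores</maxResources>
    <maxAMShare>0.5</maxAMShare>
    <weight>1.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
  <queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
</allocations>
```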
12-30-2019 01:50 AM
Hi @Shelton, Thanks for the reply. I have been able to change the scheduling mode to Fair Scheduler, which is great. However, my application is not running due to a resource allocation issue; I am getting the below standard error:

[Mon Dec 30 15:02:19 +0530 2019] Application is added to the scheduler and is not yet activated. (Resource request: <memory:1024, vCores:1> exceeds maximum AM resource allowed).

I am attaching all the relevant screenshots as well as information about my YARN cluster for your reference. Please guide me in fixing this issue. Why does this issue usually occur? My YARN cluster has 2 nodes, the scheduling mode is "Fair Scheduler", the minimum allocation is 1 GB/1 vcore, the maximum allocation is 15 GB/3 vcores, and the overall memory is 30 GB. Given below are the "fair-scheduler.xml" contents. Below are the custom yarn-site parameters that have been set; preemption is disabled as well.
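In essence, the custom yarn-site parameters are of this form (a sketch with the values I described, not a verbatim copy of my file):

```xml
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>false</value>
</property>
```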
12-26-2019 11:05 PM
Hi All, The issue has been fixed. It was due to the Spark executor JVM option being set incorrectly. Thanks and Regards, Sudhindra
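For anyone who lands here later: executor JVM flags are passed through the spark.executor.extraJavaOptions property, so that is the value worth double-checking first. A minimal sketch (the flag shown is purely illustrative, not the one I had wrong):

```python
from pyspark.sql import SparkSession

# Executor JVM options travel through this property; a malformed value here
# can keep executor containers from launching at all.
spark = (
    SparkSession.builder
    .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC")
    .getOrCreate()
)
```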
12-26-2019 11:03 PM
Hi @KuldeepK @chennuri_gauris @Shelton, This is just a gentle reminder; I really need your help in looking into the below issue. I want to understand what additional steps are needed to set up a fair scheduler on YARN on HDP 3.1.0.0-78. Thanks and Regards, Sudhindra
12-23-2019 09:27 PM
Hi @senthh, Thanks a lot for your reply. I have monitored the Spark Streaming logs and verified that the connection to the broker is established correctly. Given below are the logs confirming this. The interesting thing is that the same Spark Streaming job works outside the Kubernetes setup without any issues. Please help!

```
19/12/23 15:56:59.562 INFO AppInfoParser: Kafka version: 2.2.1
19/12/23 15:56:59.563 INFO AppInfoParser: Kafka commitId: 55783d3133a5a49a
19/12/23 15:56:59.566 DEBUG KafkaConsumer: [Consumer clientId=consumer-1, groupId=test] Kafka consumer initialized
19/12/23 15:56:59.569 INFO KafkaConsumer: [Consumer clientId=consumer-1, groupId=test] Subscribed to partition(s): table-update-0
19/12/23 15:56:59.593 DEBUG NetworkClient: [Consumer clientId=consumer-1, groupId=test] Initialize connection to node 10.20.0.44:29092 (id: -1 rack: null) for sending metadata request
19/12/23 15:56:59.596 DEBUG NetworkClient: [Consumer clientId=consumer-1, groupId=test] Initiating connection to node 10.20.0.44:29092 (id: -1 rack: null) using address /10.20.0.44
19/12/23 15:56:59.640 DEBUG Metrics: Added sensor with name node--1.bytes-sent
19/12/23 15:56:59.642 DEBUG Metrics: Added sensor with name node--1.bytes-received
19/12/23 15:56:59.642 DEBUG Metrics: Added sensor with name node--1.latency
19/12/23 15:56:59.643 DEBUG Selector: [Consumer clientId=consumer-1, groupId=test] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node -1
19/12/23 15:56:59.928 DEBUG NetworkClient: [Consumer clientId=consumer-1, groupId=test] Completed connection to node -1. Fetching API versions.
19/12/23 15:56:59.929 DEBUG NetworkClient: [Consumer clientId=consumer-1, groupId=test] Initiating API versions fetch from node -1.
19/12/23 15:56:59.962 DEBUG NetworkClient: [Consumer clientId=consumer-1, groupId=test] Recorded API versions for node -1: (Produce(0): 0 to 7 [usable: 7], Fetch(1): 0 to 10 [usable: 10], ListOffsets(2): 0 to 5 [usable: 5], Metadata(3): 0 to 7 [usable: 7], LeaderAndIsr(4): 0 to 2 [usable: 2], StopReplica(5): 0 to 1 [usable: 1], UpdateMetadata(6): 0 to 5 [usable: 5], ControlledShutdown(7): 0 to 2 [usable: 2], OffsetCommit(8): 0 to 6 [usable: 6], OffsetFetch(9): 0 to 5 [usable: 5], FindCoordinator(10): 0 to 2 [usable: 2], JoinGroup(11):
```
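For completeness, the consumer side of the job is wired up along these lines; a simplified sketch in Structured Streaming form, where the broker address and topic/partition match the log above and everything else is illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("table-update-stream").getOrCreate()

# Read from the same broker and partition that the consumer log shows
stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "10.20.0.44:29092")
    .option("assign", '{"table-update": [0]}')  # partition table-update-0
    .load()
)

query = (
    stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .writeStream
    .format("console")
    .start()
)
query.awaitTermination()
```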