Member since 10-16-2017
15 Posts
0 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 9080 | 10-16-2017 03:17 PM |
04-12-2018 09:28 AM
Turns out that some of our node managers had inconsistent Java versions for some reason, and this caused issues with memory allocation during executor creation! I disabled those node managers for now and the issue has disappeared. It was quite difficult to trace this issue without a proper stack trace from YARN pointing to it.
04-06-2018 01:23 PM
Enough resources were available on the cluster, so I don't think that was the issue. I also have dynamic resource allocation configured, so it should not allocate more resources than what is available and should scale up when it needs more. I haven't really changed the memory or CPUs requested per executor, so that shouldn't be a problem either.
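For reference, this is roughly how dynamic allocation is set up on my side. A minimal PySpark sketch, assuming standard Spark 2.x property names; the min/max executor counts are placeholders rather than my actual values:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("dynamic-allocation-example")  # made-up name for illustration
        # Let Spark grow and shrink the executor count based on the task backlog
        .config("spark.dynamicAllocation.enabled", "true")
        # The external shuffle service is required for dynamic allocation on YARN
        .config("spark.shuffle.service.enabled", "true")
        .config("spark.dynamicAllocation.minExecutors", "1")    # placeholder
        .config("spark.dynamicAllocation.maxExecutors", "20")   # placeholder
        .getOrCreate()
    )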
04-06-2018 11:02 AM
Hey guys, I am facing a peculiar situation that I need some help with. I have a job that succeeds just fine most of the time but has recently been giving me trouble. I am running Spark 2.1 on YARN from a Jupyter notebook on HDP 2.6.2. I can see that the Spark session shuts down for some unknown reason and the job fails. When I dig through the YARN logs, I can see that the Spark context has been shut down because of executor failures, with the following error message:

18/04/04 11:20:41 INFO ApplicationMaster: Final app status: FAILED, exitCode: 11, (reason: Max number of executor failures (10) reached)

I was curious about why this started happening, and I also noticed that recently I was seeing a lot of error messages about container initialization failures, as seen below:

WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e358_1521934670303_16356_01_000071 on host: x. Exit status: -1000. Diagnostics: Application application_1521934670303_16356 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is x
main : requested yarn user is x

As a workaround, I increased the threshold for executor failures from the default to 100 and set the failure validity interval to 15 minutes because this is a long-running notebook, and that worked. But what I would like to understand is what is causing these containers to fail in the first place. I couldn't find anything interesting in the YARN logs or the driver logs, and the container logs don't exist at all because initialization failed outright. I am not sure if this could be because of preemption on the YARN queues. Any help with understanding/debugging these logs would be incredibly useful.
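For completeness, the two settings I bumped are roughly these. A minimal PySpark sketch, assuming these are the YARN-mode properties that apply here; the values mirror what I described above:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("long-running-notebook")  # made-up name for illustration
        # Tolerate more executor failures before the ApplicationMaster gives up
        .config("spark.yarn.max.executor.failures", "100")
        # Only count executor failures that happened within the last 15 minutes
        .config("spark.yarn.executor.failuresValidityInterval", "15m")
        .getOrCreate()
    )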
Labels:
- Apache Spark
- Apache YARN
12-04-2017 11:16 AM
Maybe just one follow-up question. What exactly would be the use case for the Spark Thrift Server then? Is it simply to have a Spark backend for Hive instead of, say, Tez or MR?
12-04-2017 11:11 AM
Appreciate you taking the trouble to look into this, @Dongjoon Hyun!
12-01-2017 12:23 PM
So my workflow for everyday purposes is:
1. I have data in Hive that is refreshed periodically by Oozie.
2. I access this data from spark-shell, build data sets / run ML algorithms, and create new dataframes.
3. Then I am expected to persist these dataframes back to Hive (a rough sketch of this step is below).
4. Then I can connect from Tableau through a JDBC connection and visualize the results.

I wish to skip the step between 3 and 4 and be able to connect directly from Tableau to my spark-shell, where my dataframe lives, and visualize the results from there. Is this possible with the Spark Thrift Server? I see a lot of restrictions on how it can be run and haven't managed to get it running even once before.

Note: I am not an admin on the HDP cluster, so I don't have access to the keytabs etc. that are needed for running HiveServer. I only wish to run as myself, but trigger the creation of the dataset from Tableau instead of going through Hive by running a job that updates the Hive table.
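For concreteness, step 3 currently looks something like the sketch below. It is PySpark for brevity (the spark-shell version is analogous), and the database, table, and column names are made up for illustration:

    from pyspark.sql import SparkSession

    # Hive support is needed so saveAsTable writes to the Hive metastore
    spark = SparkSession.builder.appName("persist-to-hive").enableHiveSupport().getOrCreate()

    # Build some derived dataframe from the Oozie-refreshed Hive data
    result_df = spark.table("source_db.events").groupBy("user_id").count()

    # Persist it back to Hive so Tableau can read it over JDBC
    result_df.write.mode("overwrite").saveAsTable("analytics_db.user_event_counts")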
Labels:
- Apache Spark
10-17-2017 08:47 AM
That seems more reasonable. But if you want to reduce spark.port.maxRetries to 250, then you should space the port ranges 250 apart. And I think there was a typo: 40000-40031 is 32 ports, so you can change it to 40032 if you are using maxRetries of 32. And again, the executor ports will depend on which mode you are running Spark in (standalone vs. cluster vs. client).
10-16-2017 03:17 PM
I think you have done the right thing here, Charles. I had a setup where I set these ports to specific values as well, and then configured the max retries for these ports to control how many different applications can run in parallel. This configuration is in place to ensure that you don't have too many ports open, much like the setup you have, I guess.

So for example, if my spark.driver.port is set to some value like 40000 and I specify spark.port.maxRetries as 32, then Spark will retry up to 32 ports starting from 40000, and if a free port is available, it will bind to it. If not, the application will fail to start a Spark context. But you should probably space each of the port ranges 32 apart in this case, so that each port gets a range of 32. So in my configuration it would be spark.driver.port: 40000, spark.blockManager.port: 40033, etc. (a rough sketch is below).

spark.executor.port shouldn't be relevant for the firewall on the driver if you are running with a cluster manager like YARN. Unless you are running Spark standalone, of course, then it could matter. You can see the documentation for this here: https://spark.apache.org/docs/latest/configuration.html

Cheers
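For concreteness, a minimal sketch of the kind of configuration I mean, written as PySpark session options (the application name is made up, and the port values are just the examples from above):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("port-range-example")  # made-up name for illustration
        # Base ports; Spark retries upward from these when a port is already taken
        .config("spark.driver.port", "40000")
        .config("spark.blockManager.port", "40033")
        # Each base port effectively gets a window of 32 ports to try
        .config("spark.port.maxRetries", "32")
        .getOrCreate()
    )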