Member since: 10-16-2017
Posts: 15
Kudos Received: 0
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 9059 | 10-16-2017 03:17 PM |
04-12-2018
09:28 AM
Turns out that some of our node managers had an inconsistent Java version for some reason, and this caused issues with memory allocation during executor creation! I disabled those node managers for now and the issue disappeared. It was quite difficult to trace this without a proper stacktrace from YARN pointing to the cause.
04-06-2018
01:23 PM
Enough resources were available on the cluster, so I don't think that was the issue. I also have dynamic resource allocation configured, so it should not allocate more resources than are available and should scale up only when it needs more. I haven't changed the memory or cores requested per executor either, so that shouldn't be a problem.
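For reference, this is roughly how dynamic allocation is set up in my notebook session. A minimal sketch only; the min/max executor counts below are illustrative placeholders, not my actual values.

```python
from pyspark.sql import SparkSession

# Sketch of a YARN session with dynamic resource allocation enabled.
# Dynamic allocation on YARN also requires the external shuffle service.
spark = (
    SparkSession.builder
    .appName("notebook-job")
    .master("yarn")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")    # placeholder
    .config("spark.dynamicAllocation.maxExecutors", "20")   # placeholder
    .getOrCreate()
)
```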
04-06-2018
11:02 AM
Hey guys, I am facing a peculiar situation that I need some help with. I have a job that succeeds just fine most of the time but has recently been giving me trouble. I am running Spark 2.1 on YARN from a Jupyter notebook on HDP 2.6.2. I can see that the Spark session shuts down for some unknown reason and the job fails. When I dig through the YARN logs, I can see that the Spark context has been shut down because of executor failures, with the following error message:

18/04/04 11:20:41 INFO ApplicationMaster: Final app status: FAILED, exitCode: 11, (reason: Max number of executor failures (10) reached)

I was curious why this started happening, and I also noticed a lot of recent error messages about container initialization failures, as seen below:

WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e358_1521934670303_16356_01_000071 on host: x. Exit status: -1000. Diagnostics: Application application_1521934670303_16356 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is x
main : requested yarn user is x

As expected, increasing the executor failure threshold from the default to 100 and setting the timeout interval to 15 minutes made the job work, since this is a long-running notebook. But what I would like to understand is: what is causing these containers to fail in the first place? I couldn't find anything interesting in the YARN logs or the driver logs, and the container logs don't exist at all because initialization failed outright. I am not sure whether this could be caused by preemption on the YARN queues. Any help understanding/debugging these logs would be incredibly useful.
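For reference, the workaround described above amounts to something like the following session settings. This is a minimal sketch, assuming the two properties meant by "threshold for executor failure" and "timeout interval" are spark.yarn.max.executor.failures and spark.yarn.executor.failuresValidityInterval; the values simply mirror the ones in the post.

```python
from pyspark.sql import SparkSession

# Sketch of the workaround: tolerate more executor failures and only count
# failures within a rolling 15-minute window, so occasional container
# initialization errors do not kill a long-running notebook application.
spark = (
    SparkSession.builder
    .appName("long-running-notebook")
    .master("yarn")
    .config("spark.yarn.max.executor.failures", "100")
    .config("spark.yarn.executor.failuresValidityInterval", "15m")
    .getOrCreate()
)
```

Note that these are ApplicationMaster-side settings, so they need to be in place when the application is launched (as above) rather than changed on an already-running session.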
Labels:
- Apache Spark
- Apache YARN
12-04-2017
11:16 AM
Maybe just one follow-up question: what exactly would be the use case for the Spark Thrift Server then? Is it simply to have a Spark execution backend for Hive queries instead of, let's say, Tez or MR?
12-04-2017
11:11 AM
Appreciate you taking the trouble to look into this, @Dongjoon Hyun!
12-01-2017
12:23 PM
So my workflow for everyday purposes is:

1. I have data in Hive that is refreshed periodically by Oozie.
2. I access this data from spark-shell, build data sets / run ML algorithms, and create new dataframes.
3. I am then expected to persist these dataframes back to Hive (roughly sketched below).
4. Finally, I connect from Tableau through a JDBC connection and visualize the results.

I wish to skip the step between 3 and 4 and be able to connect directly from Tableau to my spark-shell, where my dataframe lives, and visualize the results from there. Is this possible with the Spark Thrift Server? I see a lot of restrictions on how it can be run and haven't managed to get it running even once before.

Note: I am not an admin on the HDP cluster, so I don't have access to the keytabs etc. that are needed for running HiveServer. I only wish to run as myself, but trigger the creation of the dataset from Tableau instead of going to Hive and running a job that updates the Hive table.
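A minimal sketch of what steps 3 and 4 currently look like, shown in PySpark for brevity and assuming a Hive-enabled session; the database, table, and view names (source_db.source_table, analytics_db.results_table, results_view) are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# Step 3 today: persist the derived dataframe back to Hive so that Tableau
# can read it over JDBC via HiveServer2 (step 4).
spark = (
    SparkSession.builder
    .appName("persist-results")
    .enableHiveSupport()
    .getOrCreate()
)

result_df = spark.table("source_db.source_table")  # stand-in for the dataset-building/ML step
result_df.write.mode("overwrite").saveAsTable("analytics_db.results_table")

# What I would like instead: expose the in-memory dataframe directly,
# e.g. as a temp view served to Tableau by a Spark Thrift Server, skipping
# the write back to Hive.
result_df.createOrReplaceTempView("results_view")
```

The catch, as far as I can tell, is that a temp view is only visible to JDBC clients if the Thrift Server shares the SparkContext that registered it, which is part of why I am asking whether this is even feasible for a non-admin user.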
Labels:
- Apache Spark
10-17-2017
08:47 AM
That seems more reasonable. But if you want to reduce spark.port.maxRetries to 250, then you should also space the port ranges by 250. And I think there was a typo: 40000-40031 is 32 ports, so you could change the upper end to 40032 if you are using maxRetries for 32 ports. And again, the executor ports will depend on what mode you are running Spark in (standalone vs. cluster vs. client).
10-16-2017
03:17 PM
I think you have done the right thing here, Charles. I had a setup where I set these ports to specific values as well and then configured the max retries for those ports to control how many different applications can run in parallel. That configuration is there to ensure you don't have too many ports open, much like the setup you have, I guess. So for example, if my spark.driver.port is set to some value like 40000 and I specify spark.port.maxRetries as 32, then Spark will retry up to 32 ports starting from 40000; if a free port is available it will bind to it, and if not, the application will fail to start a Spark context. But you should probably space each of the port ranges apart by 32 in this case, so that each port setting has its own range of 32. So in my configuration it would be spark.driver.port: 40000, spark.blockManager.port: 40033, etc. spark.executor.port shouldn't be relevant for the firewall on the driver if you are running with a cluster manager like YARN; unless you are running Spark standalone, of course, then it could matter. You can see the documentation for this here: https://spark.apache.org/docs/latest/configuration.html Cheers
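As a concrete illustration, here is roughly what that layout looks like when the ports are set on the session (a sketch using the example values from the post; adjust the base ports and the retry count to match your own firewall rules):

```python
from pyspark.sql import SparkSession

# With spark.port.maxRetries = 32, the driver may bind anywhere in
# 40000-40032, so the block manager's base port starts at 40033 to keep
# the two ranges from overlapping (mirroring the values in the post).
spark = (
    SparkSession.builder
    .appName("fixed-port-app")
    .config("spark.driver.port", "40000")
    .config("spark.blockManager.port", "40033")
    .config("spark.port.maxRetries", "32")
    .getOrCreate()
)
```

The same properties can equally go into spark-defaults.conf or be passed as --conf flags to spark-submit.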