
pyspark crashes when running locally but works on a cluster

If one runs pyspark without arguments on the gateway node on which CM is installed, one gets:

=========================

$ pyspark
Python 2.7.5 (default, Nov 20 2015, 02:00:19)  
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Error: Cluster deploy mode is not applicable to Spark shells.
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/shell.py", line 43, in <module>
   sc = SparkContext(pyFiles=add_files)
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 112, in __init__
   SparkContext._ensure_initialized(self, gateway=gateway)
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
   SparkContext._gateway = gateway or launch_gateway()
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
   raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
=========================

 

However, it works on the cluster:

 

=========================

[ivy2@md01 ~]$ pyspark --master=yarn
Python 2.7.5 (default, Nov 20 2015, 02:00:19)  
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/03/02 14:54:04 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
     ____              __
    / __/__  ___ _____/ /__
   _\ \/ _ \/ _ `/ __/  '_/
  /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
     /_/

Using Python version 2.7.5 (default, Nov 20 2015 02:00:19)
SparkContext available as sc, HiveContext available as sqlContext.
>>>

=========================

 

Any ideas why?

 

Is this warning OK: "WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. "?
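On the port warning: it normally just means that something else on the host (often another Spark shell or driver) already holds port 4040, so Spark tries the next port. A minimal sketch to check this from the gateway node, assuming plain TCP sockets; the helper name `port_in_use` is hypothetical and not part of Spark:

```python
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding to (host, port) fails.

    A failed bind usually means another process already holds the port,
    although it can also fail for other reasons (e.g. permissions).
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except socket.error:
        return True
    else:
        return False
    finally:
        probe.close()


if __name__ == "__main__":
    # 4040 and 4041 are the two ports named in the warning above.
    for port in (4040, 4041):
        print("port %d in use: %s" % (port, port_in_use(port)))
```

If 4040 shows as in use while no other Spark application should be running, it may be worth looking for a leftover shell on the same node.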

 

 

 


Forgot to say that

spark-submit --master=yarn --num-executors=3 lettercount.py

works fine.

 

Actually, I might have misinterpreted it: it works locally but not on the cluster

========

[ivy2@md01 lab7_Spark]$ pyspark --master=yarn --deploy-mode=cluster
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Error: Cluster deploy mode is not applicable to Spark shells.
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/shell.py", line 43, in <module>
   sc = SparkContext(pyFiles=add_files)
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 112, in __init__
   SparkContext._ensure_initialized(self, gateway=gateway)
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
   SparkContext._gateway = gateway or launch_gateway()
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
   raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>>

=======

[ivy2@md01 lab7_Spark]$ pyspark --master=yarn --deploy-mode=client
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/03/02 16:04:32 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
     ____              __
    / __/__  ___ _____/ /__
   _\ \/ _ \/ _ `/ __/  '_/
  /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
     /_/

Using Python version 2.7.5 (default, Nov 20 2015 02:00:19)
SparkContext available as sc, HiveContext available as sqlContext.
======

[ivy2@md01 lab7_Spark]$ pyspark --master=local
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/03/02 16:06:40 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
     ____              __
    / __/__  ___ _____/ /__
   _\ \/ _ \/ _ `/ __/  '_/
  /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
     /_/

Using Python version 2.7.5 (default, Nov 20 2015 02:00:19)
SparkContext available as sc, HiveContext available as sqlContext.

======
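Since a bare `pyspark` fails with the same "Cluster deploy mode is not applicable to Spark shells" message as `--deploy-mode=cluster`, one thing worth checking is whether the client configuration sets cluster as the default deploy mode. A minimal sketch, assuming the defaults live in `/etc/spark/conf/spark-defaults.conf` (the path is an assumption and may differ on this installation); the helper names are hypothetical. It looks for the `spark.submit.deployMode` property and for the Spark 1.x style `yarn-cluster` master:

```python
import os
import re

# Assumed location of the client defaults; adjust for the actual gateway node.
DEFAULTS_PATH = "/etc/spark/conf/spark-defaults.conf"

# Keys and values may be separated by whitespace or by "=".
LINE_RE = re.compile(r"(\S+?)(?:\s*=\s*|\s+)(.*)$")


def find_deploy_mode(conf_text):
    """Return (spark.submit.deployMode, spark.master) from defaults-style text."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        match = LINE_RE.match(line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    return settings.get("spark.submit.deployMode"), settings.get("spark.master")


def defaults_force_cluster(conf_text):
    """True if the defaults would put every launch into cluster deploy mode."""
    deploy_mode, master = find_deploy_mode(conf_text)
    return deploy_mode == "cluster" or master == "yarn-cluster"


if __name__ == "__main__":
    if os.path.exists(DEFAULTS_PATH):
        with open(DEFAULTS_PATH) as handle:
            text = handle.read()
        print("deploy mode, master: %s, %s" % find_deploy_mode(text))
        print("defaults force cluster mode: %s" % defaults_force_cluster(text))
```

If this reports cluster mode, that would explain why the shell only starts when `--deploy-mode=client` or `--master=local` is passed explicitly; it would not explain the separate spark-submit failure in cluster mode.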

 

I think it worked before I configured the cluster to use TLS.

The same applies to spark-submit: it works in client deploy mode but not in cluster deploy mode:

 

spark-submit --master=yarn --deploy-mode=client lettercount.py

works but 

spark-submit --master=yarn --deploy-mode=cluster lettercount.py

crashes:

========

...

        client token: N/A
        diagnostics: Application application_1488351192359_0019 failed 2 times due to AM Container for appattempt_1488351192359_0019_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://md02.rcc.local:8088/proxy/application_1488351192359_0019/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1488351192359_0019_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:  
       at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
       at org.apache.hadoop.util.Shell.run(Shell.java:504)
       at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
       at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
       at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
       at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at java.lang.Thread.run(Thread.java:745)
...

===========
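The diagnostics above only report that the AM container exited with code 1; the actual cause is usually in the container logs. A minimal sketch that pulls the application id out of the diagnostics text and builds the matching `yarn logs -applicationId` command, assuming log aggregation is enabled on the cluster; the helper name `yarn_logs_command` is hypothetical:

```python
import re

# YARN application ids look like application_<cluster timestamp>_<sequence>.
APP_ID_RE = re.compile(r"application_\d+_\d+")


def yarn_logs_command(diagnostics):
    """Return the `yarn logs` argument list for the first application id found."""
    match = APP_ID_RE.search(diagnostics)
    if match is None:
        return None
    return ["yarn", "logs", "-applicationId", match.group(0)]


if __name__ == "__main__":
    sample = ("diagnostics: Application application_1488351192359_0019 "
              "failed 2 times due to AM Container")
    print(" ".join(yarn_logs_command(sample)))
```

The resulting command can be run on the gateway node and its output attached to the thread.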

Cloudera Employee

Hi,

 

Could you please share the entire console logs for further analysis?

 

 

Thanks

Arun

New Contributor



Sorry, I'm not familiar with the topic.
