Support Questions

IgorYakushin · ‎03-02-2017

If one run pyspark without arguments on the gateway node on which CM is installed, one gets:

=========================

$ pyspark
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Error: Cluster deploy mode is not applicable to Spark shells.
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/shell.py", line 43, in <module>
   sc = SparkContext(pyFiles=add_files)
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 112, in __init__
   SparkContext._ensure_initialized(self, gateway=gateway)
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
   SparkContext._gateway = gateway or launch_gateway()
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
   raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
=========================

However, it works on the cluster:

=========================

[ivy2@md01 ~]$ pyspark --master=yarn
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/03/02 14:54:04 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
     ____              __
    / __/__ ___ _____/ /__
   _\ \/ _ \/ _ `/ __/ '_/
  /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
     /_/

Using Python version 2.7.5 (default, Nov 20 2015 02:00:19)
SparkContext available as sc, HiveContext available as sqlContext.
>>>

=========================

Any ideas why?

Is this warning OK: "WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. "?

IgorYakushin · ‎03-02-2017

Forgot to say that

spark-submit --master=yarn --num-executors=3 lettercount.py

works fine.

IgorYakushin · ‎03-02-2017

Actually, I might have misinterpreted it: it works locally but not on the cluster

========

[ivy2@md01 lab7_Spark]$ pyspark --master=yarn --deploy-mode=cluster
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Error: Cluster deploy mode is not applicable to Spark shells.
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/shell.py", line 43, in <module>
sc = SparkContext(pyFiles=add_files)
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 112, in __init__
SparkContext._ensure_initialized(self, gateway=gateway)
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway()
File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>>

=======

[ivy2@md01 lab7_Spark]$ pyspark --master=yarn --deploy-mode=client
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/03/02 16:04:32 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 1.6.0
/_/

Using Python version 2.7.5 (default, Nov 20 2015 02:00:19)
SparkContext available as sc, HiveContext available as sqlContext.
======

[ivy2@md01 lab7_Spark]$ pyspark --master=local
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/03/02 16:06:40 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 1.6.0
/_/

Using Python version 2.7.5 (default, Nov 20 2015 02:00:19)
SparkContext available as sc, HiveContext available as sqlContext.

======

I think it worked before I configured the cluster to use TLS.

IgorYakushin · ‎03-02-2017

Same applies for spark-submit. It works locally but not on the cluster:

spark-submit --master=yarn --deploy-mode=client lettercount.py

works but

spark-submit --master=yarn --deploy-mode=cluster lettercount.py

crashes:

========

...

        client token: N/A
        diagnostics: Application application_1488351192359_0019 failed 2 times due to AM Container for appattempt_1488351192359_0019_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://md02.rcc.local:8088/proxy/application_1488351192359_0019/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1488351192359_0019_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
       at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
       at org.apache.hadoop.util.Shell.run(Shell.java:504)
       at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
       at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
       at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
       at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at java.lang.Thread.run(Thread.java:745)
...

===========

AKR · ‎06-10-2019

Hi,

Could you please share the Entire console logs for further analysis?

Thanks

Arun

Daniel51 · ‎06-15-2019

@diebestetest wrote:
Hi,

Could you please share the Entire console logs for further analysis?

Thanks
Arun

Sorry not familiar with the topic.

Cloudera Community

Support Questions

pyspark crashes when running locally but works on a cluster