PySpark Connection remote server

I've installed a single-node cluster on an Amazon machine using Ambari. I'm trying to use Spark from another machine through PySpark.

This is my code:

from pyspark import SparkConf, SparkContext
conf = SparkConf().setAppName('hello').setMaster('spark://MYIP:7077')
sc = SparkContext(conf=conf)

The problem is that I get a "connection refused" error when I run the program:

WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master "MYIP"

So I tried this command to start the master: ./sbin/start-master.sh

Now I get this error instead:

17/07/27 12:07:15 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master XX.XXX.XXX.XX:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
    at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run(StandaloneAppClient.scala:106)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.StreamCorruptedException: invalid stream header: 01000C31

This is not a port problem, because port 7077 is open.

I couldn't find an answer to this problem on the forum. Do you have any idea?

1 ACCEPTED SOLUTION

It turned out to be a version compatibility problem between the Spark installed by Ambari and the Spark version I was importing in Python.
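For anyone hitting the same thing, here is a rough sketch of how the two versions could be compared before connecting. It is only an illustration: the helper names (major_minor, same_release_line, installed_pyspark_version) and the cluster version string "2.1.1" are made up for the example, and comparing only major.minor is a heuristic I am assuming, not an official compatibility rule. The cluster version has to be looked up by hand, for example with spark-submit --version on the cluster node.

```python
from importlib import metadata


def major_minor(version: str) -> tuple:
    """Return the (major, minor) parts of a version string such as '2.1.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def same_release_line(client_version: str, cluster_version: str) -> bool:
    """Heuristic (an assumption): treat equal major.minor as compatible."""
    return major_minor(client_version) == major_minor(cluster_version)


def installed_pyspark_version():
    """Return the locally installed pyspark version, or None if it is missing."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Hypothetical value: replace it with the version your cluster reports,
    # e.g. the output of `spark-submit --version` on the cluster node.
    cluster_version = "2.1.1"
    client_version = installed_pyspark_version()
    if client_version is None:
        print("pyspark is not installed in this environment")
    elif same_release_line(client_version, cluster_version):
        print("client", client_version, "matches cluster", cluster_version)
    else:
        print("mismatch: client", client_version, "vs cluster", cluster_version)
```

If the check reports a mismatch, installing the pyspark release that matches the cluster (for example with pip install pyspark==<cluster version>) is the obvious thing to try. In my case aligning the versions is what made the error go away.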
