Has anyone ever used any python IDEs on Spark cluster ? Is there any way someone can install some python IDEs like Eclipse, Spyder etc on local windows machine and to submit spark jobs on a remote cluster via pyspark ? I could see that Spyder is available with Anaconda, but hadoop nodes where Anaconda is installed don't have GUI tools and it's not possible to see Spider UI that is initialized on remote linux edge node.
I dont know if you have a specific need to use IDE only, but have you given a try to Zeppelin ? zeppelin-server is running on your cluster itself and you can access it via browser. You can submit your spark jobs through either %livy or %spark interpreters. %livy also provides additional features such as session timeout, impersonation etc.
thanks for the reply. Forgot to mention, I am using Zeppelin and Jupyter right now. But, an IDE is more featureful and best suited in scenarios like module building. I have seen people using Spyder, pyCharm, Eclipse etc locally, but was looking to see if they could be integrated with remote multi-node Hadoop cluster.
I see. One of the ways you may try is to submit spark jobs remotely via Livy. It does not require you to have spark-client on your local machine. You need to have livy server installed and configured properly on your cluster. And then you can submit your jobs via REST API.
it should work with spark 1.6 as well. You can check master url in spark-defaults.config file in your cluster. If you setup SPARK_CONF_DIR variable and copy spark-defaults config from your cluster to it there is no need to specify master explicitly.