I am using Eclipse to build Spark applications, and every time I have to export the JAR and run it from the shell to test the application. I am using a VM running the CDH 5.5.2 QuickStart image. Eclipse is installed on Windows (the host); I create the Spark application there, export it as a JAR from Eclipse, copy it over to Linux (the guest), and then run it with spark-submit.

This gets annoying: if something is wrong in the program but the build still succeeds, the application fails at runtime, and I have to fix the code, export the JAR again, and repeat. Is there a simpler way to run the job right from Eclipse (note that I do not want to run Spark in local mode), with the input file in HDFS? Is that a better way of doing this? What are the industry standards for developing, testing, and deploying Spark applications in production?
Currently Spark does not support deploying to YARN directly from a SparkContext; use spark-submit instead. For unit testing it is recommended to run with a local master (e.g. local[*]).
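For example, a unit-style smoke test against a local master could look roughly like the sketch below (the object name and the tiny in-memory dataset are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountSmokeTest {
      def main(args: Array[String]): Unit = {
        // In-process local master: no cluster, no JAR export, fast feedback loop.
        val sc = new SparkContext(
          new SparkConf().setAppName("WordCountSmokeTest").setMaster("local[*]"))
        try {
          // Small in-memory dataset instead of HDFS input.
          val counts = sc.parallelize(Seq("a b a", "b c"))
            .flatMap(_.split("\\s+"))
            .map((_, 1))
            .reduceByKey(_ + _)
            .collectAsMap()
          assert(counts("a") == 2 && counts("b") == 2 && counts("c") == 1)
        } finally {
          sc.stop()
        }
      }
    }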
The problem is that you cannot set the Hadoop configuration from outside the SparkContext: it is read from the *-site.xml files under HADOOP_HOME during spark-submit. So you cannot point at your remote cluster from Eclipse unless you set up the correct *-site.xml files on your laptop and use spark-submit.
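If the only cluster dependency during development is the HDFS input (this does not give you YARN execution, only remote reads), one possible workaround is to set fs.defaultFS on the context's Hadoop configuration yourself, or to use a fully qualified hdfs:// URI. The host name, port, and file path below are assumptions that would have to match your QuickStart VM:

    import org.apache.spark.{SparkConf, SparkContext}

    object ReadFromClusterHdfs {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("ReadFromClusterHdfs").setMaster("local[*]"))

        // Normally fs.defaultFS is picked up from core-site.xml on the classpath;
        // setting it here lets the local driver read from the VM's HDFS.
        // Host name and port are placeholders for the CDH QuickStart VM.
        sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020")

        val lines = sc.textFile("/user/cloudera/input.txt") // placeholder HDFS path
        println(s"line count: ${lines.count()}")
        sc.stop()
      }
    }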
SparkSubmit is available as a Java class, but I doubt you will achieve what you are looking for with it. What it would let you do is launch a Spark job from Eclipse against a remote cluster, if that is sufficient for you; have a look at the Oozie Spark launcher as an example.
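For what it's worth, Spark 1.4+ also ships a programmatic wrapper around spark-submit, org.apache.spark.launcher.SparkLauncher, which could be called from Eclipse roughly like the sketch below. The Spark home, JAR path, main class, and input path are placeholders, and the machine running it still needs the cluster's *-site.xml files visible to Spark:

    import org.apache.spark.launcher.SparkLauncher

    object LaunchFromEclipse {
      def main(args: Array[String]): Unit = {
        // Thin wrapper around spark-submit: builds the command line and forks a process.
        val process = new SparkLauncher()
          .setSparkHome("/opt/spark")                    // assumption: local Spark installation
          .setAppResource("/path/to/my-spark-app.jar")   // placeholder: the exported application JAR
          .setMainClass("com.example.MySparkJob")        // placeholder main class
          .setMaster("yarn-cluster")                     // on Spark 2.x: setMaster("yarn").setDeployMode("cluster")
          .addAppArgs("hdfs:///user/cloudera/input.txt") // placeholder HDFS input path
          .launch()                                      // returns a java.lang.Process
        process.waitFor()
      }
    }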
SparkContext is changing dramatically in Spark 2, I believe in favor of SparkSession, to support multiple sessions on top of one context; I am not sure what the current state of that is.