03-09-2017 06:59 AM
I am new to all this and was wanting to learn more about Hadoop and Spark, along with the other supporting technologies. So after a bit of research, using the quickstart Docker image seemed like a great option. The instructions for getting that up and running are pretty straightforward, but how does one actually write and compile code to be used in the image?
I would like to use IntelliJ as my IDE, and I have it on my host machine, but my host has no idea about Hadoop or any of the other frameworks or jars needed to write and compile code. And the Docker image is a command-line thing, so I'm not sure how to work within it using an IDE like IntelliJ.
What is the workflow for writing code in an IDE that can be run in the quickstart image? I was really hoping for a simple 'Hello World' type example of how one can develop something to be run.
Thanks for the help!
03-09-2017 10:37 AM
You can refer to the link below to start from scratch. You may need to tweak it a little to suit Docker, but the concept is the same. Also, the link uses Eclipse, but you can use IntelliJ.
03-10-2017 09:59 AM
Thanks for the reply and the link, but I guess I wasn't clear in my question.
I am not sure how one actually writes the code for a Hadoop job on the host machine when all the jars and libraries needed are in the container. Do people write the code in the container, and if so, what do they use?
03-10-2017 10:24 AM
You can use Eclipse, IntelliJ, or STS with a Maven/SBT plugin to write code and build a jar on your host machine. Then move the jar to the Hadoop cluster for execution.
You can do this for both MapReduce and Spark. In addition, you can log in to Spark directly using the spark-shell (or) pyspark command and write code interactively in either Scala or Python.
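As a rough idea of the kind of 'Hello World' word count you would type into a pyspark shell: the sketch below uses plain Python (a local list and `Counter` instead of an RDD) so it runs anywhere without a cluster. The sample lines are made up; on the actual quickstart image you would replace the local list with something like `sc.textFile(...)` reading from HDFS.

```python
# Word-count sketch mirroring the map/reduce pattern you would write
# interactively in pyspark. The input lines are illustrative; on a real
# cluster you would load data from HDFS instead of a hard-coded list.
from collections import Counter

lines = ["hello world", "hello hadoop", "hello spark"]

# "map" phase: split each line into individual words
words = [word for line in lines for word in line.split()]

# "reduce" phase: count occurrences of each word
counts = Counter(words)

for word, count in sorted(counts.items()):
    print(word, count)
```

The same split-then-count shape carries over almost line for line to the pyspark version, which is what makes the interactive shell a nice way to get a first job running.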
The link that I've shared explains the above steps in detail.