Can Eclipse/IntelliJ IDEA be used to execute code on the cluster?

Production system: HDP-2.5.0.0 using Ambari 2.4.0.1

We are getting many requests to run a range of code (Java MR, Scala, Spark, R, etc.) on HDP from an IDE on a Windows desktop machine, i.e. launch the code locally from the IDE, have it submitted to and run on the remote cluster, and get the output printed back.

For Spark and R, we have RStudio set up.

The challenge lies with Java, Scala and so on. Also, people use a range of IDEs, from Eclipse to IntelliJ IDEA.

I am aware that the Eclipse Hadoop plugin is NOT actively maintained and has plenty of bugs when working with the latest versions of Hadoop. For IntelliJ IDEA, I couldn't find reliable information on the official website.

I believe the Hive and HBase client APIs are a reliable way to connect from Eclipse etc., but I am skeptical about executing MR or other custom Java/Scala code.

I referred to several threads like this and this; however, I still have the question: does any IDE such as Eclipse or IntelliJ IDEA have official support for Hadoop? Even Spring Data for Hadoop seems to have lost traction, and it didn't work as expected 2 years ago anyway 😉

As a realistic alternative, which tool/plugin/library should be used to test the MR and other Java/Scala code 'locally', i.e. on the desktop machine using a standalone version of the cluster?

Note: I do not wish to work against/in the sandbox; it's about connecting to the prod cluster directly.

1 ACCEPTED SOLUTION

@Kaliyug Antagonist

"As a realistic alternative, which tool/plugin/library should be used to test the MR and other Java/Scala code 'locally' i.e on the desktop machine using a standalone version of the cluster?"

Please see the hadoop-mini-clusters GitHub project. hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE, without the need for a sandbox or a full-blown development cluster. It lets you debug with the full power of the IDE.

2 REPLIES

@Kaliyug Antagonist

I like Tom's suggestion and will try it myself.

Otherwise, if you wish to create your local cluster with Vagrant: https://community.hortonworks.com/articles/39156/setup-hortonworks-data-platform-using-vagrant-virt....

I use Eclipse and a Vagrant cluster. I share a folder between my local machine and the cluster, place the output jars there, and then submit them for execution. I followed the instructions published here: https://community.hortonworks.com/articles/43269/intellij-eclipse-usage-against-hdp-25-sandbox.html
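The shared-folder workflow above could be scripted roughly as follows. This is only a sketch under assumptions: the class and method names (`JarSubmitSketch`, `stageJar`, `submitCommand`), the jar name, the main class `com.example.WordCount`, and the input/output paths are all hypothetical. It also assumes the shared folder is visible inside the VM as `/vagrant` (Vagrant's default synced folder) and that the job is launched with `vagrant ssh -c`, which may differ from the setup in the linked article.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: stage a jar in the shared folder and build the submit command.
class JarSubmitSketch {

    // Copy the freshly built jar into the folder shared with the Vagrant VM.
    static Path stageJar(Path builtJar, Path sharedDir) throws IOException {
        Files.createDirectories(sharedDir);
        Path target = sharedDir.resolve(builtJar.getFileName());
        return Files.copy(builtJar, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Build the host-side command that runs `hadoop jar` inside the VM.
    // jarPathInVm is the jar's path as seen from inside the VM (assumed /vagrant/...).
    static List<String> submitCommand(String jarPathInVm, String mainClass, String... jobArgs) {
        String remote = "hadoop jar " + jarPathInVm + " " + mainClass;
        if (jobArgs.length > 0) {
            remote += " " + String.join(" ", jobArgs);
        }
        return Arrays.asList("vagrant", "ssh", "-c", remote);
    }

    public static void main(String[] args) {
        List<String> cmd = submitCommand(
                "/vagrant/wordcount.jar", "com.example.WordCount", "/tmp/in", "/tmp/out");
        // Only print the command here; to actually run it from the Vagrant project
        // directory, one could use: new ProcessBuilder(cmd).inheritIO().start().waitFor();
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the command construction separate from the copy step makes it easy to hook either one into an IDE run configuration or a build task.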

I am not sure why you are against the idea of using the sandbox. Locally, the code you develop can at most be tested functionally. I get that you want more debugging capabilities locally. True load testing still needs to happen in a full-scale environment.
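To illustrate what functional testing on the desktop can look like, here is a minimal sketch that keeps the map and reduce logic in plain Java methods, so it can be run and debugged in any IDE without a cluster or sandbox. The class `WordCountLogic` and its methods are hypothetical and use no Hadoop types; a real job would call this logic from its Mapper and Reducer classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical word-count logic, kept free of Hadoop types so it can be
// exercised directly from the IDE.
class WordCountLogic {

    // "Map" step: split one input line into lower-cased words.
    static List<String> mapLine(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // "Reduce" step: count occurrences of each word across all lines.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : mapLine(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("Hadoop runs MapReduce", "Spark runs on Hadoop");
        System.out.println(countWords(sample));
    }
}
```

This only checks that the logic is correct on small inputs; it says nothing about performance or behavior at cluster scale.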