
Debug Spark program in Eclipse Data in AWS

Contributor

Hi all,

I am trying to run and debug a Spark program in Eclipse against a Cloudera cluster on AWS EC2. I tried both of these configurations:

val conf = new SparkConf().setAppName("WordCount").setMaster("yarn-client")

val conf = new SparkConf().setAppName("WordCount").setMaster("local[3]")

The issue I ran into is that the NameNode in the EC2 cluster returns the nodes' private AWS IP addresses, for example 172.31.26.79, 172.31.26.80, etc., which my local Windows machine cannot reach.

Any idea how to handle this?
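To see why the local machine cannot reach these nodes, it helps to check which range the returned addresses fall in. The minimal sketch below uses the standard `java.net.InetAddress.isSiteLocalAddress`, which is true for the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). Addresses like 172.31.x.x, which AWS uses for default VPCs, are in that set and are not routed over the public Internet. The object and method names are illustrative only.

```scala
import java.net.InetAddress

// Illustrative helper: classifies the addresses the NameNode hands back.
object PrivateAddressCheck {

  // getByName on an IP literal only parses it; no DNS lookup happens.
  // isSiteLocalAddress is true for 10/8, 172.16/12 and 192.168/16.
  def isPrivate(ip: String): Boolean =
    InetAddress.getByName(ip).isSiteLocalAddress

  def main(args: Array[String]): Unit = {
    // The example addresses from the question.
    for (ip <- Seq("172.31.26.79", "172.31.26.80"))
      println(s"$ip private=${isPrivate(ip)}")
  }
}
```

A client outside the VPC that receives such an address has no route to it, regardless of how the Spark master is configured.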

 

Accepted Solution

Re: Debug Spark program in Eclipse Data in AWS

Expert Contributor

It's also possible to establish an SSH tunnel to connect to a remote debug session. Take a look at the -L option of ssh: it lets you open a local port and specify, within the same ssh command, the remote host and port that it forwards to. This works for private IPs as long as you can connect, through a public IP, to a server that has access to the private network. Note, though, that latency can be very high in setups like this, and debugging can still be difficult.
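A minimal sketch of that tunnel, assuming OpenSSH is on the PATH, a gateway host with a public IP that can reach the private network, and a JVM on the private node listening for a debugger on port 5005 (for example started with `-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005`). The gateway name, user, and port are hypothetical placeholders. The code only assembles the `ssh -N -L` command; an Eclipse "Remote Java Application" debug configuration would then attach to `localhost:5005`.

```scala
import scala.sys.process._

// Sketch only: host names, user and ports are hypothetical placeholders.
object SshTunnel {

  // Builds: ssh -N -L localPort:targetHost:targetPort gateway
  //   -L forwards localhost:localPort to targetHost:targetPort as seen from the gateway,
  //      so targetHost may be a private VPC address.
  //   -N tells ssh not to run a remote command, just to keep the forward open.
  def command(localPort: Int, targetHost: String, targetPort: Int, gateway: String): Seq[String] =
    Seq("ssh", "-N", "-L", s"$localPort:$targetHost:$targetPort", gateway)

  def main(args: Array[String]): Unit = {
    // Forward local port 5005 to the (hypothetical) JDWP port on a private node.
    val cmd = command(5005, "172.31.26.79", 5005, "ec2-user@gateway.example.com")
    println(cmd.mkString(" "))
    // Uncomment to actually open the tunnel; this blocks until ssh exits:
    // val exitCode = Process(cmd).!
  }
}
```

Running the same `ssh -N -L ...` line directly in a terminal is equivalent; wrapping it in Scala just keeps the port mapping next to the Spark code that depends on it.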


Re: Debug Spark program in Eclipse Data in AWS

Contributor

I know that moving the program inside the cluster network would help, but sometimes you can't move the program into the cluster.

Re: Debug Spark program in Eclipse Data in AWS

Contributor

Thank you for your reply.

I solved the issue by creating another EC2 node in AWS to use as a workspace and connecting to the EC2 cluster through that node.

Re: Debug Spark program in Eclipse Data in AWS

Cloudera Employee

Amazon Elastic MapReduce (EMR) builds proprietary versions of Apache Hadoop, Hive, and Pig optimized for running on Amazon Web Services. Amazon EMR provides a hosted Hadoop framework that runs on the web-scale infrastructure of Amazon Elastic Compute Cloud (EC2) and can use Amazon Simple Storage Service (S3) for storage.