Explorer
Posts: 38
Registered: ‎09-29-2016
Accepted Solution

Debug Spark program in Eclipse Data in AWS

Hi all,

I am trying to run and debug a Spark program in Eclipse against a Cloudera cluster on AWS EC2. I tried both of the following:

val conf = new SparkConf().setAppName("WordCount").setMaster("yarn-client")

val conf = new SparkConf().setAppName("WordCount").setMaster("local[3]")

I found that I am facing an issue: the NameNode in the AWS EC2 cluster returns the private AWS IP addresses, such as 172.31.26.79 and 172.31.26.80, which my local Windows machine cannot reach.

Any idea how to handle this?

 

Explorer
Posts: 38
Registered: ‎09-29-2016

Re: Debug Spark program in Eclipse Data in AWS

I know that moving the program inside the cluster network would help, but sometimes you can't move the program into the cluster.

Cloudera Employee
Posts: 97
Registered: ‎05-10-2016

Re: Debug Spark program in Eclipse Data in AWS

It's also possible to establish an SSH tunnel in order to connect to a remote debug session. Take a look at the -L option for ssh: it opens a local port and forwards it to a remote host and port that you specify in the ssh command. This will work for private IPs as long as you can connect, via a public IP, to a server that has access to the private network. Note, though, that latency can be extreme, and debugging can still be difficult in setups like this.
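As a rough sketch of the -L approach described above, the script below builds an ssh command that forwards a local port to a private IP through a publicly reachable host. The bastion host name, the user, and port 5005 are assumptions made for illustration; the private IP is the one from the original post. It also assumes the remote JVM was started with a JDWP agent listening on that port (for example `-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005`).

```shell
#!/bin/sh
# All values below are placeholders (assumptions); replace them with your own.
BASTION="ec2-user@bastion.example.com"  # hypothetical host with a public IP and access to the private network
TARGET_IP="172.31.26.79"                # private IP of the node running the JVM to debug
REMOTE_PORT=5005                        # port the remote JVM's debug agent listens on (assumed)
LOCAL_PORT=5005                         # port to open on the local machine

# Print the ssh command instead of running it, so it can be reviewed first.
# -N: do not run a remote command; -L: forward local_port to host:port via the bastion.
tunnel_cmd() {
  echo "ssh -N -L ${LOCAL_PORT}:${TARGET_IP}:${REMOTE_PORT} ${BASTION}"
}

tunnel_cmd
```

Running the printed command keeps the tunnel open; a remote debug configuration in Eclipse pointed at localhost and the local port should then reach the remote JVM, subject to the latency caveat above.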

Explorer
Posts: 38
Registered: ‎09-29-2016

Re: Debug Spark program in Eclipse Data in AWS

Thank you for your reply.

I solved the issue by creating another node in AWS EC2 as a workspace and connecting to the rest of the EC2 cluster through that node.

Cloudera Employee
Posts: 20
Registered: ‎01-17-2017

Re: Debug Spark program in Eclipse Data in AWS

Amazon Elastic MapReduce (EMR) builds proprietary versions of Apache Hadoop, Hive, and Pig optimized for running on Amazon Web Services. Amazon EMR provides a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (EC2) and Simple Storage Service (S3).