Member since: 07-07-2018
Posts: 3
Kudos Received: 0
Solutions: 0
07-14-2018 08:44 AM
@Felix Albani thanks for the answer. My datanode is up and running, and everything works fine when I run commands from within the cluster. The problem occurs when I try to connect with Spark from Eclipse on my local Windows machine. My cluster runs on Azure with 3 VMs. My understanding so far:

1. All the VMs have both a public and an internal IP.
2. The namenode connects successfully over its public IP.
3. The datanodes, however, cannot be reached, because when the namenode returns the list of datanodes to write to, it hands out their internal IP addresses.

This is the exception:

    Excluding datanode DatanodeInfoWithStorage[10.0.0.8:50010,DS-fa8a8432-25c6-47af-9ffb-cd8aba0ccc77,DISK]

The IP address 10.0.0.8 is the internal IP of one of the VMs.
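A common client-side workaround for exactly this situation (a minimal sketch, assuming the datanode hostnames resolve to the public IPs from the client machine, e.g. via hosts-file entries, and that port 50010 is open) is to tell the HDFS client to connect to datanodes by hostname instead of by the internal IP the namenode reports, using the standard dfs.client.use.datanode.hostname setting:

    import org.apache.spark.SparkConf

    // Minimal sketch: make the HDFS client dial datanodes by hostname
    // rather than by the internal IPs the namenode hands out.
    val conf = new SparkConf()
      .setAppName("SparkApp")
      .setMaster("yarn-client")
      // Standard HDFS client property, passed through Spark's
      // spark.hadoop.* prefix. Only helps if each datanode hostname
      // resolves to its public IP from the client machine.
      .set("spark.hadoop.dfs.client.use.datanode.hostname", "true")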
07-08-2018 06:41 AM
Hello @Felix Albani, thanks for your response. I have downloaded all the configuration files from my VMs and added them to the resources folder of my project. However, I still get the error:

    org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hdfs/.sparkStaging/application_1531024403580_0103/__spark_conf__.zip could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and 2 node(s) are excluded in this operation.

This is the modified code:

    import org.apache.spark.{SparkConf, SparkContext}

    System.setProperty("SPARK_YARN_MODE", "true")
    System.setProperty("HADOOP_USER_NAME", "hdfs") // submit as the hdfs user

    val conf = new SparkConf()
      .setAppName("SparkApp")
      .setMaster("yarn-client")
      .set("spark.yarn.jars", "hdfs://104.215.158.249:8020/usr/hdp/2.6.5.0-292/spark2/jars/*.jar")

    val sc = new SparkContext(conf)
    val file = sc.textFile("/user/hdfs/file.txt")
    val words = file.flatMap { line => line.split(" ") }
    val wordsmap = words.map { word => (word, 1) }
    val wordcount = wordsmap.reduceByKey((x, y) => x + y)
    wordcount.collect.foreach(println)
    sc.stop()

Any help?
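This error usually means the client could reach the namenode but could not open a connection to any datanode, so every datanode got excluded from the write pipeline. One way to confirm that from the client machine (a minimal sketch; 50010 is the default datanode transfer port, and the addresses are placeholders for your VMs' public IPs) is a plain socket probe:

    import java.net.{InetSocketAddress, Socket}

    // Probe each datanode's transfer port from the client machine.
    // Replace the placeholder addresses with your datanodes' public IPs.
    val datanodes = Seq("<datanode1-public-ip>", "<datanode2-public-ip>")
    for (host <- datanodes) {
      val socket = new Socket()
      try {
        socket.connect(new InetSocketAddress(host, 50010), 5000) // 5s timeout
        println(s"$host:50010 reachable")
      } catch {
        case e: Exception => println(s"$host:50010 NOT reachable: ${e.getMessage}")
      } finally {
        socket.close()
      }
    }

If the probe fails, no Spark-side setting will help until the network path (NSG rules, public IPs, or hostname resolution) is fixed.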
07-07-2018 08:38 AM
Hello, I have set up my HDP cluster on Azure VMs with 3 nodes: one master and 2 slaves. I connect to the master with the PuTTY client, open spark-shell, and run Spark jobs; everything works fine. Now I have set up the Eclipse Scala IDE to develop a Spark application, and I would like to connect directly from the IDE to the cluster's HDFS, where my data is stored, and run the program from there. I have set up the configuration below:

    val conf = new SparkConf()
      .setAppName("SparkApp")
      .setMaster("yarn-client")
      .set("spark.hadoop.fs.defaultFS", "hdfs://13.76.44.223")
      .set("spark.hadoop.dfs.nameservices", "13.76.44.223:8020")
      .set("spark.hadoop.yarn.resourcemanager.hostname", "13.76.44.223")
      .set("spark.hadoop.yarn.resourcemanager.address", "13.76.44.223:8050")
      .set("spark.driver.host", "127.0.0.1") // my local ip
      .set("spark.local.ip", "13.76.44.223") // cdh vmnat ip
      .set("spark.yarn.jar", "hdfs://13.76.44.223:8020/usr/hdp/2.6.5.0-292/spark2/jars/*.jar")
      .set("mapreduce.app-submission.cross-platform", "true")

    val sc = new SparkContext(conf)
    val file = sc.textFile("/user/hdfs/file.txt")
    val words = file.flatMap { line => line.split(" ") }
    val wordsmap = words.map { word => (word, 1) }
    val wordcount = wordsmap.reduceByKey((x, y) => x + y)
    wordcount.collect.foreach(println)

If I run the above program from Eclipse, I get:

1. "Error initializing SparkContext. org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hdfs/.sparkStaging/application_1530938503330_0168/__spark_conf__.zip could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and 2 node(s) are excluded in this operation."

2. "org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hdfs/.sparkStaging/application_1530938503330_0168/__spark_conf__.zip could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and 2 node(s) are excluded in this operation."

Note: I have opened all the necessary ports used in the config above. Can anyone help; am I missing anything?
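Since the failing step here is writing __spark_conf__.zip into HDFS, it can also help to take Spark out of the picture and test an HDFS write directly from the client. A minimal sketch, assuming the same public namenode address, hadoop-client on the classpath, and the hostname workaround described above (the path /tmp/connectivity-test.txt is just an illustrative target):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Write a tiny file to HDFS directly, bypassing Spark, to check
    // whether the client can reach the datanodes in the write pipeline.
    val hadoopConf = new Configuration()
    hadoopConf.set("fs.defaultFS", "hdfs://13.76.44.223:8020")
    // Assumption: datanodes are addressed by hostname, as in the
    // workaround above, and those hostnames resolve to public IPs.
    hadoopConf.set("dfs.client.use.datanode.hostname", "true")

    val fs = FileSystem.get(hadoopConf)
    val out = fs.create(new Path("/tmp/connectivity-test.txt")) // illustrative path
    out.writeBytes("hello hdfs")
    out.close() // close() flushes the write pipeline; this is where a
                // "could only be replicated to 0 nodes" error would surface
    fs.close()

If this small write fails with the same RemoteException, the issue is purely HDFS client-to-datanode connectivity, not the Spark configuration.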
Labels: Apache Spark