How do I verify that Spark 2.3.1 installed on HDP 3.0 is working properly?
Created 08-10-2018 06:48 AM
I have installed an HDP 3.0 cluster on 5 nodes and installed Spark 2.3.1 through the Ambari service on one of the nodes. The node with Spark installed is ser5.dev.local.
I am trying to access this Spark instance from another system that is not part of the cluster, say cpu686.dev.local, using PySpark in a Jupyter notebook. Please find the code below for reference:
import csv
import pyspark
from pyspark import SQLContext

# Connect to the master on the Spark node
conf = (
    pyspark.SparkConf()
    .setMaster("spark://ser5.dev.local:7077")
    .setAppName("SparkServer1")
    .setAll([
        ('spark.executor.memory', '16g'),
        ('spark.executor.cores', '8'),
        ('spark.cores.max', '8'),
        ('spark.driver.memory', '16g'),
    ])
)
sc = pyspark.SparkContext(conf=conf)

# Read the CSV as text and parse each partition with csv.reader
rddFile = sc.textFile("Filterd_data.csv")
rddFile = rddFile.mapPartitions(lambda x: csv.reader(x))
rddFile.collect()
The connection appears to work: the Spark context is created using the spark://ser5.dev.local:7077 URL, and the RDD rddFile is created successfully. But when I run rddFile.collect(), it keeps running with no output and no error. We even tried a CSV file with fewer than 10 records, and the code still kept running.
Is there any way I can configure Spark, or where can I find the master URL to check the running applications in Spark? When I click on the Spark UI in Ambari, it opens the spark-history-server.
We also tried reading the CSV file from HDFS using the following code:
conf = (
    pyspark.SparkConf()
    .setMaster("spark://ser5.dev.local:7077")
    .setAppName("SparkServer1")
    .setAll([
        ('spark.executor.memory', '16g'),
        ('spark.executor.cores', '8'),
        ('spark.cores.max', '8'),
        ('spark.driver.memory', '16g'),
    ])
)
sc = pyspark.SparkContext(conf=conf)
sqlC = SQLContext(sc)
df = sqlC.read.csv("hdfs://ser2.dev.local:8020/UnusualTime/Filterd_data.csv")
The issue remains the same.
Note: I installed Spark using the following documentation:
Created 08-10-2018 12:58 PM
How did you install the Spark client on the machine that is not part of the cluster? There are a few considerations if the node is not managed by Ambari, such as:
1. The Spark client version should be the same as the one in the cluster.
2. You need to make sure all the configuration files for HDFS/YARN/Hive are copied from the cluster.
3. When you launch a client with a spark:// master URL, the application does not run on the HDP cluster; it runs in standalone mode. To test the cluster you need to use --master yarn (which can be used with client or cluster deployment modes).
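For a notebook, point 3 could look roughly like the sketch below. It is only an illustration, not a confirmed fix for this cluster: the helper names yarn_client_settings and create_yarn_context are hypothetical, the memory and core values are placeholders, and it assumes the cluster's configuration files have already been copied to a local directory that HADOOP_CONF_DIR can point to. A notebook driver runs on the client machine, so client deploy mode is assumed.

```python
import os


def yarn_client_settings(app_name="SparkServer1",
                         executor_memory="4g",
                         executor_cores="2"):
    """Return SparkConf key/value pairs for YARN in client deploy mode.

    The memory and core defaults are placeholders, not recommendations.
    """
    return [
        ("spark.master", "yarn"),
        ("spark.submit.deployMode", "client"),
        ("spark.app.name", app_name),
        ("spark.executor.memory", executor_memory),
        ("spark.executor.cores", executor_cores),
    ]


def create_yarn_context(hadoop_conf_dir, settings=None):
    """Create a SparkContext that submits to YARN instead of spark://.

    hadoop_conf_dir is assumed to hold the core-site.xml, hdfs-site.xml
    and yarn-site.xml copied from the cluster.
    """
    # Must be set before the context is created so the launched JVM sees it.
    os.environ["HADOOP_CONF_DIR"] = hadoop_conf_dir

    # Imported here so the settings helper is usable without PySpark installed.
    import pyspark

    conf = pyspark.SparkConf().setAll(settings or yarn_client_settings())
    return pyspark.SparkContext(conf=conf)
```

With the files in place, something like sc = create_yarn_context("/etc/hadoop/conf") would replace the setMaster("spark://...") call; the path here is an assumption and depends on where the files were copied.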
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
Created 08-13-2018 06:44 PM
@Girish Khole, did the above help?
Created 08-16-2018 05:42 AM
Thank you very much @Felix Albani. I copied yarn-site.xml, core-site.xml, and hdfs-site.xml to the standalone Spark instance and started Spark on HDP, and the connection was established successfully. The issue is resolved. Thanks.
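For anyone repeating this, a small pre-flight check can confirm that the three files mentioned above are present on the client before a Spark context is created. This is a minimal sketch: the function name missing_config_files and the directory passed to it are hypothetical, and it only checks that the files exist, not that their contents match the cluster.

```python
import os

# The three files named in the reply above.
REQUIRED_FILES = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml")


def missing_config_files(conf_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in conf_dir."""
    return [
        name for name in required
        if not os.path.isfile(os.path.join(conf_dir, name))
    ]
```

Calling missing_config_files with the directory the files were copied to returns an empty list when all three are present, and otherwise lists the ones still to be copied from the cluster.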