11-21-2018 01:54 AM
Thank you! I have changed it. I have also changed my paths, because the path refers to a directory and not to a file, and I added a / to my path. Now I get the results I expected. I changed setMaster to "local" because this is just a small Cloudera VM without a cluster.

This is a simple Spark script that can be executed in Hue via the Spark editor:

    from pyspark import SparkContext, SparkConf

    appNameTEST = "my first working application"
    conf = SparkConf().setAppName(appNameTEST).setMaster("local")
    sc = SparkContext(conf=conf)

    text_file = sc.textFile("hdfs:///user/hive/warehouse/TEST/FilePath")
    counts = text_file.flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b)
    counts.saveAsTextFile("hdfs:///user/hive/warehouse/TEST/RESULT")
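To sanity-check what the script above should produce, the same three steps can be emulated in plain Python without a Spark installation. This is only a sketch: the helper name `word_count` and the sample input are made up for illustration, and the code mirrors the logic of flatMap, map, and reduceByKey rather than calling Spark's API.

```python
from collections import defaultdict


def word_count(lines):
    """Emulate flatMap -> map -> reduceByKey from the script in plain Python."""
    # flatMap: split every line on single spaces and flatten into one list
    words = [word for line in lines for word in line.split(" ")]
    # map: pair each word with the count 1
    pairs = [(word, 1) for word in words]
    # reduceByKey: sum the counts per word
    totals = defaultdict(int)
    for word, n in pairs:
        totals[word] += n
    return dict(totals)


# Hypothetical input: a single line containing "Test Test"
result = word_count(["Test Test"])
print(result)
```

If the Spark job writes something different for the same input, the input path or the split logic is the first place I would look.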
11-20-2018 09:05 AM
Thank you, you are right. How can I set a variable in Python for Spark? I thought that "conf = SparkConf().setAppName(appNameTEST).setMaster(master)" would set this variable.
11-20-2018 08:26 AM
I have added SparkContext to my script:

    from pyspark import SparkContext, SparkConf
    conf = SparkConf().setAppName(appNameTEST).setMaster(master)
    sc = SparkContext(conf=conf)

Most relevant error log in Hue:

    Traceback (most recent call last):
      File "/yarn/nm/usercache/cloudera/appcache/application_1542723589859_0008/container_1542723589859_0008_01_000002/SparkTest.py", line 3, in <module>
        conf = SparkConf().setAppName(appNameTEST).setMaster(master)
    NameError: name 'appNameTEST' is not defined

Less relevant error:

    2018-11-20 08:07:59,555 [DataStreamer for file /user/cloudera/oozie-oozi/0000003-181120074347071-oozie-oozi-W/spark2-b3ea--spark/action-data.seq] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception
    java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1281)
        at java.lang.Thread.join(Thread.java:1355)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
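The NameError in the traceback is ordinary Python behavior rather than anything Spark-specific: a name has to be assigned before it can be passed as an argument. Here is a minimal sketch that reproduces this without Spark. `set_app_name` is a hypothetical stand-in for `setAppName`, and the value "SparkTest" is an assumed example.

```python
def set_app_name(name):
    """Hypothetical stand-in for SparkConf().setAppName; just returns the name."""
    return name


def run_without_assignment():
    # appNameTEST is never assigned anywhere, so the lookup raises NameError
    try:
        return set_app_name(appNameTEST)
    except NameError as exc:
        return str(exc)


def run_with_assignment():
    # Assigning the variable first makes the same call succeed
    appNameTEST = "SparkTest"  # assumed example value
    return set_app_name(appNameTEST)


print(run_without_assignment())
print(run_with_assignment())
```

The same applies to `master`: it would also need a value (or a literal string passed directly) before the `setMaster` call.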
11-20-2018 06:57 AM
Hello! I am trying to run a Spark script in Hue.

Name of my script: SparkTest.py

Content of my script:

    text_file = sc.textFile("hdfs://...testFile.txt")
    counts = text_file.flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b)
    counts.saveAsTextFile("hdfs://...RESULT.txt")

Content of my testFile: Test Test

Problem: After running this script, my RESULT.txt file is still empty.

Question: Which Spark/Hue configuration do I need to run simple Spark scripts in Hue?

I use the Cloudera 5.13 VM. Thank you!
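A possibly relevant detail when checking the output: `saveAsTextFile` writes a directory containing `part-*` files (plus a `_SUCCESS` marker), not a single text file. The sketch below imitates that layout on the local filesystem and reads it back. The helper `read_output_dir`, the temporary directory, and the sample line are assumptions for illustration only; it does not touch HDFS.

```python
import glob
import os
import tempfile


def read_output_dir(path):
    """Concatenate the lines of all part-* files in an output directory."""
    lines = []
    for part in sorted(glob.glob(os.path.join(path, "part-*"))):
        with open(part) as handle:
            lines.extend(handle.read().splitlines())
    return lines


# Build a fake output directory locally (a stand-in for the HDFS one)
out_dir = os.path.join(tempfile.mkdtemp(), "RESULT")
os.makedirs(out_dir)
with open(os.path.join(out_dir, "part-00000"), "w") as handle:
    handle.write("('Test', 2)\n")
# Empty marker file, as written alongside the part files on success
open(os.path.join(out_dir, "_SUCCESS"), "w").close()

print(read_output_dir(out_dir))
```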
Labels: Apache Spark