Created on 11-20-2018 06:57 AM - edited 09-16-2022 06:54 AM
Hello! I am trying to run a Spark script in Hue:
Name of my script: SparkTest.py
Content of my script:
text_file = sc.textFile("hdfs://...testFile.txt")
counts = text_file.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("hdfs://...RESULT.txt")
Content of my testFile:
Test Test
Problem:
After running this script, my RESULT.txt file is still empty.
Question:
- Which Spark/Hue configuration do I need to run simple Spark scripts from Hue?
I use the Cloudera 5.13 VM.
Thank you!
Created 11-21-2018 01:54 AM
Thank you!
I have changed it.
I have also changed my paths: the output path must point to a directory, not to a single file, and I added a leading / to it. Now I get the results I expected. I also changed setMaster to "local", because it is just a small Cloudera VM without a cluster.
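To sanity-check the expected result without a cluster, the same word count can be sketched in plain Python, applied to the testFile contents from the question ("Test Test"); this is only an illustrative equivalent, not part of the Spark job:

```python
from collections import Counter

# Plain-Python equivalent of the flatMap/map/reduceByKey pipeline.
lines = ["Test Test"]  # contents of testFile from the question
words = [word for line in lines for word in line.split(" ")]  # flatMap
counts = Counter(words)                                       # reduceByKey

print(dict(counts))  # {'Test': 2}
```

So the RESULT directory should contain the pair ('Test', 2).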
This is a simple Spark script which can be executed in Hue via the Spark editor:
from pyspark import SparkContext, SparkConf

# Define the application name before using it, and run in local mode
# since this is a single-node VM without a cluster.
appNameTEST = "my first working application"
conf = SparkConf().setAppName(appNameTEST).setMaster("local")
sc = SparkContext(conf=conf)

# Read the input, split each line into words, and count each word.
text_file = sc.textFile("hdfs:///user/hive/warehouse/TEST/FilePath")
counts = text_file.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)

# saveAsTextFile writes a directory of part files, not a single file.
counts.saveAsTextFile("hdfs:///user/hive/warehouse/TEST/RESULT")
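Note that saveAsTextFile creates a directory (here RESULT) containing part-NNNNN files plus a _SUCCESS marker, rather than a single text file. A minimal local sketch of that layout, simulated with tempfile so it needs neither HDFS nor Spark (the file contents are what a single-partition run of the job above would write):

```python
import os
import tempfile

# Simulate the directory layout saveAsTextFile produces.
out_dir = tempfile.mkdtemp()
with open(os.path.join(out_dir, "part-00000"), "w") as f:
    f.write("('Test', 2)\n")
open(os.path.join(out_dir, "_SUCCESS"), "w").close()  # completion marker

# Reading the result back means concatenating the part files.
lines = []
for name in sorted(os.listdir(out_dir)):
    if name.startswith("part-"):
        with open(os.path.join(out_dir, name)) as f:
            lines += f.read().splitlines()

print(lines)  # ["('Test', 2)"]
```

This is why writing to "RESULT.txt" looked empty: the name was created as a directory, and the actual records were inside its part files.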
Created 11-20-2018 07:52 AM
Created 11-20-2018 08:26 AM
I have added SparkContext to my script:
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName(appNameTEST).setMaster(master)
sc = SparkContext(conf=conf)
Most relevant error log in Hue:
Traceback (most recent call last):
File "/yarn/nm/usercache/cloudera/appcache/application_1542723589859_0008/container_1542723589859_0008_01_000002/SparkTest.py", line 3, in <module>
conf = SparkConf().setAppName(appNameTEST).setMaster(master)
NameError: name 'appNameTEST' is not defined
Less relevant error:
2018-11-20 08:07:59,555 [DataStreamer for file /user/cloudera/oozie-oozi/0000003-181120074347071-oozie-oozi-W/spark2-b3ea--spark/action-data.seq] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
Created 11-20-2018 08:50 AM
Created 11-20-2018 09:05 AM
Thank you, you are right. How can I set a variable in Python for Spark?
I thought that "conf = SparkConf().setAppName(appNameTEST).setMaster(master)" would set this variable?
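The NameError in the traceback above is plain Python behavior, not something Spark-specific: setAppName(appNameTEST) does not define the name, it only passes along whatever appNameTEST already refers to, so the name must be assigned before that line runs. A minimal sketch (no Spark needed):

```python
# Referencing a name that was never assigned raises NameError,
# exactly as in the traceback above.
try:
    conf_name = appNameTEST  # not defined yet -> NameError
except NameError:
    print("appNameTEST is not defined")

# Assign the names first, then they can be passed to SparkConf.
appNameTEST = "my first working application"
master = "local"
print(appNameTEST, master)
```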
Created 11-20-2018 09:11 AM