first steps with Spark and Hue

Explorer

Hello! I am trying to run a Spark script in Hue:

 

Name of my script: SparkTest.py

 

Content of my script:

text_file = sc.textFile("hdfs://...testFile.txt")
counts = text_file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("hdfs://...RESULT.txt")

 

Content of my testFile:

Test Test

 

Problem:

After running this script, my RESULT.txt file is still empty.

 

Question:

- Which Spark/Hue configuration do I need in order to run simple Spark scripts from Hue?

 

I use the Cloudera VM 5.13.

 

 

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions

Re: first steps with Spark and Hue

Explorer

Thank you!

 

I have changed it.

 

I have also changed my paths, because the path is for a directory and not for a file, and I added a / to my path. Now I get the results I expected. I changed setMaster to "local" because it is just a small Cloudera VM without a cluster.
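A possible reading of the added /, assuming it refers to the third slash after hdfs: (the post does not say which slash was added): in a URI, whatever follows the first two slashes is treated as the host. The sketch below uses Python's standard urlparse only to illustrate how the two forms are split; the two-slash path is a hypothetical example and does not appear in the thread.

```python
from urllib.parse import urlparse

# Hypothetical comparison; only the three-slash form is used in this post.
two_slashes = urlparse("hdfs://user/hive/warehouse/TEST/RESULT")
three_slashes = urlparse("hdfs:///user/hive/warehouse/TEST/RESULT")

# Two slashes: the first path segment is read as the host name.
print(two_slashes.netloc, two_slashes.path)
# user /hive/warehouse/TEST/RESULT

# Three slashes: the host part is empty and the full path is preserved.
print(repr(three_slashes.netloc), three_slashes.path)
# '' /user/hive/warehouse/TEST/RESULT
```

With an empty host, Hadoop should fall back to the default file system configured for the cluster, which is usually what is wanted on a single VM.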

 

This is a simple Spark script that can be executed in Hue via the Spark editor:

 

from pyspark import SparkContext, SparkConf

appNameTEST = "my first working application"

# "local" runs Spark locally on the VM instead of on a cluster
conf = SparkConf().setAppName(appNameTEST).setMaster("local")
sc = SparkContext(conf=conf)

text_file = sc.textFile("hdfs:///user/hive/warehouse/TEST/FilePath")
# Split each line into words, pair each word with 1, then sum the counts per word
counts = text_file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)
# RESULT is written as a directory of part files, not as a single file
counts.saveAsTextFile("hdfs:///user/hive/warehouse/TEST/RESULT")
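
To check what the job should produce for the test file from the question, the three steps can be reproduced in plain Python without Spark. This is only a sketch of the logic: the list and variable names are illustrative, and Spark's own output order and formatting may differ.

```python
from collections import defaultdict

lines = ["Test Test"]  # content of the test file from the question

# flatMap: split every line on single spaces and flatten into one list of words
words = [word for line in lines for word in line.split(" ")]

# map: pair every word with the count 1
pairs = [(word, 1) for word in words]

# reduceByKey: add up the counts of all pairs that share the same word
totals = defaultdict(int)
for word, count in pairs:
    totals[word] += count

counts = sorted(totals.items())
print(counts)  # [('Test', 2)]
```

In the Spark version, saveAsTextFile stores this result in part files inside the RESULT directory, which matches the remark that the path is for a directory and not for a file.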

6 REPLIES 6

Re: first steps with Spark and Hue

Master Collaborator
It is hard to tell from this what the problem could be. Can you post the Spark logs? Do you have access to the Spark job UI? Do you get any error messages?

Re: first steps with Spark and Hue

Explorer

I have added SparkContext to my script:

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName(appNameTEST).setMaster(master)
sc = SparkContext(conf=conf)

 

 

Most relevant error log in Hue:

Traceback (most recent call last):
  File "/yarn/nm/usercache/cloudera/appcache/application_1542723589859_0008/container_1542723589859_0008_01_000002/SparkTest.py", line 3, in <module>
    conf = SparkConf().setAppName(appNameTEST).setMaster(master)
NameError: name 'appNameTEST' is not defined

 

 

Less relevant error:

2018-11-20 08:07:59,555 [DataStreamer for file /user/cloudera/oozie-oozi/0000003-181120074347071-oozie-oozi-W/spark2-b3ea--spark/action-data.seq] WARN  org.apache.hadoop.hdfs.DFSClient  - Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1281)
    at java.lang.Thread.join(Thread.java:1355)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)

 

 

Re: first steps with Spark and Hue

Master Collaborator
NameError: name 'appNameTEST' is not defined -> that is a Python runtime error: Python does not know any variable with this name.
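
Strictly speaking, Python raises NameError when the line runs, not while the script is parsed. The small sketch below illustrates this with a one-line stand-in for the failing statement; the variable conf_name and the use of ast and exec are only for demonstration and are not part of the original script.

```python
import ast

# One-line stand-in for the failing statement in SparkTest.py
source = "conf_name = appNameTEST\n"

# Parsing succeeds: the line is syntactically valid Python.
ast.parse(source)
parsed_ok = True

# Running it fails, because the name is only looked up at execution time.
try:
    exec(source, {})
    error_type = None
except NameError as exc:
    error_type = type(exc).__name__
    print(exc)  # name 'appNameTEST' is not defined

# Assigning the variable first makes the same line work.
namespace = {"appNameTEST": "my first working application"}
exec(source, namespace)
print(namespace["conf_name"])  # my first working application
```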

Re: first steps with Spark and Hue

Explorer

Thank you, you are right. How can I set a variable in Python for Spark?

 

I thought that "conf = SparkConf().setAppName(appNameTEST).setMaster(master)" would set this variable?

 

 

Re: first steps with Spark and Hue

Master Collaborator
It is just normal Python :-)
Just use myvariable = "value", for example:
app_name = "My gorgeous application"
conf = SparkConf().setAppName(app_name).setMaster(master)
