
Strange Spark behavior: I'm executing a query and it's working, but the info on the Spark master page (master:8080) doesn't update

Explorer

I'm executing a query on Spark and it works; I'm getting the result. I did not configure any cluster manager, so Spark should be using its own.

But on the Spark page at master:8080 I see this:

Alive Workers: 2
Cores in use: 4 Total, 0 Used
Memory in use: 6.0 GB Total, 0.0 B Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE

But while the query is executing, I get the same result every time I refresh the page:

Alive Workers: 2
Cores in use: 4 Total, 0 Used
Memory in use: 6.0 GB Total, 0.0 B Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE

And after the query finishes, it is still the same. Do you know why? It's very strange: it looks as if Spark is executing the query without using any hardware, which is not possible. Why isn't this info updating?

1 ACCEPTED SOLUTION

@John Cod

How are you submitting the job? If you don't specify --master "spark://masterip:7077" when starting spark-shell, it will run in local mode.
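A rough way to confirm this from the master page itself: a shell running in local mode never registers with the standalone master, so the "Applications" counter stays at 0 even while a query runs. The sketch below is a hypothetical Python helper (`running_applications` is not part of Spark) that reads that counter from summary text copied off http://master:8080; it assumes the exact "Applications: N Running" wording shown in the question.

```python
import re


def running_applications(summary: str) -> int:
    """Return N from the 'Applications: N Running' field of the master UI summary."""
    match = re.search(r"Applications:\s*(\d+)\s+Running", summary)
    if match is None:
        raise ValueError("no 'Applications: N Running' field in the summary")
    return int(match.group(1))


# Summary copied from the question, taken while the query was running.
during_query = (
    "Alive Workers: 2 Cores in use: 4 Total, 0 Used "
    "Memory in use: 6.0 GB Total, 0.0 B Used "
    "Applications: 0 Running, 0 Completed "
    "Drivers: 0 Running, 0 Completed Status: ALIVE"
)

# Zero running applications while a query executes suggests the shell is in local mode.
print(running_applications(during_query))
```

Once the shell is started against the standalone master, the same counter should read 1 for as long as the shell stays open.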


Explorer

Hi, I'm executing the job in the shell. To start the shell I use the command "spark-shell". So I need to use spark-shell --master?

@John Cod

Yes, you need to specify the spark master URI.

spark-shell --master spark://masterhost:7077
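Why the flag matters can be sketched as a precedence rule: an explicit --master wins, otherwise a spark.master entry in spark-defaults.conf is used, and if neither is set spark-shell falls back to local[*]. The Python below is a simplified model under that assumption: it ignores other sources such as the MASTER environment variable and only parses whitespace-separated `key value` lines. `effective_master` and `parse_spark_defaults` are hypothetical helpers, not Spark APIs.

```python
def parse_spark_defaults(text: str) -> dict:
    """Parse 'key value' lines as in spark-defaults.conf, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            props[parts[0]] = parts[1].strip()
    return props


def effective_master(cli_master=None, defaults_text: str = "") -> str:
    """Resolve the master URL: --master flag, then spark.master, then local[*]."""
    if cli_master:
        return cli_master
    return parse_spark_defaults(defaults_text).get("spark.master", "local[*]")


print(effective_master())                           # plain `spark-shell`, no config
print(effective_master("spark://masterhost:7077"))  # started with --master
```

In this model, a plain `spark-shell` with no spark.master configured resolves to local[*], which is why nothing showed up on the standalone master page.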

Explorer

Thanks, but now I'm getting this error when I try to execute a query:

16/05/25 12:15:15 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@5547fcb1)

And this warning:

16/05/25 12:15:05 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Do you know why?

Explorer

This warning appears when the query starts executing stage 0, and then the error appears.

@John Cod

Can you please share a screenshot of the http://master:8080 UI, the command you ran, and the full spark-shell logs?

Explorer

I decreased the memory in spark-env.sh and now it seems to be working, thanks!

Explorer

I only just saw your comment, but I think it's working fine now. It seems I was setting more memory than was available.
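The TaskSchedulerImpl warning is consistent with this: in standalone mode an executor can only be launched on a worker that has at least the requested executor memory (and cores) free, so if the request is larger than what every worker offers, the application never gets any executors. The sketch below models that check in Python under simplifying assumptions. `workers_that_fit` and `to_mb` are hypothetical helpers, and the even 3 GB / 2-core split across the two workers is assumed from the 6.0 GB / 4-core totals in the question, not taken from the actual cluster.

```python
import re

# Multipliers to megabytes for JVM-style size suffixes; no suffix means megabytes.
_TO_MB = {"": 1, "k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}


def to_mb(size: str) -> float:
    """Convert sizes such as '512m', '2g' or '4gb' to megabytes."""
    match = re.fullmatch(r"(\d+)(?:([kmgt])b?)?", size.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    number, unit = match.groups()
    return int(number) * _TO_MB[unit or ""]


def workers_that_fit(executor_memory: str, executor_cores: int, workers: dict) -> list:
    """Names of workers whose free (memory, cores) can host one executor."""
    need = to_mb(executor_memory)
    return [
        name
        for name, (free_memory, free_cores) in workers.items()
        if to_mb(free_memory) >= need and free_cores >= executor_cores
    ]


# Assumed layout: the question's 6.0 GB / 4 cores split evenly over two workers.
workers = {"worker1": ("3g", 2), "worker2": ("3g", 2)}

print(workers_that_fit("4g", 1, workers))  # empty list: no executor can start
print(workers_that_fit("2g", 1, workers))  # both workers can host an executor
```

An empty result corresponds to the situation in the warning: the job is submitted, but no worker can accept it, so it waits for resources that never arrive.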

Actually, if you don't specify local mode (--master "local") then you will be running in Standalone mode described here:

  • Standalone mode: By default, applications submitted to the standalone mode cluster will run in FIFO (first-in-first-out) order, and each application will try to use all available nodes. You can limit the number of nodes an application uses by setting the spark.cores.max configuration property in it, or change the default for applications that don’t set this setting through spark.deploy.defaultCores. Finally, in addition to controlling cores, each application’s spark.executor.memory setting controls its memory use.
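The core limit described above can be modeled as a simple cap: spark.cores.max if the application sets it, otherwise spark.deploy.defaultCores, and if neither is set the cap is unlimited, so the application takes every free core. The Python below is a simplified sketch of that rule only; it ignores per-executor core sizing and FIFO queueing, and `cores_granted` is a hypothetical name. With 4 free cores and no settings it grants all 4, which matches the "4 Total, 4 Used" reading reported further down this thread.

```python
from typing import Optional


def cores_granted(free_cores: int,
                  cores_max: Optional[int] = None,
                  default_cores: Optional[int] = None) -> int:
    """Cores a standalone application can claim under the cap described above."""
    # spark.cores.max (per application) takes priority over the cluster default.
    cap = cores_max if cores_max is not None else default_cores
    if cap is None:
        # Neither is set: spark.deploy.defaultCores is unlimited by default.
        return free_cores
    return min(free_cores, cap)


print(cores_granted(4))                   # no limits: all free cores
print(cores_granted(4, cores_max=2))      # per-application limit
print(cores_granted(4, default_cores=1))  # cluster-wide default applies
```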

Also, I think you have the wrong port for the monitoring web interface; try port 4040 instead of 8080, like this:

http://<driver-node>:4040
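For reference, the two ports serve different pages: 8080 is the default web UI port of the standalone master (a cluster-wide view of workers and applications), while 4040 is the web UI of an individual application, served by its driver. If several SparkContexts run on the same host, they bind to successive ports (4041, 4042, and so on). The Python helper below only assembles those default URLs; `spark_ui_urls` and its host arguments are hypothetical, and both ports can be changed in configuration.

```python
def spark_ui_urls(master_host: str, driver_host: str, contexts_on_driver: int = 1) -> dict:
    """Default URLs of the standalone master UI and the per-application UIs."""
    return {
        # Cluster-wide view: workers, cores/memory in use, running applications.
        "master": f"http://{master_host}:8080",
        # One UI per SparkContext on the driver host, starting at port 4040.
        "applications": [
            f"http://{driver_host}:{4040 + i}" for i in range(contexts_on_driver)
        ],
    }


print(spark_ui_urls("master", "driver-node"))
```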

Explorer

Hi, thanks for your answer, but I don't understand. I think the answer I accepted fixed the issue, because after starting spark-shell with spark-shell --master spark://masterhost:7077, I get this on port 8080:

  • Cores in use: 4 Total, 4 Used
  • Memory in use: 4.0 GB Total, 2.0 GB Used
  • Applications: 1 Running, 0 Completed
  • Drivers: 0 Running, 0 Completed
  • Status: ALIVE

So it seems it already works when spark-shell is started that way, right? Or are you suggesting it should be spark-shell --master "local" spark://masterhost:7077?

Super Guru

Was there anything in the Spark history server or in the logs?

Spark supports the following cluster modes:

  1. Pseudo-cluster (local) mode: everything runs on one node; for debugging/developing Spark
  2. Standalone: Spark provides the cluster manager facilities
  3. Spark on YARN: YARN provides the cluster manager facilities
    1. yarn-client mode: the Spark driver runs outside YARN
    2. yarn-cluster mode: the Spark driver also runs inside YARN
  4. Spark on Mesos: Mesos provides the cluster manager facilities

We don't support Spark on Mesos.

For Spark on YARN, specify the mode on a per-job basis by adding --master yarn-client or --master yarn-cluster to your spark-submit command, or configure it in spark-defaults.conf for all jobs submitted from that node.

--master "spark://masterip:7077" indicates Spark standalone mode.
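The --master value alone is enough to tell which of the modes above a job will use. A minimal Python sketch of that mapping follows. `cluster_manager_for` is a hypothetical helper; it recognizes only the URL forms mentioned in this thread plus mesos://, and it assumes a bare `yarn` master uses the default client deploy mode.

```python
def cluster_manager_for(master: str) -> str:
    """Map a --master value to the mode it selects."""
    if master == "local" or master.startswith("local["):
        return "local"          # everything runs on one node
    if master.startswith("spark://"):
        return "standalone"     # Spark's own cluster manager
    if master in ("yarn", "yarn-client"):
        return "yarn-client"    # assumes the default client deploy mode for 'yarn'
    if master == "yarn-cluster":
        return "yarn-cluster"   # driver runs inside YARN
    if master.startswith("mesos://"):
        return "mesos"
    raise ValueError(f"unrecognized master URL: {master!r}")


for url in ("local[*]", "spark://masterip:7077", "yarn-cluster"):
    print(url, "->", cluster_manager_for(url))
```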
