Failed to initialize pyspark2.2 in CDH5.12

New Contributor
Environment: Python 3.6.1, JDK 1.8, CDH 5.12, Spark 2.2. I followed the official tutorial to set everything up with the CSD and parcels. Everything looks fine in Cloudera Manager, but I cannot initialize pyspark2 from the shell and have not found a way to solve this. Can anyone help? I have been stuck on this for several days and my boss wants it fixed by today. Thank you! Here is the log:

[hdfs@Master /data/soft/spark2.2]$ pyspark2
Python 3.6.1 (default, Jul 27 2017, 11:07:01)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/07/27 12:02:09 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

New Contributor
17/07/27 12:02:09 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
17/07/27 12:02:09 ERROR util.Utils: Uncaught exception in thread Thread-2
java.lang.NullPointerException
...
/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2/python/pyspark/shell.py:52: UserWarning: Fall back to non-hive support because failing to access HiveConf, please make sure you build spark with hive
  warnings.warn("Fall back to non-hive support because failing to access HiveConf, "
17/07/27 12:02:09 WARN spark.SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
...
17/07/27 12:02:14 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
...
17/07/27 12:02:14 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
17/07/27 12:02:14 ERROR util.Utils: Uncaught exception in thread Thread-2
java.lang.NullPointerException
...
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
...
During handling of the above exception, another exception occurred:
...
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

Master Collaborator
Error initializing SparkContext. org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

The error does not say up front what is wrong, but it generally points to a communication or network problem. In a Spark on YARN installation, Spark assigns random port numbers to a group of configuration properties used for communication between the client and the cluster. If any of those randomly assigned ports falls outside the range of open ports at the time the application is submitted, the shell fails. I would first make sure that no firewall (software such as iptables, or hardware) is blocking the ports between the driver and the application master.
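As a rough first check (a minimal sketch; the host name and port values below are placeholders, not taken from your logs), you can verify from a NodeManager host that the driver host accepts connections, and optionally pin the driver-side ports to a known range so firewall rules can cover them:

# From a NodeManager host, confirm the driver host is reachable on the port
# the ApplicationMaster tries to connect back to (placeholder host/port):
nc -zv <driver-host> <driver-port>

# Optionally pin the driver-side ports to fixed values so a firewall rule
# can be opened for them (example values only):
pyspark2 \
  --conf spark.driver.port=40000 \
  --conf spark.blockManager.port=40100 \
  --conf spark.port.maxRetries=32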

 

Also, please check the YARN application logs (from the Resource Manager Web UI) to see whether there are any hints or errors logged.

 

Example 


Container: container_1487033446961_56149_01_000001 on xxx.com_8041
========================================================================================
LogType:stderr
Log Upload Time:Mon Apr 03 17:57:15 -0400 2017
LogLength:2235
Log Contents:
17/04/03 17:52:55 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
17/04/03 17:52:55 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1487033446961_56149_000001
17/04/03 17:52:56 INFO spark.SecurityManager: Changing view acls to: 25761081
17/04/03 17:52:56 INFO spark.SecurityManager: Changing modify acls to: 25761081
17/04/03 17:52:56 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(25761081); users with modify permissions: Set(25761081)
17/04/03 17:52:56 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
17/04/03 17:55:03 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10.xx.xx.243:38634, retrying ...
17/04/03 17:55:03 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!   <<<<
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:484)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:187)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:653)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:651)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:674)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
17/04/03 17:55:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
17/04/03 17:55:03 INFO util.ShutdownHookManager: Shutdown hook called

New Contributor
Thanks! Am I right to look for the log here?

[root@Master /var/log/spark2]# cat spark2-history-server-Master.log

When I try to run a simple example in the pyspark2 shell, errors occur:

Using Python version 3.6.1 (default, Jul 27 2017 11:07:01)
SparkSession available as 'spark'.
>>> strings=['old']
>>> s2=sc.parallelize(strings)
>>> s3=s2.map(lambda x:x.upper())
>>> s3.collect()
[Stage 0:> (0 + 0) / 2]17/07/27 14:52:18 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1501131033996_0005_01_000002 on host: Slave3. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1501131033996_0005_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
at org.apache.hadoop.util.Shell.run(Shell.java:504)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
17/07/27 14:52:18 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1501131033996_0005_01_000003 on host: Slave3. Exit status: 1. [same ExitCodeException stack trace, exit code 1]
17/07/27 14:52:25 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1501131033996_0005_01_000005 on host: Slave3. Exit status: 1. [same stack trace]
17/07/27 14:52:25 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1501131033996_0005_01_000004 on host: Slave3. Exit status: 1. [same stack trace]
17/07/27 14:52:27 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/07/27 14:52:31 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1501131033996_0005_01_000007 on host: Slave3. Exit status: 1. [same stack trace]
17/07/27 14:52:31 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1501131033996_0005_01_000006 on host: Slave3. Exit status: 1. [same stack trace]
17/07/27 14:52:38 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1501131033996_0005_01_000009 on host: Slave3. Exit status: 1. [same stack trace]
17/07/27 14:52:38 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1501131033996_0005_01_000008 on host: Slave3. Exit status: 1. [same stack trace]
17/07/27 14:52:42 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/07/27 14:52:45 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1501131033996_0005_01_000011 on host: Slave3. Exit status: 1. [same stack trace]
17/07/27 14:52:45 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1501131033996_0005_01_000010 on host: Slave3. Exit status: 1. [same stack trace]

Master Collaborator

If I understand correctly, you have gotten past the earlier "Yarn application has already ended!" errors: running pyspark2 now gives you a shell prompt, but running a simple job that converts a list of strings to upper case results in the containers being killed with exit status 1.

 

$ pyspark2
Using Python version 3.6.1 (default, Jul 27 2017 11:07:01)
SparkSession available as 'spark'.
 
>>> strings=['old']
>>> s2=sc.parallelize(strings)
>>> s3=s2.map(lambda x:x.upper())
>>> s3.collect()
[Stage 0:> (0 + 0) / 2]17/07/27 14:52:18 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1501131033996_0005_01_000002 on host: Slave3. Exit status: 1.

 

To review why the application failed, we need to look at the container logs. 

Container logs are available from the command line by running the yarn logs command:

 # yarn logs -applicationId <application ID>  | less

 

OR

Cloudera Manager > Yarn > WebUI > Resource Manager WebUI > application_1501131033996_0005 > Check for Logs at the bottom > stderr and stdout
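
If you do not have the application ID handy, it can also be listed from the command line first (a generic sketch; the state filter is just an example, and the application ID below is the one from your output above):

# List recent applications to find the ID of the failing run
yarn application -list -appStates FINISHED,FAILED,KILLED

# Then pull its aggregated container logs
yarn logs -applicationId application_1501131033996_0005 | less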