
Spark 2.2 on YARN: application master container fails on one node but works fine on all the others

Hi, I am struggling with a curious error.

 

I have a YARN cluster of 3 worker nodes. When my Spark 2.2 on YARN job tries to launch its application master container on one particular node (the second one), it fails, while on the other 2 nodes the application master starts fine and the jobs finish successfully.

 

Here is the application log:

 

 

Log Type: stderr

Log Upload Time: Sat Apr 28 17:29:37 +0300 2018

Log Length: 1197

18/04/28 17:29:35 INFO util.SignalUtils: Registered signal handler for TERM
18/04/28 17:29:35 INFO util.SignalUtils: Registered signal handler for HUP
18/04/28 17:29:35 INFO util.SignalUtils: Registered signal handler for INT
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.SparkConf.get(Lorg/apache/spark/internal/config/ConfigEntry;)Ljava/lang/Object;
	at org.apache.spark.deploy.yarn.ApplicationMaster.<init>(ApplicationMaster.scala:71)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:773)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:772)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)

Log Type: stdout

Log Upload Time: Sat Apr 28 17:29:37 +0300 2018

Log Length: 0

And here is the resource manager log. Please assist.

 

USER=hive2	OPERATION=Application Finished - Failed	TARGET=RMAppManager	RESULT=FAILURE	DESCRIPTION=App failed with state: FAILED	PERMISSIONS=Application application_1524925490307_0003 failed 1 times due to AM Container for appattempt_1524925490307_0003_000001 exited with  exitCode: 1
For more detailed output, check application tracking page:http://bigdata-01.vm-p.rdtex.ru:8088/proxy/application_1524925490307_0003/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1524925490307_0003_01_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
	at org.apache.hadoop.util.Shell.run(Shell.java:504)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.	APPID=application_1524925490307_0003

 

These are Hive 2.3.3 jobs running on the Spark 2.2 engine.

All 3 worker nodes are identical twins (Cloudera Express bundle 5.10).
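
To double-check that the Spark jars really do match on every node, a minimal diagnostic sketch like the one below could be compiled against a node's Spark 2.2 jars and run on each of the 3 nodes (the object name SparkConfCheck is just illustrative, and it assumes the classpath matches what the YARN container sees). It prints which jar SparkConf is loaded from and whether the SparkConf.get(ConfigEntry) overload from the stack trace above is present:

object SparkConfCheck {
  def main(args: Array[String]): Unit = {
    // Which jar actually provided org.apache.spark.SparkConf on this node
    val cls = Class.forName("org.apache.spark.SparkConf")
    val source = Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation)
    println(s"SparkConf loaded from: ${source.getOrElse("unknown (bootstrap classpath)")}")

    // Does this build have the SparkConf.get(ConfigEntry) overload that the
    // Spark 2.2 ApplicationMaster calls (the one named in the NoSuchMethodError)?
    val hasOverload = cls.getDeclaredMethods.exists { m =>
      m.getName == "get" &&
        m.getParameterTypes.map(_.getName).toSeq ==
          Seq("org.apache.spark.internal.config.ConfigEntry")
    }
    println(s"SparkConf.get(ConfigEntry) present: $hasOverload")
  }
}

If the failing node prints false, or a jar location different from the two healthy nodes, that would suggest an older Spark jar is being picked up on its classpath.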

 

Any ideas please?

 

 
