Created 05-30-2018 02:02 PM
Hello. I've got a problem after updating the system to HDP 2.6.5. I have a cluster with three nodes and am trying to start a simple Python application:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext, SparkSession, HiveContext

sc = SparkContext()
print sc.master
with the command:
/usr/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --name 'test script' \
  /opt/test/youdmp/test/script.py
It says:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.yarn.proto.YarnProtos$ResourceProtoOrBuilder.getMemory()I
	at org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl.getMemory(ResourcePBImpl.java:61)
	at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:313)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:166)
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1217)
	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1585)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:906)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I see this situation when the application starts on one node or the other, but on the third node it works.
And in client deploy mode it says:
Traceback (most recent call last):
  File "/opt/test/youdmp/test/script.py", line 3, in <module>
    sc = SparkContext()
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/context.py", line 119, in __init__
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/context.py", line 181, in _do_init
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/context.py", line 279, in _initialize_context
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1428, in __call__
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoSuchMethodError: org.apache.hadoop.yarn.proto.YarnProtos$ResourceProtoOrBuilder.getMemory()I
	at org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl.getMemory(ResourcePBImpl.java:61)
	at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:313)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:166)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:745)
What kind of mistake could this be? What should I fix?
Created 05-31-2018 03:05 PM
The "int getMemory()" API was marked deprecated in favor of the "long getMemorySize()" API. However seems like at the proto level the int API was removed and that's probably why it is throwing NoSuchMethodError. Update the code to use getMemorySize and try.
Created 05-31-2018 04:26 PM
Actually, I can't work out what uses this method. I tried my simple test script and the SparkPi example in Scala. Both cases work well with master local and return the error with master yarn.
How can I find out which code I should update?
Created 05-31-2018 03:06 PM
Also, make sure you have updated all your nodes to HDP 2.6.5.
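For example, one way to confirm every node serves Spark the same Hadoop build is to open spark-shell on each node and print the Hadoop version it actually links against (a minimal diagnostic sketch using Hadoop's standard VersionInfo class):

// Run in spark-shell on each node; all nodes should print the same version.
println(org.apache.hadoop.util.VersionInfo.getVersion)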
Created 05-31-2018 04:28 PM
I updated the whole cluster using HDP, so I think all my nodes should be updated.
Created 10-09-2018 11:10 AM
I got the same errors, and in the end I found some incompatible jar files in the Spark lib directory. Please check your Spark lib path to see whether there are any additional jars you added.
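If it helps, a quick way to see which jar a conflicting class is actually loaded from is to ask the JVM directly from spark-shell (a hypothetical diagnostic I'm sketching here, using the class name from the stack traces above):

// Prints the jar on the classpath that provides the conflicting class;
// an unexpected path here points at a stray jar in the Spark lib directory.
val cls = Class.forName("org.apache.hadoop.yarn.proto.YarnProtos$ResourceProtoOrBuilder")
println(cls.getProtectionDomain.getCodeSource.getLocation)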