NodeManager fails to start: ClassNotFoundException

New Contributor

When I start the NodeManagers through Ambari, two of the four nodes fail to start. Log message: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService.

After searching for the class, I found that it is defined in spark-2.3.0.2.6.5.0-292-yarn-shuffle.jar, which is correctly installed in /usr/hdp/current/spark2-client/aux on all of the nodes.

If I start the process manually, adding /usr/hdp/current/spark2-client/aux/* to the -classpath of the java command, the NodeManager starts without any error.
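
For reference, one way to double-check that the class really is inside that jar (same path as above; run on each node):

# Confirm the missing class is packaged in the installed shuffle jar
unzip -l /usr/hdp/current/spark2-client/aux/spark-2.3.0.2.6.5.0-292-yarn-shuffle.jar | grep YarnShuffleService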

Question: How can I add this path through the Ambari interface? I added it to "Advanced yarn-site" -> yarn.application.classpath, but that does not seem to work.

Do you have any clue how I can correct this problem?

Thank you very much

Eduardo

4 REPLIES

Cloudera Employee

Hi @Eduardo Tarasiuk,

Per the configuration, it should be set under

Advanced yarn-site -> yarn.nodemanager.aux-services.spark2_shuffle.classpath ->

{{stack_root}}/{{spark2_version}}/spark2/aux/*
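
To confirm the value was actually pushed to a node, you can grep the rendered config there (a minimal check; the surrounding lines will show the resolved classpath value):

# Check the aux-service wiring in the yarn-site.xml that the NodeManager reads
grep -A 1 'spark2_shuffle' /etc/hadoop/conf/yarn-site.xml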

This is from my environment:

[root@aquilodran-4 aux]# find / -name spark-2.3.1.3.0.1.0-187-yarn-shuffle.jar

/usr/hdp/3.0.1.0-187/spark2/aux/spark-2.3.1.3.0.1.0-187-yarn-shuffle.jar

Let me know if this helped.

Regards,

AQ

New Contributor

The configuration you mention is the same as mine, and the jar file is where it is supposed to be (as you wrote above). Please note that I have four nodes and the other two are behaving OK, so IMHO it is almost certainly not a central configuration problem in Ambari; it is something specific to the two problematic nodes.

I resolved the issue with the following unclean workaround: on the two failed servers I copied

/usr/hdp/2.6.5.0-292/spark2/aux/spark-2.3.0.2.6.5.0-292-yarn-shuffle.jar to

/usr/hdp/2.6.5.0-292/hadoop-yarn/lib
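
In other words, on each failing node (assuming the standard HDP 2.6.5.0-292 layout):

# Unclean workaround: put the shuffle jar directly on the NodeManager's lib path
cp /usr/hdp/2.6.5.0-292/spark2/aux/spark-2.3.0.2.6.5.0-292-yarn-shuffle.jar /usr/hdp/2.6.5.0-292/hadoop-yarn/lib/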

After copying the jar to the yarn/lib folder, all the NodeManagers start OK. My problem is that I cannot fully understand the source of this error and behavior before the copy, and there are probably additional hidden issues in my system (it is not in production yet).

Question: Why do the two failed nodes log util.ApplicationClassLoader .... - classpath [], while on the good nodes the classpath is correctly written (in the log under /var/log/hadoop-yarn/yarn) as /usr/hdp/2.6.5.0-292/spark/aux/spark-1.6.3.2.6.5.0-292.jar?
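
This is roughly how I compared that line between a good node and a failing one (the exact log file name may differ on your hosts):

# Show the classpath the aux-service class loader reports at startup
grep 'ApplicationClassLoader' /var/log/hadoop-yarn/yarn/*nodemanager*.log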

Why is the {{stack_root}}/{{spark2_version}}/spark2/aux/* definition not read as expected on the failed nodes?

I also tried defining it in Ambari without the placeholders, i.e. /usr/hdp/2.6.5.0-292/spark/aux/spark-1.6.3.2.6.5.0-292.jar, also without any success.

I am afraid that something is wrong in my installation. Do you have any clue what to check next? How is this parameter read by the NodeManagers? How are the NodeManagers started by Ambari?

Additional: the file yarn-site.xml in /etc/hadoop/conf is exactly the same on all the nodes (checked with diff).
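
A sketch of that check (node1..node4 are placeholders for my hostnames):

# Compare the rendered yarn-site.xml across all four nodes
for h in node1 node2 node3 node4; do ssh "$h" md5sum /etc/hadoop/conf/yarn-site.xml; done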

Probably the best solution is to completely reinstall the NodeManager or YARN (together with the RPMs) on the failed servers. How can I do this? (Deleting/adding the service does not really reinstall it...)
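
For what it's worth, this is the kind of thing I had in mind; the exact package name is a guess and varies by HDP version:

# List the YARN RPMs installed on a failing node, then force-reinstall the right one
rpm -qa | grep -i yarn
# e.g. (hypothetical package name -- check the output above first):
# yum reinstall hadoop_2_6_5_0_292-yarn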

Thank you very much for your previous quick answer!!!

Eduardo

New Contributor

Hi Aquilodran,

Thank you very much for your answer.

Advanced yarn-site -> yarn.nodemanager.aux-services.spark2_shuffle.classpath -> {{stack_root}}/{{spark2_version}}/spark2/aux/* is correctly defined in Ambari, and the jar file is correctly installed inside the {{stack_root}}/{{spark2_version}}/spark2/aux folder.

I found a dirty workaround: copying the spark*-yarn-shuffle.jar file to {{stack_root}}/hadoop-yarn/lib seems to correct the problem.

In the log under /var/log/hadoop-yarn/yarn I can see that the classpath on the two failed nodes is empty ([], at util.ApplicationClassLoader.java:<init>(105)), while on the two healthy nodes it is correct. It seems, at least, that this specific parameter is not read by the NodeManager processes on the failed nodes.
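
For anyone debugging the same thing: the NodeManager's actual launch command, including its -classpath, can be inspected on a failing node like this (the brackets just keep grep from matching itself):

# Show the running NodeManager process and its full java command line
ps -ef | grep '[N]odeManager'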

I suspect that something went wrong in the installation, and maybe there are additional hidden problems that I have not encountered yet.

Is there any way to completely reinstall the YARN package, or at least the NodeManagers?

If not, what is your recommendation for going further with this?

Thanks very much

Eduardo