Trying to start the nodemanager

Contributor

Hi

I am trying to bring the NodeManager back up on a node where it was previously decommissioned, but it stops soon after it starts:

ERROR collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:startWebApp(314)) - The per-node collector webapp failed to start.

java.io.IOException: Problem starting http server

Any idea what the problem could be?

Master Mentor

@Koffi 
Can you please share a more detailed NodeManager log, along with the complete error trace including the "Caused by" section?

The snippet that you posted shows the effect of the failure ("Problem starting http server"), but the actual cause will be logged somewhere before this line in the NodeManager log. For example, there could be a disk space issue on the NodeManager host where it writes to the 'yarn.nodemanager.log-dirs' directory, or it might not be able to bind to the NM port, etc.

Sharing the more detailed NodeManager log will give us a better idea.
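Both causes mentioned above can be checked directly on the NodeManager host. Below is a minimal Java sketch, assuming you pass in the directory configured as `yarn.nodemanager.log-dirs` and the port the failing web app is expected to listen on (neither value is given in this thread); the class name `NodeManagerPreflight` and its argument order are hypothetical and not part of Hadoop.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check, not part of Hadoop. Usage:
//   java NodeManagerPreflight <log-dir> <port>
public class NodeManagerPreflight {

    // Bytes still usable by this JVM on the file system that holds dir.
    static long usableBytes(Path dir) throws IOException {
        return Files.getFileStore(dir).getUsableSpace();
    }

    // True if a listening socket can be opened on the port (all interfaces).
    // The socket is closed again immediately by try-with-resources.
    static boolean canBind(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path logDir = Paths.get(args[0]);
        int port = Integer.parseInt(args[1]);
        System.out.printf("usable space in %s: %d MB%n",
                logDir, usableBytes(logDir) / (1024 * 1024));
        System.out.printf("port %d bindable: %b%n", port, canBind(port));
    }
}
```

Run it as the same OS user as the NodeManager: binding a port below 1024 or reading a restricted directory can fail for an ordinary user even when the daemon itself would succeed.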

Contributor

Hi @jsensharma

Please find below the complete error trace, including the "Caused by" section:

2019-08-19 13:37:49,643 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(997)) - Public cache exiting
2019-08-19 13:37:49,643 WARN nodemanager.NodeResourceMonitorImpl (NodeResourceMonitorImpl.java:run(167)) - org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl is interrupted. Exiting.
2019-08-19 13:37:49,645 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(210)) - Stopping NodeManager metrics system...
2019-08-19 13:37:49,646 INFO impl.MetricsSinkAdapter (MetricsSinkAdapter.java:publishMetricsFromQueue(141)) - timeline thread interrupted.
2019-08-19 13:37:49,647 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(216)) - NodeManager metrics system stopped.
2019-08-19 13:37:49,647 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(607)) - NodeManager metrics system shutdown complete.
2019-08-19 13:37:49,647 ERROR nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(936)) - Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: The per-node collector webapp failed to start.
at org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.startWebApp(NodeTimelineCollectorManager.java:315)
at org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStart(NodeTimelineCollectorManager.java:132)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStart(PerNodeTimelineCollectorsAuxService.java:101)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStart(AuxServices.java:313)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:643)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:934)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
Caused by: java.io.IOException: Problem starting http server
at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1165)
at org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.startWebApp(NodeTimelineCollectorManager.java:311)
... 13 more
Caused by: java.security.UnrecoverableKeyException: Get Key failed: Given final block not properly padded
at sun.security.pkcs12.PKCS12KeyStore.engineGetKey(PKCS12KeyStore.java:410)
at sun.security.provider.KeyStoreDelegator.engineGetKey(KeyStoreDelegator.java:96)
at sun.security.provider.JavaKeyStore$DualFormatJKS.engineGetKey(JavaKeyStore.java:70)
at java.security.KeyStore.getKey(KeyStore.java:1023)
at sun.security.ssl.SunX509KeyManagerImpl.<init>(SunX509KeyManagerImpl.java:133)
at sun.security.ssl.KeyManagerFactoryImpl$SunX509.engineInit(KeyManagerFactoryImpl.java:70)
at javax.net.ssl.KeyManagerFactory.init(KeyManagerFactory.java:256)
at org.eclipse.jetty.util.ssl.SslContextFactory.getKeyManagers(SslContextFactory.java:1087)
at org.eclipse.jetty.util.ssl.SslContextFactory.load(SslContextFactory.java:301)
at org.eclipse.jetty.util.ssl.SslContextFactory.doStart(SslContextFactory.java:221)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:113)
at org.eclipse.jetty.server.SslConnectionFactory.doStart(SslConnectionFactory.java:72)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:113)
at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:268)
at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:81)
at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:235)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.server.Server.doStart(Server.java:401)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1134)
... 14 more
Caused by: javax.crypto.BadPaddingException: Given final block not properly padded
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:989)
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:845)
at com.sun.crypto.provider.PKCS12PBECipherCore.implDoFinal(PKCS12PBECipherCore.java:399)
at com.sun.crypto.provider.PKCS12PBECipherCore$PBEWithSHA1AndDESede.engineDoFinal(PKCS12PBECipherCore.java:431)
at javax.crypto.Cipher.doFinal(Cipher.java:2165)
at sun.security.pkcs12.PKCS12KeyStore.engineGetKey(PKCS12KeyStore.java:348)
... 37 more
2019-08-19 13:37:49,651 INFO nodemanager.NodeManager (LogAdapter.java:info(51)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at hd-prd-ll01.hadoop.com/**.*.**.**
************************************************************/

Contributor

@jsensharma 

I have noticed that this host uses a different default Java than the Ambari host:

This host:

shell> java -version 

openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

Ambari host:

shell> java -version

java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)

Could this be the root cause of the issue?
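The `java` found on the shell's `PATH` is not necessarily the JVM the NodeManager runs with: Hadoop's launch scripts typically use the `JAVA_HOME` exported in `hadoop-env`, so comparing plain `java -version` output may compare the wrong binaries. A minimal sketch for asking the JVM itself; the class name `JvmInfo` is hypothetical, and it should be run with the same binary the NodeManager uses (for example `$JAVA_HOME/bin/java JvmInfo`). It also prints the default keystore type, because the failure in this thread is in keystore handling and that default differs between JDK 8 and later releases.

```java
import java.security.KeyStore;

// Hypothetical helper: prints what the running JVM reports about itself.
public class JvmInfo {
    public static void main(String[] args) {
        String[] keys = {"java.version", "java.vendor", "java.runtime.version", "java.home"};
        for (String key : keys) {
            System.out.println(key + " = " + System.getProperty(key));
        }
        // Type returned by KeyStore.getDefaultType():
        // "jks" on a stock JDK 8, "pkcs12" on JDK 9 and later.
        System.out.println("default keystore type = " + KeyStore.getDefaultType());
    }
}
```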

Master Mentor

@Koffi 

It may or may not be related to the JDK.
Based on the current error, it looks more like an issue with the SSL keystore/certificate configured for the NodeManager, since we see the following cause of failure:

Caused by: javax.crypto.BadPaddingException: Given final block not properly padded
	at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:989)
	at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:845)
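A `BadPaddingException` at this point means the JDK decrypted the protected key entry and the result did not end in valid padding, which is what typically happens when the key password is wrong (or the keystore file is damaged). The self-contained Java sketch below reproduces that failure mode with an in-memory PKCS12 keystore. It stores an AES secret key rather than a certificate-backed private key only to avoid generating a certificate; the class name, alias, and passwords are all made up. Assuming the NodeManager uses Hadoop's standard `ssl-server.xml`, the key password it is given comes from `ssl.server.keystore.keypassword`, so that value is worth comparing with the one used when the keystore was created.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.KeyStore;
import java.security.UnrecoverableKeyException;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical demo: a PKCS12 key entry read back with the wrong key password.
public class WrongKeyPasswordDemo {
    public static void main(String[] args) throws Exception {
        char[] storePass = "store-secret".toCharArray(); // made-up passwords
        char[] keyPass = "key-secret".toCharArray();
        char[] wrongPass = "not-the-key-secret".toCharArray();

        // Build a keystore whose single entry is encrypted with keyPass.
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey key = generator.generateKey();
        KeyStore original = KeyStore.getInstance("PKCS12");
        original.load(null, null);
        original.setKeyEntry("demo", key, keyPass, null);

        // Serialize and reload it, as if the file had been copied to a node.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.store(bytes, storePass);
        KeyStore reloaded = KeyStore.getInstance("PKCS12");
        reloaded.load(new ByteArrayInputStream(bytes.toByteArray()), storePass);

        System.out.println("right key password: " + (reloaded.getKey("demo", keyPass) != null));
        try {
            reloaded.getKey("demo", wrongPass);
            System.out.println("wrong key password: accepted");
        } catch (UnrecoverableKeyException e) {
            // On older JDK 8 builds this typically reads "Get Key failed: Given
            // final block not properly padded"; newer JDKs may word it differently.
            System.out.println("wrong key password: " + e);
            System.out.println("caused by: " + e.getCause());
        }
    }
}
```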


- How exactly did you configure SSL for the NodeManager? Can you please share the exact properties that you changed in the YARN configuration?

- Can you also check whether you are able to list the keystore/certificate properly without any issue?

- How did you generate the certificates: self-signed or CA-signed?

- When was it last working? Did you recently configure the keystore/certificate for the NodeManager?
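For the keystore check, note that `keytool -list` does not decrypt private keys, so it can succeed even when the key password is wrong. The Java sketch below goes one step further and also tries to recover every key entry with the key password, which is the operation that fails in the trace. The class and method names are hypothetical; pass the keystore path, type (for example `JKS` or `PKCS12`), store password, and key password exactly as they appear in the NodeManager's SSL configuration. A wrong store password makes `check` throw an `IOException` at load time, which is itself a useful signal.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic. Usage:
//   java KeystoreCheck <keystore-path> <type> <store-password> <key-password>
public class KeystoreCheck {

    // Loads the keystore (this step checks the store password), then tries to
    // decrypt every key entry with keyPass (the step that fails in the trace).
    static List<String> check(Path path, String type, char[] storePass, char[] keyPass)
            throws Exception {
        KeyStore ks = KeyStore.getInstance(type);
        try (InputStream in = Files.newInputStream(path)) {
            ks.load(in, storePass);
        }
        List<String> report = new ArrayList<>();
        for (String alias : Collections.list(ks.aliases())) {
            if (!ks.isKeyEntry(alias)) {
                report.add(alias + ": certificate entry");
                continue;
            }
            try {
                ks.getKey(alias, keyPass);
                report.add(alias + ": key OK");
            } catch (GeneralSecurityException e) {
                report.add(alias + ": key FAILED (" + e + ")");
            }
        }
        return report;
    }

    public static void main(String[] args) throws Exception {
        List<String> report = check(Paths.get(args[0]), args[1],
                args[2].toCharArray(), args[3].toCharArray());
        report.forEach(System.out::println);
    }
}
```

One possibility worth ruling out: the trace shows the key being decoded by `sun.security.pkcs12.PKCS12KeyStore`, and `keytool` does not support a key password that differs from the store password for PKCS12 keystores (it ignores `-keypass`). If the configured key password differs from the store password, rerunning the check with the store password as the key password can be informative. Passwords passed on the command line end up in shell history, so treat this as a throwaway diagnostic.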