I have a fresh install of Ambari (HDP 2.5 - version: 188.8.131.52-37) and I have a few services that will not remain running. The cluster is new and does not have any users yet.
One service I am struggling to diagnose is the Spark Thrift Server.
If I attempt to start the Spark Thrift Server, it generally stops within 1-5 minutes. Even upon a full reboot of the entire cluster this service will stop soon thereafter.
The only output I can find is:
[ambari-heartbeat-processor-0] HeartbeatProcessor:612 - State of service component SPARK_THRIFTSERVER of service SPARK of cluster hdpcluster has changed from STARTED to INSTALLED at host hdpcnode4 according to STATUS_COMMAND
Since this is a cluster, the Spark Thrift Server is running on node4 and Ambari is on node1. I found the above log entry on node1 in /var/log/ambari-server/ambari-server.log. My guess is that another log file has more information about this failure, but I have not had any luck finding it.
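To make entries like the heartbeat line quoted above easier to spot in ambari-server.log, here is a minimal sketch that pulls the component, host, and state transition out of such a line. The regular expression is inferred only from the single sample line in this post, and the names `STATE_CHANGE` and `parse_state_change` are hypothetical, so treat it as an illustration rather than a description of Ambari's log format in general.

```python
import re

# Pattern inferred from the one sample line in this thread (an assumption,
# not an official description of the Ambari log format).
STATE_CHANGE = re.compile(
    r"State of service component (?P<component>\S+) of service (?P<service>\S+) "
    r"of cluster (?P<cluster>\S+) has changed from (?P<old_state>\S+) "
    r"to (?P<new_state>\S+) at host (?P<host>\S+)"
)


def parse_state_change(line):
    """Return the fields of a state-change entry as a dict, or None if no match."""
    match = STATE_CHANGE.search(line)
    return match.groupdict() if match else None


sample = (
    "[ambari-heartbeat-processor-0] HeartbeatProcessor:612 - State of service "
    "component SPARK_THRIFTSERVER of service SPARK of cluster hdpcluster has "
    "changed from STARTED to INSTALLED at host hdpcnode4 according to STATUS_COMMAND"
)

info = parse_state_change(sample)
print(info["component"], info["host"], info["old_state"], "->", info["new_state"])
```

Running the same function over every line of the log file and keeping the non-None results would list each time a component dropped out of STARTED, and on which host.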
Hi Mark, I'm dealing with a Spark Thrift Server issue too, on v2.5.3. I noticed that the clocks are not in sync across all of my nodes.
I'm behind a proxy, so my ntpd wasn't updating. Probably not your issue, but it might be good to keep in mind.
17/02/23 12:24:07 INFO Client:
    client token: N/A
    diagnostics: Application application_1487754339887_0003 failed 2 times due to Error launching appattempt_1487754339887_0003_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1487845447016 found 1487756740753
Note: System times on machines may be out of sync. Check system time and time zones.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:250)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
. Failing the application.
    ApplicationMaster host: N/A
    ApplicationMaster RPC port: -1
    queue: default
    start time: 1487756139363
    final status: FAILED
    tracking URL: http://HNode2.quasar.com:8088/cluster/app/application_1487754339887_0003
    user: hive
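The two numbers in the "token is expired" message are epoch timestamps in milliseconds, so the size of the gap can be checked directly. The sketch below converts the values from the diagnostics above to UTC and prints the difference. Reading "found" as the token's expiry time is my assumption, and the helper name `from_epoch_ms` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


# Values copied from the diagnostics message above.
current_ms = 1487845447016  # "current time is ..."
found_ms = 1487756740753    # "found ..." (assumed to be the token expiry time)

current = from_epoch_ms(current_ms)
found = from_epoch_ms(found_ms)
gap_hours = (current_ms - found_ms) / 1000 / 3600

print("current:", current.isoformat())
print("found:  ", found.isoformat())
print("gap in hours:", round(gap_hours, 2))
```

The gap comes out at roughly a day rather than a few seconds, which fits the hint in the message itself that system times on the machines may be out of sync.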
Hi Nico, I very much appreciate your response. I did check my clocks, and ntpd is working fine. All of the nodes have the exact same time.
I will keep digging around and asking questions - hopefully I/we can figure out something.
Thank you for the response, Sandeep. This is part of my problem: I am uncertain which other logs to look in. Can you point me to where I can find the application log for the Spark Thrift Server application in YARN? Also, on which node will that log be, i.e., the same node as Ambari, or the node with the error?