Created 05-21-2018 11:19 AM
Hello guys,
The App Timeline Server is not starting up. Whenever I try to bring it up, it stops within a minute.
Here are the logs from /var/log/hadoop-yarn/yarn/yarn-yarn-timelineserver-node2.log
2018-05-21 15:57:56,292 INFO timeline.RollingLevelDB (RollingLevelDB.java:initRollingLevelDB(258)) - Initializing rolling leveldb instance :file:/hadoop/yarn/timeline/leveldb-timeline-store/indexes-ldb.2017-08-22-13 for start time: 1503406800000
2018-05-21 15:57:56,409 INFO timeline.RollingLevelDB (RollingLevelDB.java:initRollingLevelDB(266)) - Added rolling leveldb instance 2017-08-22-13 to indexes-ldb
2018-05-21 15:57:56,717 INFO timeline.RollingLevelDBTimelineStore (RollingLevelDBTimelineStore.java:checkVersion(1581)) - Loaded timeline store version info 1.0
2018-05-21 15:57:56,720 INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:serviceInit(157)) - Cleaner set to delete logs older than 604800 seconds
2018-05-21 15:57:56,720 INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:serviceInit(164)) - Unknown apps will be treated as complete after 86400 seconds
2018-05-21 15:57:56,720 INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:serviceInit(170)) - Application cache size is 10
2018-05-21 15:57:56,755 FATAL applicationhistoryservice.ApplicationHistoryServer (ApplicationHistoryServer.java:launchAppHistoryServer(171)) - Error starting ApplicationHistoryServer
java.lang.InternalError: java.io.FileNotFoundException: /usr/jdk64/jdk1.8.0_60/jre/lib/ext/sunec.jar (Too many open files)
at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:1003)
at sun.misc.URLClassPath.getResource(URLClassPath.java:212)
at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at sun.security.jca.ProviderConfig$2.run(ProviderConfig.java:215)
at sun.security.jca.ProviderConfig$2.run(ProviderConfig.java:206)
at java.security.AccessController.doPrivileged(Native Method)
at sun.security.jca.ProviderConfig.doLoadProvider(ProviderConfig.java:206)
at sun.security.jca.ProviderConfig.getProvider(ProviderConfig.java:187)
at sun.security.jca.ProviderList.getProvider(ProviderList.java:233)
at sun.security.jca.ProviderList.getService(ProviderList.java:331)
at sun.security.jca.GetInstance.getInstance(GetInstance.java:157)
at javax.net.ssl.KeyManagerFactory.getInstance(KeyManagerFactory.java:137)
at org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:179)
at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:131)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.newSslConnConfigurator(TimelineClientImpl.java:656)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.newConnConfigurator(TimelineClientImpl.java:631)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceInit(TimelineClientImpl.java:330)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:170)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.createAndInitYarnClient(EntityGroupFSTimelineStore.java:454)
at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.serviceInit(EntityGroupFSTimelineStore.java:173)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:168)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:178)
Caused by: java.io.FileNotFoundException: /usr/jdk64/jdk1.8.0_60/jre/lib/ext/sunec.jar (Too many open files)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:219)
at java.util.zip.ZipFile.<init>(ZipFile.java:149)
at java.util.jar.JarFile.<init>(JarFile.java:166)
at java.util.jar.JarFile.<init>(JarFile.java:103)
at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:893)
at sun.misc.URLClassPath$JarLoader.access$700(URLClassPath.java:756)
at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:838)
at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:831)
at java.security.AccessController.doPrivileged(Native Method)
at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:830)
at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:1001)
... 34 more
2018-05-21 15:57:56,759 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status -1
2018-05-21 15:57:56,761 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(211)) - Stopping ApplicationHistoryServer metrics system...
2018-05-21 15:57:56,762 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(217)) - ApplicationHistoryServer metrics system stopped.
2018-05-21 15:57:56,762 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(605)) - ApplicationHistoryServer metrics system shutdown complete.
2018-05-21 15:57:56,762 INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:serviceStop(297)) - Stopping EntityGroupFSTimelineStore
2018-05-21 15:57:56,805 INFO applicationhistoryservice.ApplicationHistoryServer (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ApplicationHistoryServer at node2/IP
As per the exception, I tried increasing the ulimit for the user. Even after the value was tripled, I'm getting the same error.
Any help will be appreciated.
Created 05-21-2018 11:23 AM
You are getting the following "Too many open files" error:
(ApplicationHistoryServer.java:launchAppHistoryServer(171)) - Error starting ApplicationHistoryServer java.lang.InternalError: java.io.FileNotFoundException: /usr/jdk64/jdk1.8.0_60/jre/lib/ext/sunec.jar (Too many open files)
This indicates that the file descriptor limit may not be set high enough. Please check the file "/etc/security/limits.conf" to see whether the system-wide ulimits are set properly.
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_security/content/kerb-config-limits.html
Example:
# ulimit -a
# ulimit -n 32768
# ulimit -n
Also, please check what value is set in this file:
# cat /etc/security/limits.d/yarn.conf
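To make the limits file check less error-prone, here is a minimal Python sketch that extracts the nofile entries for one user from text in the limits.conf format (domain, type, item, value). The function name parse_nofile_limits is hypothetical, and the sketch only matches an exact user name; it ignores wildcard and group entries such as "*" or "@group".

```python
def parse_nofile_limits(text, user):
    """Return the nofile limits for `user` from limits.conf-style text.

    The result maps "soft" and/or "hard" to the configured value (as a string).
    Only exact user-name matches are considered; "*" and "@group" domains
    are ignored in this sketch.
    """
    limits = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # not a "<domain> <type> <item> <value>" line
        domain, kind, item, value = fields
        if domain != user or item != "nofile":
            continue
        # A type of "-" sets both the soft and the hard limit.
        kinds = ("soft", "hard") if kind == "-" else (kind,)
        for k in kinds:
            limits[k] = value
    return limits
```

For example, passing the contents of /etc/security/limits.d/yarn.conf and the user "yarn" would show which soft and hard values are configured for the account that runs the timeline server.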
Also please check what is the value for the following property:
yarn.timeline-service.leveldb-timeline-store.max-open-files
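If you prefer to read the value directly from the configuration file rather than from the management UI, a small Python sketch along these lines may help. The helper name get_hadoop_property is hypothetical, and the path /etc/hadoop/conf/yarn-site.xml mentioned in the note below is an assumption about where your cluster keeps its YARN configuration.

```python
import xml.etree.ElementTree as ET


def get_hadoop_property(root, name):
    """Return the <value> of the <property> called `name`, or None if absent.

    `root` is the parsed <configuration> element of a Hadoop *-site.xml file.
    """
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None
```

Usage would look like get_hadoop_property(ET.parse("/etc/hadoop/conf/yarn-site.xml").getroot(), "yarn.timeline-service.leveldb-timeline-store.max-open-files"). A result of None means the property is not set explicitly in that file.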
Created 07-04-2018 05:20 AM
Hello @Jay Kumar SenSharma, any updates on this issue?
Created 07-04-2018 05:34 AM
If you are still facing the same "Too many open files" issue, then the limit on open file descriptors might not be set properly.
Please share the output which we requested in our previous update.
# ulimit -a
# ulimit -n 32768
# ulimit -n
Also, please check what value is set in this file:
# cat /etc/security/limits.d/yarn.conf
Also please share the output of the following command:
# lsof -p $APP_TIMELINE_PID | wc -l
# lsof -p $APP_TIMELINE_PID
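Note that "ulimit -n" in a shell reports the limit of that shell, which is not necessarily the limit the running daemon was started with. On Linux, the effective limit of a process can be read from /proc/<pid>/limits, and its open descriptors are listed under /proc/<pid>/fd. The Python sketch below is one possible way to check both. All three function names are hypothetical, and it assumes that the timeline server's command line contains the main class name "ApplicationHistoryServer" seen in the stack trace. It has to be run while the process is still up, with enough privileges to read its /proc entries.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Values are returned as strings because they may be 'unlimited'.
    Returns None if the line is not present.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Layout: Max open files <soft> <hard> files
            return fields[3], fields[4]
    return None


def count_open_fds(pid):
    """Count the entries in /proc/<pid>/fd (needs read access to that directory)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def find_pids(marker):
    """Return the PIDs whose command line contains `marker`."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as fh:
                cmdline = fh.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process already exited or is not readable
        if marker in cmdline:
            pids.append(int(entry))
    return pids
```

A typical use would be to call find_pids("ApplicationHistoryServer"), then for each PID compare count_open_fds(pid) with the result of parse_max_open_files on the contents of /proc/<pid>/limits. If the soft limit shown there is lower than the value in limits.d/yarn.conf, the daemon was probably started without picking up the new setting.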
Created 07-04-2018 07:09 AM
# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63522
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63522
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
# ulimit -n
32768
# cat /etc/security/limits.d/yarn.conf
yarn - nofile 65536
yarn - nproc 65536
# lsof -p $APP_TIMELINE_PID
lsof: no process ID specified
# lsof -i TCP:10200
(blank)
Created 05-21-2018 01:06 PM
I did try setting higher values for open file limit in the mentioned files. Still, the server is crashing.