I ran into an issue with the Application Timeline Server (ATS). ATS uses a LevelDB database, which is stored in the location specified by yarn.timeline-service.leveldb-timeline-store.path in yarn-site.xml. All of its metadata is stored in *.sst files under that location.

Because of this, the store can run into a disk space issue. However, it is not good practice to delete *.sst files directly. An *.sst file is a sorted table of key/value entries ordered by key, and the entries are partitioned into different *.sst files by key rather than by timestamp, so there is no "old" *.sst file that can safely be deleted.
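Before changing anything, it can help to see how much space the store actually takes. The following is a minimal read-only sketch; the helper names store_size_kb and sst_file_count are my own, and the default path is the one used later in this article, so adjust it to whatever yarn.timeline-service.leveldb-timeline-store.path points to on your cluster.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the store; neither is part of Hadoop.

# Print the size of the directory in kilobytes.
store_size_kb() {
    du -sk "$1" | cut -f1
}

# Print how many *.sst files sit directly in the directory.
sst_file_count() {
    find "$1" -maxdepth 1 -type f -name '*.sst' | wc -l | tr -d ' '
}

# Default path taken from this article; pass another path as the first argument.
STORE_DIR=${1:-/hadoop/yarn/timeline/leveldb-timeline-store.ldb}
if [ -d "$STORE_DIR" ]; then
    echo "size (KB): $(store_size_kb "$STORE_DIR")"
    echo "sst files: $(sst_file_count "$STORE_DIR")"
else
    echo "directory not found: $STORE_DIR" >&2
fi
```

The script only reads the directory and never modifies or deletes anything in it.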

To reclaim space in the LevelDB store, enable TTL (time to live) instead. Once it is enabled, timeline entities older than the TTL are discarded, and you can set the TTL to a smaller value than the default to give timeline entities a shorter lifetime.

<property>
  <description>Enable age off of timeline store data.</description>
  <name>yarn.timeline-service.ttl-enable</name>
  <value>true</value>
</property>

<property>
  <description>Time to live for timeline store data in milliseconds.</description>
  <name>yarn.timeline-service.ttl-ms</name>
  <value>604800000</value>
</property>
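The TTL value is given in milliseconds, which is easy to get wrong by a factor of 1000. A small sketch of the arithmetic follows; ttl_days_to_ms is a hypothetical helper name, not something that ships with Hadoop.

```shell
#!/bin/sh
# Convert a number of days to milliseconds for yarn.timeline-service.ttl-ms.
# ttl_days_to_ms is a hypothetical helper name, not part of Hadoop.
ttl_days_to_ms() {
    days=$1
    echo $(( days * 24 * 60 * 60 * 1000 ))
}

ttl_days_to_ms 7   # prints 604800000, the value used in the property above
ttl_days_to_ms 3   # prints 259200000
```

So the 604800000 in the property above corresponds to 7 days, and a 3-day lifetime would be 259200000.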

If you have deleted these files manually by mistake, as I did, ATS may fail and you may see one of the following errors.

error code: 500, message: Internal Server Error{"message":"Failed to fetch results by the proxy from url: http://server:8188/ws/v1/timeline/TEZ_DAG_ID?limit=11&_=1469716920323&primaryFilter=user:$user&","status":500,"trace":"{\"exception\":\"WebApplicationException\",\"message\":\"java.io.IOException: org.iq80.leveldb.DBException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /hadoop/yarn/timeline/leveldb-timeline-store.ldb/6378017.sst: No such file or directory\",\"javaClassName\":\"javax.ws.rs.WebApplicationException\"}"}

Or

(AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore failed in state INITED; cause: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 116 missing files; e.g.: /tmp/hadoop/yarn/timeline/leveldb-timeline-store.ldb/001052.sst org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 116 missing files; e.g.: /tmp/hadoop/yarn/timeline/leveldb-timeline-store.ldb/001052.sst
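To confirm that the files named in such messages are really gone, you can pull the *.sst paths out of the log text and test each one. This is only a sketch: report_sst_paths is a hypothetical helper, and the pattern assumes the paths contain no spaces, colons or double quotes, which holds for the messages shown above.

```shell
#!/bin/sh
# Print each distinct *.sst path mentioned on stdin, with a note on whether
# the file currently exists. report_sst_paths is a hypothetical helper name.
report_sst_paths() {
    grep -o '/[^ :"]*\.sst' | sort -u | while read -r path; do
        if [ -e "$path" ]; then
            echo "present $path"
        else
            echo "missing $path"
        fi
    done
}

# Example using part of the second error message from this article.
echo 'Corruption: 116 missing files; e.g.: /tmp/hadoop/yarn/timeline/leveldb-timeline-store.ldb/001052.sst' | report_sst_paths
```

In practice you would pipe the relevant lines of the Timeline Server log into the function instead of the echo.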

Resolution:

  • Go to the configured location, /hadoop/yarn/timeline/leveldb-timeline-store.ldb, where you will see a text file named “CURRENT”:
    • cd /hadoop/yarn/timeline/leveldb-timeline-store.ldb
    • ls -ltrh | grep -i CURRENT
  • Copy the CURRENT file to a temporary location:
    • cp /hadoop/yarn/timeline/leveldb-timeline-store.ldb/CURRENT /tmp
  • Remove the original file:
    • rm /hadoop/yarn/timeline/leveldb-timeline-store.ldb/CURRENT
  • Restart the YARN service via Ambari.
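The copy and remove steps above can be sketched as one guarded function, so that CURRENT is removed only after the backup has been verified. set_aside_current is a hypothetical helper name and the timestamped backup name is my own choice. As far as I understand, LevelDB treats a directory without CURRENT as containing no database, so ATS will most likely come up with an empty store after the restart; that is an assumption worth checking on a test cluster first.

```shell
#!/bin/sh
# set_aside_current is a hypothetical helper: it copies CURRENT to a backup
# directory and removes the original only if the copy is identical.
set_aside_current() {
    store_dir=$1
    backup_dir=$2
    current="$store_dir/CURRENT"

    if [ ! -f "$current" ]; then
        echo "no CURRENT file in $store_dir" >&2
        return 1
    fi

    backup="$backup_dir/CURRENT.$(date +%Y%m%d%H%M%S)"
    cp -p "$current" "$backup" || return 1
    # Remove the original only when the backup is byte-for-byte identical.
    cmp -s "$current" "$backup" || return 1
    rm "$current"
    # Print where the backup was written.
    echo "$backup"
}

# Usage (commented out so that pasting this block changes nothing):
# set_aside_current /hadoop/yarn/timeline/leveldb-timeline-store.ldb /tmp
```

The YARN restart via Ambari is still a separate, manual step after the function returns.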

These steps resolved the issue for me. I hope they help you as well.

Comments

@Saurabh Kumar - Nice Article!

P.S. I have removed the username from the logs and replaced it with $user.