I want to parse Tez logs and pull out information such as query text, start_time, user, etc. from them.
1. Where are they stored on HDFS?
- Is there a structure to these logs?
2. Can this be done using the Timeline Server REST APIs?
@rohan_kulkarni - I am not sure exactly what you are looking for. Are you looking for Tez View logs or YARN app logs?
- You also mentioned "pull out information like query text, start_time, user, etc.".
You can pull this information from the HiveServer2 logs as well. You need to check which HS2 instance the query is running on and then check that instance's logs.
My main objective is to extract the query text, start_time, user, time_taken, etc. of queries executed on the Tez engine.
HiveServer2 logs are in log4j format which is difficult to extract information from.
Are there any logs that get generated and stored on HDFS? Is there any structure to these log files?
@rohan_kulkarni - If you are using HDP 2.6.5 or an older version, you can check this in the Tez View.
The Tez UI has two tabs, "Hive Queries" and "All DAGs". "Hive Queries" shows the query start and end times, and "All DAGs" shows all the information about the DAGs. Can you please check and confirm whether this works for you?
The YARN UI shows the application start and end times.
Otherwise, you need to grep the HiveServer2 logs for keywords to get the query details.
I can see information about past executed queries in the Tez UI. Where does the Tez UI fetch this information from? HDFS? If so, where in HDFS, and how can I parse it programmatically, say through the WebHDFS APIs?
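As far as I understand, the Tez UI does not read HDFS directly; it queries the YARN Application Timeline Server (ATS) over REST. Below is a minimal Python sketch of doing the same. It assumes an ATS v1/v1.5 endpoint under /ws/v1/timeline, that Hive publishes entities of type HIVE_QUERY_ID with the query JSON under otherinfo.QUERY, the user under primaryfilters.user, and QUERY_SUBMITTED / QUERY_COMPLETED events. The host, port 8188, and those field names are assumptions to verify against your cluster, and fetch_entities / summarize_query are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import urlencode


def fetch_entities(base_url, entity_type="HIVE_QUERY_ID", limit=10):
    """Fetch recent timeline entities of one type from the ATS REST API."""
    url = f"{base_url}/ws/v1/timeline/{entity_type}?{urlencode({'limit': limit})}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp).get("entities", [])


def summarize_query(entity):
    """Reduce one timeline entity to the fields asked about in this thread."""
    raw = entity.get("otherinfo", {}).get("QUERY")
    query_text = None
    if isinstance(raw, dict):
        query_text = raw.get("queryText")
    elif isinstance(raw, str):
        # Assumption: QUERY is a JSON string that contains a "queryText" key.
        try:
            parsed = json.loads(raw)
            query_text = parsed.get("queryText") if isinstance(parsed, dict) else raw
        except ValueError:
            query_text = raw

    users = entity.get("primaryfilters", {}).get("user", [])
    # Map event type to timestamp (milliseconds since the epoch).
    events = {e.get("eventtype"): e.get("timestamp") for e in entity.get("events", [])}
    start = events.get("QUERY_SUBMITTED", entity.get("starttime"))
    end = events.get("QUERY_COMPLETED")

    return {
        "query_id": entity.get("entity"),
        "user": users[0] if users else None,
        "start_time_ms": start,
        "time_taken_ms": end - start if start is not None and end is not None else None,
        "query_text": query_text,
    }


if __name__ == "__main__":
    # Hypothetical ATS address; replace with your timeline server host and port.
    for ent in fetch_entities("http://ats-host.example.com:8188"):
        print(summarize_query(ent))
```

On a Kerberized cluster the plain urllib call would need SPNEGO authentication, which this sketch leaves out.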
Okay, I see that the ATS puts the logs in the /ats/done directory in HDFS, and that this location is configured by the property yarn.timeline-service.entity-group-fs-store.done-dir.
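To look at what is under that directory programmatically, something like the following Python sketch could walk it through WebHDFS. It assumes WebHDFS is enabled on the NameNode (port 50070 is the usual Hadoop 2 default, but check yours), that the done directory is /ats/done as above, and that the log files are named with the prefixes summarylog- and entitylog-. list_status and find_ats_logs are hypothetical helper names, and the NameNode host is a placeholder.

```python
import json
import urllib.request


def list_status(namenode_url, path, user=None):
    """Return the FileStatus entries for one HDFS directory via WebHDFS LISTSTATUS."""
    url = f"{namenode_url}/webhdfs/v1{path}?op=LISTSTATUS"
    if user:
        url += f"&user.name={user}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["FileStatuses"]["FileStatus"]


def find_ats_logs(root, lister):
    """Walk the tree under root and yield (kind, path) for ATS log files.

    lister is any callable that maps a directory path to a list of
    FileStatus dicts, which keeps the walk testable without a cluster.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        for status in lister(current):
            name = status["pathSuffix"]
            child = f"{current.rstrip('/')}/{name}"
            if status["type"] == "DIRECTORY":
                stack.append(child)
            elif name.startswith("summarylog-"):
                yield ("summary", child)
            elif name.startswith("entitylog-"):
                yield ("entity", child)


if __name__ == "__main__":
    # Hypothetical NameNode address; replace with your own.
    nn = "http://namenode.example.com:50070"
    for kind, path in find_ats_logs("/ats/done", lambda p: list_status(nn, p)):
        print(kind, path)
```

Grouping the yielded paths by their parent directory gives a quick count of how many entity and summary logs each application produced.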
1. What is the difference between
org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore ? Does the ATS generate logs in path /ats/done only if
yarn.timeline-service.entity-group-fs-store.summary-store is set to EntityGroupFSTimelineStore?
2. Do all three versions of the Application Timeline Server (1.0, 1.5, 2.0) generate logs in /ats/done, or does this differ between versions?
3. For some executed queries, I see that only a single entity log and a single summary log are generated, whereas for others I see multiple entity and summary logs. Is there any structure to this? When should we expect a single entity or summary log, and when multiple?