
Where are Tez logs stored?


I want to parse Tez logs and extract information such as query text, start_time, and user.

1. Where are they stored in HDFS?

 - Do these logs have a defined structure?

2. Can this information be retrieved through the Timeline Server REST APIs?

 

Expert Contributor

@rohan_kulkarni - I am not sure exactly what you are looking for. Are you looking for Tez View logs or YARN application logs?
- Also, you mentioned "pull out information like query text, start_time, user, etc.".
You can also get this information from the HiveServer2 logs. Check which HS2 instance ran the query, then check that instance's logs.


My main objective is to extract the query text, start_time, user, time_taken, etc. of queries executed on the Tez engine.
HiveServer2 logs are in log4j format, which makes it difficult to extract this information from them.

Are any logs generated and stored on HDFS? Do these log files have a defined structure?

Expert Contributor

@rohan_kulkarni - If you are using HDP 2.6.5 or an older version, you can check this in the Tez View.
The Tez UI has two tabs, "Hive Queries" and "All DAGs". Hive Queries shows the query start and end times, and All DAGs shows all the information about the DAGs. Can you please check and confirm whether this works for you?

The YARN UI shows the application start and end times.
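If you need those times programmatically rather than from the UI, the ResourceManager exposes the same application report over its REST API (/ws/v1/cluster/apps/{appid}). A minimal sketch, assuming the default RM web port 8088 and an unsecured (non-Kerberos) cluster; the host name and application ID in the usage comment are hypothetical:

```python
import json
import urllib.request
from datetime import datetime, timezone


def fetch_app(rm_url: str, app_id: str) -> dict:
    """Fetch one application's report from the ResourceManager REST API."""
    url = f"{rm_url}/ws/v1/cluster/apps/{app_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize_app(payload: dict) -> dict:
    """Pick out user and start/end times (epoch milliseconds) from an 'app' response."""
    app = payload["app"]

    def to_iso(ms):
        # finishedTime is 0 while the application is still running
        if not ms:
            return None
        return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()

    return {
        "id": app["id"],
        "user": app["user"],
        "type": app.get("applicationType"),
        "started": to_iso(app["startedTime"]),
        "finished": to_iso(app["finishedTime"]),
        "elapsed_ms": app["elapsedTime"],
    }

# Usage (hypothetical host and application ID):
# print(summarize_app(fetch_app("http://rm-host:8088", "application_1234567890123_0001")))
```

Note that this gives application-level times; a single Tez application (session) can run several Hive queries.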

 

Otherwise, you need to grep the HiveServer2 logs for keywords to get the query details.
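A rough sketch of that grep approach, turning log4j lines into per-query records. It assumes Driver messages shaped like "Compiling command(queryId=...): <query>" and "Completed executing command(queryId=...); Time taken: N seconds", which is what Hive's Driver typically logs at INFO level. The exact wording and the log4j line layout vary by version and configuration, so check the regexes against your own hiveserver2.log. The user is not on these lines, and a multi-line query is only captured up to its first line:

```python
import re
from typing import Dict, Iterable

# Assumed message shapes; adjust to match your HS2 log4j pattern and Hive version.
LEVEL = r"(?:TRACE|DEBUG|INFO|WARN|ERROR|FATAL)"
COMPILE_RE = re.compile(
    rf"^(?P<ts>.+?)\s+{LEVEL}\b.*Compiling command\(queryId=(?P<qid>[^)]+)\): (?P<query>.*)$"
)
DONE_RE = re.compile(
    rf"^(?P<ts>.+?)\s+{LEVEL}\b.*Completed executing command\(queryId=(?P<qid>[^)]+)\); "
    r"Time taken: (?P<secs>[\d.]+) seconds"
)


def parse_hs2_log(lines: Iterable[str]) -> Dict[str, dict]:
    """Group compile and completion events by queryId."""
    queries: Dict[str, dict] = {}
    for line in lines:
        m = COMPILE_RE.match(line)
        if m:
            q = queries.setdefault(m["qid"], {})
            q["start_time"] = m["ts"]    # timestamp text exactly as logged
            q["query"] = m["query"]      # first line of the query only
            continue
        m = DONE_RE.match(line)
        if m:
            q = queries.setdefault(m["qid"], {})
            q["end_time"] = m["ts"]
            q["time_taken_s"] = float(m["secs"])
    return queries

# Usage:
# with open("hiveserver2.log") as f:
#     for qid, info in parse_hs2_log(f).items():
#         print(qid, info)
```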


I can see information about past executed queries in the Tez UI. Where does the Tez UI fetch this information from? HDFS? If so, where in HDFS, and how can I parse it programmatically, for example through the WebHDFS APIs?

Expert Contributor

@rohan_kulkarni - The Tez UI relies on the Application Timeline Server (ATS), which acts as a backing store for the application data generated during the lifetime of a YARN application. You can refer to the article below for more information:

 

https://tez.apache.org/tez-ui.html
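Since the Tez UI reads from ATS, the same data can be pulled from the Timeline Server v1 REST API (/ws/v1/timeline/{entityType}). A minimal sketch, assuming the default ATS web port 8188 and an unsecured cluster. The entity type HIVE_QUERY_ID and the field names used below (primaryfilters.user, otherinfo.QUERY containing a JSON string with queryText) are assumptions based on what Hive's ATS hook commonly writes; inspect one raw response from your cluster before relying on them:

```python
import json
import urllib.parse
import urllib.request
from typing import List


def fetch_hive_queries(ats_url: str, limit: int = 10) -> dict:
    """List HIVE_QUERY_ID entities from the Timeline Server v1 REST API."""
    params = urllib.parse.urlencode({"limit": limit})
    url = f"{ats_url}/ws/v1/timeline/HIVE_QUERY_ID?{params}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def extract_queries(payload: dict) -> List[dict]:
    """Flatten each entity into query id, user, start time and query text.

    Field names are assumptions; missing fields come back as None.
    """
    rows = []
    for ent in payload.get("entities", []):
        raw = ent.get("otherinfo", {}).get("QUERY")
        # QUERY is usually a JSON-encoded string rather than a nested object
        info = json.loads(raw) if isinstance(raw, str) else (raw or {})
        users = ent.get("primaryfilters", {}).get("user", [])
        rows.append({
            "query_id": ent.get("entity"),
            "user": users[0] if users else None,
            "start_time_ms": ent.get("starttime"),
            "query_text": info.get("queryText"),
        })
    return rows

# Usage (hypothetical host):
# for row in extract_queries(fetch_hive_queries("http://ats-host:8188", limit=5)):
#     print(row)
```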


Okay, I see that ATS stores the logs in the /ats/done directory in HDFS, and that this location is configured by the property yarn.timeline-service.entity-group-fs-store.done-dir.

1. What is the difference between org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore and org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore? Does ATS generate logs in /ats/done only if yarn.timeline-service.entity-group-fs-store.summary-store is set to EntityGroupFSTimelineStore?


2. Do all three versions of the Application Timeline Server (1.0, 1.5, 2.0) generate logs in /ats/done, or does this differ between versions?

3. For some executed queries, I see only a single entity log and summary log being generated, whereas for others I see multiple entity and summary logs. Is there any structure to this? That is, when should we expect a single entity or summary log, and when multiple?
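To compare what individual entity and summary logs contain, one option is to download a file (for example with WebHDFS op=OPEN) and count the entities by type. This sketch assumes each file is a sequence of JSON-serialized timeline entities, either concatenated or newline-separated, each carrying an "entitytype" field. That is an assumption to verify by opening one of your files, not a documented format guarantee. A file that is still being written may end in a truncated object, which raises json.JSONDecodeError:

```python
import json
from collections import Counter
from typing import Iterator


def iter_json_objects(text: str) -> Iterator[dict]:
    """Yield JSON objects from concatenated or newline-separated JSON text."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def count_entity_types(text: str) -> Counter:
    """Count entities per entitytype in one entity or summary log file."""
    return Counter(o.get("entitytype") for o in iter_json_objects(text))

# Usage:
# with open("downloaded-entity-log") as f:
#     print(count_entity_types(f.read()))
```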

TIA