Explorer
Posts: 25
Registered: ‎07-06-2018

Hive tmp delegation token files

On one of the edge nodes, we noticed a few thousand hidden files named ".hive_hadoop_delegation_tokenxxxxxxxxxxxxx.tmp.crc" under the local directory "/tmp/hive".

1. Do these delegation_token files correspond to live jobs that use them to keep their authentication alive?

2. Are these created by YARN at the time of initial delegation token assignment to a user?

3. Is it safe to delete them or are they cleaned up once a job is completed?

Posts: 1,748
Kudos: 364
Solutions: 277
Registered: ‎07-31-2013

Re: Hive tmp delegation token files

> .hive_hadoop_delegation_tokenxxxxxxxxxxxxx.tmp.crc

Specifically noting the dot-prefix and the .crc extension here (answers inline):

> 1. Do these delegation_token files correspond to live jobs that use them to keep their authentication alive?

These files are indeed written when Hive grabs delegation tokens and writes them out to a local file, but they are not the delegation token files themselves. They hold the CRC checksums of the original delegation token files, which Hadoop's LocalFileSystem writer stores separately.

> 2. Are these created by YARN at the time of initial delegation token assignment to a user?

Typically these files are only created by Hive when it needs to run a MapredLocalTask from HS2 when user impersonation (doAs) is turned on.

Hive ends up fetching a Delegation Token from HDFS and writes it to the local filesystem. The local filesystem layer Hadoop uses also ends up writing a "side file" (your crc file observed here) along with the original file that Hive wrote.
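
As a rough illustration of the naming, Hadoop's checksummed local filesystem names the side file by prefixing the original file name with a dot and appending ".crc". The sketch below only mirrors that convention in Python; the helper names are hypothetical, and the token file name is inferred from the side file name you observed rather than taken from Hive's code.

```python
from pathlib import PurePosixPath


def checksum_side_file(path: str) -> str:
    """Return the side-file path for a local file: '.' + name + '.crc'."""
    p = PurePosixPath(path)
    return str(p.with_name("." + p.name + ".crc"))


def original_from_side_file(path: str) -> str:
    """Inverse mapping: strip the leading dot and the trailing '.crc'."""
    p = PurePosixPath(path)
    name = p.name
    # Need a leading dot, a '.crc' suffix, and at least one character between.
    if not (name.startswith(".") and name.endswith(".crc") and len(name) > 5):
        raise ValueError(f"not a checksum side file: {path}")
    return str(p.with_name(name[1:-len(".crc")]))


if __name__ == "__main__":
    # Illustrative file name only; the real suffix is random.
    token = "/tmp/hive/hive_hadoop_delegation_token123.tmp"
    print(checksum_side_file(token))
```

So a side file named ".hive_hadoop_delegation_token123.tmp.crc" would pair with a token file named "hive_hadoop_delegation_token123.tmp" in the same directory.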

> 3. Is it safe to delete them or are they cleaned up once a job is completed?

Hive does delete the actual delegation token file when the MapredLocalTask completes, but it is unaware of the side file that its use of Hadoop's local filesystem created. The side file is therefore left over.

It should be entirely OK to delete these files, as they will not be in use by anything else.
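
If you would rather script the cleanup, here is a minimal sketch, not a tested tool. It assumes the "/tmp/hive" location and file name prefix from your post, and it adds two safety checks of my own choosing: the matching token file must be gone, and the side file must be older than an arbitrary one-day threshold. The function names are hypothetical.

```python
import os
import time

# Assumption: prefix taken from the file names reported in this thread.
SIDE_FILE_PREFIX = ".hive_hadoop_delegation_token"
SIDE_FILE_SUFFIX = ".crc"


def find_orphaned_crc_files(directory, min_age_seconds=24 * 3600, now=None):
    """List side files whose token file is gone and that are old enough."""
    now = time.time() if now is None else now
    orphans = []
    with os.scandir(directory) as entries:
        for entry in entries:
            name = entry.name
            if not (name.startswith(SIDE_FILE_PREFIX) and name.endswith(SIDE_FILE_SUFFIX)):
                continue
            if not entry.is_file(follow_symlinks=False):
                continue
            # The token file has the same name minus the dot and '.crc'.
            original = os.path.join(directory, name[1:-len(SIDE_FILE_SUFFIX)])
            if os.path.exists(original):
                continue  # token file still present, a task may be running
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age < min_age_seconds:
                continue  # too recent, leave it alone
            orphans.append(entry.path)
    return sorted(orphans)


def delete_orphans(directory, dry_run=True):
    """Delete orphaned side files; with dry_run=True only report them."""
    orphans = find_orphaned_crc_files(directory)
    if not dry_run:
        for path in orphans:
            os.remove(path)
    return orphans


if __name__ == "__main__":
    for path in delete_orphans("/tmp/hive", dry_run=True):
        print("would delete", path)
```

Running it with dry_run=True first shows what would be removed before anything is actually deleted.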
Explorer
Posts: 25
Registered: ‎07-06-2018

Re: Hive tmp delegation token files

@Harsh J Thanks for your response.

2. Impersonation is turned off. Do you know why we still have these files created? Maybe there are other scenarios where these get created?

3. If they are not cleaned up on their own, I should have observed millions of them, amounting to all files created so far, but I can list only a few, ~4k. Can you please comment?

Regards

Posts: 1,748
Kudos: 364
Solutions: 277
Registered: ‎07-31-2013

Re: Hive tmp delegation token files

> 2. Impersonation is turned off. Do you know why we still have these files created? Maybe there are other scenarios where these get created?

It appears that the comments I went by were not entirely right. Hive seems to use the token-based approach even without doAs being turned on, per the implementation at https://github.com/cloudera/hive/blob/cdh5.15.0-release/ql/src/java/org/apache/hadoop/hive/ql/exec/m... (note that the conditions don't actually check for the impersonation config).

> 3. If they are not cleaned up on their own, I should have observed millions of them, amounting to all files created so far, but I can list only a few, ~4k. Can you please comment?

One explanation could be that since this is limited to Hive's use of MapredLocalTask, it only applies to very specific queries that end up requiring a Map Join, which may be a minority in your cluster.

Furthermore, I believe your OS's default tmpwatch run generally clears out old data under /tmp based on atime values, which should go stale for such files soon after creation, since nothing reads them again.
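
One way to check the tmpwatch theory is to look at the access times of the files that are still present. The sketch below is only a diagnostic under assumptions: the 10-day threshold is a placeholder (use whatever your tmpwatch or equivalent job is actually configured with), and the function name is hypothetical. It does not replicate tmpwatch itself.

```python
import os
import time


def split_by_atime(directory, threshold_days=10.0, now=None):
    """Split regular files into (recent, stale) names by last access time."""
    now = time.time() if now is None else now
    cutoff = now - threshold_days * 86400
    recent, stale = [], []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            # stat() reads metadata only, so it does not change the atime.
            atime = entry.stat(follow_symlinks=False).st_atime
            (stale if atime < cutoff else recent).append(entry.name)
    return sorted(recent), sorted(stale)


if __name__ == "__main__":
    recent, stale = split_by_atime("/tmp/hive")
    print(len(recent), "files accessed within the threshold")
    print(len(stale), "files older than the threshold")
```

If a periodic cleaner is removing old files, the stale list should be empty or nearly so, and the ~4k files you see would all be recent.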
Explorer
Posts: 25
Registered: ‎07-06-2018

Re: Hive tmp delegation token files

@Harsh J Thanks for the information.
