
Is it safe to remove the /tmp/hive/hive folder?



Is it safe to remove the /tmp/hive/hive folder (from HDFS)?


as in:


hdfs dfs -rm -r /tmp/hive/hive


The reason for asking is that under /tmp/hive/hive we have thousands of files and we can't delete them.


hdfs dfs -ls /tmp/hive/
Found 7 items
drwx------   - admin     hdfs          0 2019-03-05 12:00 /tmp/hive/admin
drwx------   - drt       hdfs          0 2019-06-16 14:02 /tmp/hive/drt
drwx------   - ambari-qa hdfs          0 2019-06-16 15:11 /tmp/hive/ambari-qa
drwx------   - anonymous hdfs          0 2019-06-16 08:57 /tmp/hive/anonymous
drwx------   - hdfs      hdfs          0 2019-06-13 08:42 /tmp/hive/hdfs
drwx------   - hive      hdfs          0 2019-06-13 10:58 /tmp/hive/hive
drwx------   - root      hdfs          0 2018-07-17 23:37 /tmp/hive/root
Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

As per the Apache Hive docs, there seem to be some parameters and tools available to deal with this kind of issue. I have not personally tested those tools, but it looks like they were introduced quite a while back to deal with a similar problem as part of https://issues.apache.org/jira/browse/HIVE-13429


For example, I see that the Hive config "hive.exec.scratchdir" points to the "/tmp/hive" directory.

Can you please check and let us know what value is set for the parameter "hive.scratchdir.lock"? (If it is not set, the default value will be "false".) Additionally, you might want to look at the "hive.server2.clear.dangling.scratchdir" and "hive.start.cleanup.scratchdir" parameters of the HiveServer2 config.
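
One quick way to check the effective values as a client sees them could be something like the sketch below (assuming the Hive CLI is available on that node; a beeline session running the same `set` statements would show the same thing):

# print the effective value of each scratch-dir related property for a new session
hive -e 'set hive.exec.scratchdir;
         set hive.scratchdir.lock;
         set hive.server2.clear.dangling.scratchdir;
         set hive.start.cleanup.scratchdir;'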


Please refer to link [1] to learn more about those parameters.

There is a tool, "cleardanglingscratchdir", mentioned in link [2]; you may want to read more about it there.

# hive --service cleardanglingscratchdir [-r] [-v] [-s scratchdir]
    -r      dry-run mode, which produces a list on console
    -v      verbose mode, which prints extra debugging information
    -s      if you are using non-standard scratch directory
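
In practice you would probably do a dry run first and only run it for real once the list looks sane (just a sketch, assuming the default /tmp/hive scratch dir so -s is not needed):

# dry run: only print the dangling scratch directories that would be removed
hive --service cleardanglingscratchdir -r -v

# if the output looks right, run again without -r to actually remove them
hive --service cleardanglingscratchdir -v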


[1] https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hi....

[2] https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-ClearDa...






We tried the following to remove files that are older than 10 days, but because there are so many files, nothing gets deleted at all:



hdfs dfs -ls /tmp/hive/hive   |   tr -s " "    |    cut -d' ' -f6-8    |     grep "^[0-9]"    |    awk 'BEGIN{ MIN=14400; LAST=60*MIN; "date +%s" | getline NOW } { cmd="date -d'\''"$1" "$2"'\'' +%s"; cmd | getline WHEN; DIFF=NOW-WHEN; if(DIFF > LAST){ print "Deleting: "$3; system("hdfs dfs -rm -r "$3) }}'
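
(We suspect part of the problem is that this loop starts a separate hdfs dfs -rm client, i.e. a full JVM, for every single path. That is why we would prefer to drop the whole directory in one call if it is safe, roughly like the sketch below; the -skipTrash flag is our own assumption, to avoid first moving everything into the HDFS trash.)

hdfs dfs -rm -r -skipTrash /tmp/hive/hive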




Michael-Bronson


@dear Jay - what is the meaning of hive.scratchdir.lock when it is set to false?

Michael-Bronson


Second, is it safe to delete the folder:

hdfs dfs -rm -r /tmp/hive/hive

Michael-Bronson

Master Mentor

@Michael Bronson

"hive.scratchdir.lock" : When true, holds a lock file in the scratch directory. If a Hive process dies and accidentally leaves a dangling scratchdir behind, the cleardanglingscratchdir tool will remove it.

When false, does not create a lock file and therefore the cleardanglingscratchdir tool cannot remove any dangling scratch directories.
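
As a rough way to see whether your installation is actually creating those lock files, you could list the scratch dirs and look for them (just a sketch, run as the hdfs superuser since the per-user dirs are mode 700; the inuse.lck / inuse.info file names are the ones introduced by HIVE-13429 and may differ in your version):

# recursively list the scratch dirs and keep only the lock/info marker files
hdfs dfs -ls -R /tmp/hive 2>/dev/null | grep -E 'inuse\.(lck|info)'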



Regarding your query "second is it safe to delete the folder - /tmp/hive/hive"

>>> I do not think that we should do it on our own, as the whole purpose of the following JIRA was to introduce a tool like "cleardanglingscratchdir" to safely remove the scratch contents: https://issues.apache.org/jira/browse/HIVE-13429



For your info - we actually already deleted this folder before you posted your answer, and after we restarted the Hive service in Ambari, it created the /tmp/hive/hive folder again.

Michael-Bronson

Master Mentor

@Michael Bronson

Cleaning up the Hive scratch directory manually may not be a safe option in a multi-user environment (where multiple users might be executing Hive queries concurrently), since it can accidentally remove a scratch directory that is still in use.
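
If you ever do have to clean it up by hand anyway, one rough sanity check (a sketch, not a guarantee of safety) is to first make sure no Hive or Tez jobs are still running and possibly holding scratch directories open, for example:

# list running YARN applications and look for anything Hive/Tez related
yarn application -list -appStates RUNNING | grep -iE 'hive|tez'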


@dear Jay - you said "I do not think that we should do it on our own". I agree, but we did not have a choice, because under /tmp/hive/hive we have millions of folders and we can't delete them. So after we deleted the folder from HDFS, we saw that after a Hive restart it created the /tmp/hive/hive folder again. Do you have any advice on what we need to check after this brutal action?

Michael-Bronson