Created 06-16-2019 03:47 PM
Is it safe to remove the /tmp/hive/hive folder from HDFS, as follows?
hdfs dfs -rm -r /tmp/hive/hive
The reason is that /tmp/hive/hive holds thousands of files and we cannot delete them.
hdfs dfs -ls /tmp/hive/
Found 7 items
drwx------   - admin     hdfs          0 2019-03-05 12:00 /tmp/hive/admin
drwx------   - drt       hdfs          0 2019-06-16 14:02 /tmp/hive/drt
drwx------   - ambari-qa hdfs          0 2019-06-16 15:11 /tmp/hive/ambari-qa
drwx------   - anonymous hdfs          0 2019-06-16 08:57 /tmp/hive/anonymous
drwx------   - hdfs      hdfs          0 2019-06-13 08:42 /tmp/hive/hdfs
drwx------   - hive      hdfs          0 2019-06-13 10:58 /tmp/hive/hive
drwx------   - root      hdfs          0 2018-07-17 23:37 /tmp/hive/root
Created 06-16-2019 10:31 PM
According to the Apache Hive docs, there are some parameters and tools for dealing with this kind of issue. I have not personally tested them, but they appear to have been introduced for a similar problem some time ago as part of https://issues.apache.org/jira/browse/HIVE-13429
For example, the Hive config "hive.exec.scratchdir" points to the "/tmp/hive" directory.
Can you please check and let us know what value is set for the parameter "hive.scratchdir.lock"? (If it is not set, the default value is "false".) You might also want to look at the "hive.server2.clear.dangling.scratchdir" and "hive.start.cleanup.scratchdir" parameters of the Hive Server config.
Please refer to link [1] to learn more about those parameters.
The "cleardanglingscratchdir" tool is mentioned in link [2]; you may want to read more about it.
# hive --service cleardanglingscratchdir [-r] [-v] [-s scratchdir]
    -r    dry-run mode, which produces a list on console
    -v    verbose mode, which prints extra debugging information
    -s    if you are using non-standard scratch directory
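As a minimal sketch of how the flags above could be combined, the helper below only builds the command line so it can be reviewed before anything is run. The function name build_cleanup_cmd and the "dry-run"/"real" mode names are hypothetical; only the -r, -v and -s flags come from the usage text above, and I have not tested the tool itself.

```shell
# Hypothetical helper: prints a cleardanglingscratchdir command line.
# $1 = "dry-run" or "real"; $2 = optional non-standard scratch directory.
build_cleanup_cmd() {
  mode="$1"
  scratch="${2:-}"
  cmd="hive --service cleardanglingscratchdir -v"
  # -r only lists what would be removed, per the usage text above
  if [ "$mode" = "dry-run" ]; then
    cmd="$cmd -r"
  fi
  # -s is only needed for a non-standard scratch directory
  if [ -n "$scratch" ]; then
    cmd="$cmd -s $scratch"
  fi
  printf '%s\n' "$cmd"
}

# Example: print the dry-run command, inspect it, then run it by hand.
# build_cleanup_cmd dry-run
```

Starting with the dry-run form and reading the list it prints seems the safer order before running the tool without -r.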
Created 06-16-2019 03:49 PM
We tried the following to remove files older than 10 days, but because there are so many files, nothing was deleted at all.
hdfs dfs -ls /tmp/hive/hive | tr -s " " | cut -d' ' -f6-8 | grep "^[0-9]" | awk 'BEGIN{ MIN=14400; LAST=60*MIN; "date +%s" | getline NOW } { cmd="date -d'\''"$1" "$2"'\'' +%s"; cmd | getline WHEN; DIFF=NOW-WHEN; if(DIFF > LAST){ print "Deleting: "$3; system("hdfs dfs -rm -r "$3) }}'
Created 06-16-2019 03:51 PM
The command is from https://stackoverflow.com/questions/44235019/delete-files-older-than-10days-on-hdfs
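One possible reason the one-liner struggles at this scale is that it starts a separate date process and a separate hdfs client for every listed entry. A rough alternative, sketched below, compares the listing's "YYYY-MM-DD HH:MM" timestamps as plain strings inside a single awk pass and leaves deletion to a batched step. The function name older_than is hypothetical, the sketch assumes paths without spaces, and it does not address the risk of removing a scratch directory that is still in use.

```shell
# Hypothetical helper: reads `hdfs dfs -ls` output on stdin and prints
# the paths whose modification time is earlier than the cutoff.
# $1 = cutoff in the same "YYYY-MM-DD HH:MM" format the listing uses.
older_than() {
  awk -v cutoff="$1" '
    # Entry lines have 8 fields with a date in field 6;
    # this skips the "Found N items" header line.
    NF >= 8 && $6 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
      # Zero-padded timestamps sort correctly as strings
      if (($6 " " $7) < cutoff) print $8
    }'
}

# Example usage (assumes GNU date and GNU xargs):
# cutoff=$(date -d '10 days ago' '+%Y-%m-%d %H:%M')
# hdfs dfs -ls /tmp/hive/hive | older_than "$cutoff" | xargs -r -n 500 hdfs dfs -rm -r
```

Passing the paths through xargs in batches means one hdfs client start per few hundred paths instead of one per path; the batch size of 500 is an arbitrary choice.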
Created 06-16-2019 10:46 PM
@Jay, what does hive.scratchdir.lock mean when it is set to false?
Created 06-16-2019 10:47 PM
Second, is it safe to delete the folder?
hdfs dfs -rm -r /tmp/hive/hive
Created 06-16-2019 10:51 PM
"hive.scratchdir.lock" : When true, holds a lock file in the scratch directory. If a Hive process dies and accidentally leaves a dangling scratchdir behind, the cleardanglingscratchdir tool will remove it.
When false, does not create a lock file and therefore the cleardanglingscratchdir tool cannot remove any dangling scratch directories.
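To see which value is currently configured, something like the sketch below could be used. The function name get_hive_prop is hypothetical, the /etc/hive/conf/hive-site.xml path in the example is an assumption about a typical install, and the parsing assumes each <name> element is followed by its <value> element on the same or a later line. If the property is absent it prints nothing, in which case the default ("false") applies.

```shell
# Hypothetical helper: prints the value of one property from a Hadoop-style
# XML config file. $1 = path to the file, $2 = property name.
get_hive_prop() {
  awk -v name="$2" '
    # Remember that the requested <name> element has been seen
    index($0, "<name>" name "</name>") { found = 1 }
    # Print the first <value> element at or after that point
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> and 8-character </value> tags
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }' "$1"
}

# Example (path is an assumption):
# get_hive_prop /etc/hive/conf/hive-site.xml hive.scratchdir.lock
```

On an Ambari-managed cluster the same value can also be looked up in the Hive configuration screen, which avoids parsing the file.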
Regarding your query "second is it safe to delete the folder - /tmp/hive/hive":
>>> I do not think we should do it on our own, as the whole purpose of the following JIRA was to introduce a tool like "cleardanglingscratchdir" to safely remove the scratch contents: https://issues.apache.org/jira/browse/HIVE-13429
Created 06-16-2019 10:48 PM
For your information, we had already deleted this folder before you posted your answer. After we restarted the Hive service in Ambari, it created the /tmp/hive/hive folder again.
Created 06-16-2019 10:54 PM
Cleaning up the Hive scratch directory manually may not be safe in a multi-user environment (where multiple users might be running Hive queries concurrently), since it can accidentally remove a scratch directory that is in use.
Created 06-16-2019 10:59 PM
@Jay, you said "I do not think that we should do it on our own". I agree, but we have no choice, because under /tmp/hive/hive we have millions of folders and we cannot delete them. After we deleted the folder from HDFS, we saw that a Hive restart created the /tmp/hive/hive folder again. Do you have any advice on what we need to check after this brutal action?