Is it safe to remove the /tmp/hive/hive folder?
Created ‎06-16-2019 03:47 PM
Is it safe to remove the /tmp/hive/hive folder from HDFS, for example with:
hdfs dfs -rm -r /tmp/hive/hive
The reason we ask is that /tmp/hive/hive contains thousands of files and we cannot delete them.
hdfs dfs -ls /tmp/hive/
Found 7 items
drwx------ - admin hdfs 0 2019-03-05 12:00 /tmp/hive/admin
drwx------ - drt hdfs 0 2019-06-16 14:02 /tmp/hive/drt
drwx------ - ambari-qa hdfs 0 2019-06-16 15:11 /tmp/hive/ambari-qa
drwx------ - anonymous hdfs 0 2019-06-16 08:57 /tmp/hive/anonymous
drwx------ - hdfs hdfs 0 2019-06-13 08:42 /tmp/hive/hdfs
drwx------ - hive hdfs 0 2019-06-13 10:58 /tmp/hive/hive
drwx------ - root hdfs 0 2018-07-17 23:37 /tmp/hive/root
Created ‎06-16-2019 03:49 PM
We tried the following to remove files older than 10 days, but because there are so many files, nothing was deleted at all.
hdfs dfs -ls /tmp/hive/hive | tr -s " " | cut -d' ' -f6-8 | grep "^[0-9]" | awk '
BEGIN { MIN=14400; LAST=60*MIN; "date +%s" | getline NOW }   # LAST = 864000 s = 10 days
{
  cmd="date -d'\''"$1" "$2"'\'' +%s"
  cmd | getline WHEN
  close(cmd)   # release the pipe, otherwise one stays open per listed entry
  DIFF=NOW-WHEN
  if (DIFF > LAST) { print "Deleting: "$3; system("hdfs dfs -rm -r "$3) }
}'
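The pipeline above starts one `date` process and one `hdfs dfs -rm` client per entry, which gets very slow with thousands of entries. A minimal sketch of a batched alternative follows. It assumes the `hdfs dfs -ls` timestamps are in the `YYYY-MM-DD HH:MM` form shown in this thread, so they can be compared as plain strings, and that paths contain no spaces. The function name `list_older_than`, the cutoff format, and the batch size of 500 are illustrative choices, not part of HDFS or Hive.

```shell
# Sketch only: list_older_than is a hypothetical helper, not an HDFS command.
# Reads `hdfs dfs -ls` output on stdin and prints every path whose
# modification timestamp sorts strictly before the cutoff "YYYY-MM-DD HH:MM".
list_older_than() {
  awk -v cutoff="$1" '
    # Entry lines have 8 fields: perms, repl, owner, group, size, date, time, path.
    # The "Found N items" header has fewer fields and is skipped.
    NF >= 8 && $6 ~ /^[0-9][0-9][0-9][0-9]-/ {
      if (($6 " " $7) < cutoff) print $8
    }'
}

# Example use (commented out so nothing is deleted by accident):
# the cutoff comes from GNU date, and xargs hands up to 500 paths
# to each `hdfs dfs -rm -r` call instead of one call per path.
#
#   cutoff=$(date -d '10 days ago' '+%Y-%m-%d %H:%M')
#   hdfs dfs -ls /tmp/hive/hive | list_older_than "$cutoff" \
#     | xargs -r -n 500 hdfs dfs -rm -r
```

Running the listing step on its own first, without the `xargs` part, shows what would be removed. The caveat raised elsewhere in this thread still applies: a scratch directory that is old may still belong to a running query.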
Created ‎06-16-2019 03:51 PM
The script above is from https://stackoverflow.com/questions/44235019/delete-files-older-than-10days-on-hdfs
Created ‎06-16-2019 10:31 PM
According to the Apache Hive docs, there are some parameters and tools available to deal with this kind of issue. I have not personally tested those tools, but it looks like they were introduced to deal with a similar issue some time ago as part of https://issues.apache.org/jira/browse/HIVE-13429
For example, I see that the Hive config "hive.exec.scratchdir" points to the "/tmp/hive" dir.
Can you please check and let us know what value is set for the parameter "hive.scratchdir.lock"? (If it is not set, the default value is "false".) Additionally, you might want to read about the "hive.server2.clear.dangling.scratchdir" and "hive.start.cleanup.scratchdir" parameters of the Hive Server config.
Please refer to link [1] to learn more about those parameters.
Link [2] mentions a tool, "cleardanglingscratchdir", which you may want to read more about.
# hive --service cleardanglingscratchdir [-r] [-v] [-s scratchdir]
    -r    dry-run mode, which produces a list on console
    -v    verbose mode, which prints extra debugging information
    -s    if you are using non-standard scratch directory
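One cautious way to use the tool is to run it in dry-run mode first, review the list, and only then run it for real. The wrapper below is a rough sketch of that workflow, using only the `-r`, `-v` and `-s` flags quoted above. The function name `clear_dangling`, its `dry`/`apply` argument, and the `HIVE_BIN` variable are hypothetical conveniences introduced here, and like the tool itself this has not been tested against a live cluster.

```shell
# Sketch only: clear_dangling and HIVE_BIN are illustrative names.
# HIVE_BIN lets the hive launcher be swapped out (for example in a test).
HIVE_BIN="${HIVE_BIN:-hive}"

# clear_dangling MODE [SCRATCHDIR]
#   MODE "dry" adds -r, so the tool only lists what it would remove.
#   Any other MODE (e.g. "apply") runs the tool without -r.
#   SCRATCHDIR is passed with -s, for a non-standard scratch directory.
clear_dangling() {
  cds_mode="$1"
  cds_dir="${2:-}"
  set -- --service cleardanglingscratchdir -v
  if [ "$cds_mode" = "dry" ]; then
    set -- "$@" -r
  fi
  if [ -n "$cds_dir" ]; then
    set -- "$@" -s "$cds_dir"
  fi
  "$HIVE_BIN" "$@"
}

# Example use (commented out):
#   clear_dangling dry      # review the list printed on the console
#   clear_dangling apply    # then remove the dangling scratch directories
```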
Created ‎06-16-2019 10:46 PM
Dear Jay, what does hive.scratchdir.lock mean when it is set to false?
Created ‎06-16-2019 10:47 PM
Second, is it safe to delete the folder with:
hdfs dfs -rm -r /tmp/hive/hive
Created ‎06-16-2019 10:51 PM
"hive.scratchdir.lock" : When true, holds a lock file in the scratch directory. If a Hive process dies and accidentally leaves a dangling scratchdir behind, the cleardanglingscratchdir tool will remove it.
When false, does not create a lock file and therefore the cleardanglingscratchdir tool cannot remove any dangling scratch directories.
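To see which value is in effect, one option is to read the property straight from hive-site.xml. The helper below is a minimal sketch that assumes the usual layout, with each `<name>` and `<value>` element written on a single line. The function name `hive_site_prop` and the file location in the usage comment are assumptions; the location differs between distributions.

```shell
# Sketch only: hive_site_prop is a hypothetical helper, not a Hive tool.
# hive_site_prop FILE NAME [DEFAULT]
# Prints the <value> that follows <name>NAME</name> in FILE,
# or DEFAULT when the property is not present in the file.
hive_site_prop() {
  hsp_value=$(awk -v name="$2" '
    # Remember that the wanted <name> element has been seen.
    index($0, "<name>" name "</name>") { found = 1 }
    # The first <value> element at or after it is the one we want.
    found && match($0, /<value>[^<]*<\/value>/) {
      # Skip the 7 characters of "<value>" and drop the 8 of "</value>".
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }' "$1")
  printf '%s\n' "${hsp_value:-${3:-}}"
}

# Example use (commented out; the path is an assumption):
#   hive_site_prop /etc/hive/conf/hive-site.xml hive.scratchdir.lock false
```

If this prints "false", the explanation above applies: no lock file is created, so cleardanglingscratchdir has nothing to go on when deciding which scratch directories are dangling.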
Regarding your question, "is it safe to delete the folder /tmp/hive/hive":
I do not think that we should do it on our own, as the whole purpose of the following JIRA was to introduce a tool like "cleardanglingscratchdir" to safely remove the scratch contents: https://issues.apache.org/jira/browse/HIVE-13429
Created ‎06-16-2019 10:48 PM
For your info: we had actually already deleted this folder before you posted your answer, and after we restarted the Hive service in Ambari, it created the /tmp/hive/hive folder again.
Created ‎06-16-2019 10:54 PM
Cleaning up the Hive scratch directory manually may not be a safe option in a multi-user environment (where multiple users might be executing Hive queries concurrently), since it can accidentally remove a scratch directory that is in use.
Created ‎06-16-2019 10:59 PM
Dear Jay, you said "I do not think that we should do it on our own". I agree, but we do not have a choice, because under /tmp/hive/hive we have millions of folders and we cannot delete them. After we deleted the folder from HDFS, we saw that a Hive restart created the /tmp/hive/hive folder again. Do you have any advice on what we need to check after this brutal action?
