Created 06-16-2019 03:47 PM
Is it safe to remove the /tmp/hive/hive folder from HDFS, like this:
hdfs dfs -rm -r /tmp/hive/hive
The reason is that there are thousands of files under /tmp/hive/hive and we can't delete them.
hdfs dfs -ls /tmp/hive/
Found 7 items
drwx------   - admin     hdfs          0 2019-03-05 12:00 /tmp/hive/admin
drwx------   - drt       hdfs          0 2019-06-16 14:02 /tmp/hive/drt
drwx------   - ambari-qa hdfs          0 2019-06-16 15:11 /tmp/hive/ambari-qa
drwx------   - anonymous hdfs          0 2019-06-16 08:57 /tmp/hive/anonymous
drwx------   - hdfs      hdfs          0 2019-06-13 08:42 /tmp/hive/hdfs
drwx------   - hive      hdfs          0 2019-06-13 10:58 /tmp/hive/hive
drwx------   - root      hdfs          0 2018-07-17 23:37 /tmp/hive/root
You have mail in /var/spool/mail/root
Created 06-16-2019 10:31 PM
As per the Apache Hive docs, there seem to be some parameters and tools available to deal with this kind of issue, although I have not personally tested those tools. It looks like they were introduced long ago to deal with a similar issue as part of https://issues.apache.org/jira/browse/HIVE-13429
For example, I see that the Hive config "hive.exec.scratchdir" points to the "/tmp/hive" dir.
Can you please check and let us know what value is set for the parameter "hive.scratchdir.lock"? (If it is not set, the default value should be "false".) Additionally, you might want to read about the "hive.server2.clear.dangling.scratchdir" and "hive.start.cleanup.scratchdir" parameters of the HiveServer2 config.
Please refer to link [1] to learn more about those parameters.
There is also a tool, "cleardanglingscratchdir", mentioned in link [2]; you may want to read more about it.
# hive --service cleardanglingscratchdir [-r] [-v] [-s scratchdir]
    -r    dry-run mode, which produces a list on console
    -v    verbose mode, which prints extra debugging information
    -s    if you are using non-standard scratch directory
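Since I have not tested the tool myself, a cautious first step would be to run it in dry-run mode only and review the list it prints. A minimal sketch, using only the flags from the usage text above; the wrapper name scratch_dry_run is made up for illustration:

```shell
# Hypothetical wrapper: preview what cleardanglingscratchdir would remove.
# -r (dry run) and -v (verbose) come from the usage text above; the optional
# first argument is passed to -s for a non-standard scratch directory.
scratch_dry_run() {
  if [ -n "${1:-}" ]; then
    hive --service cleardanglingscratchdir -r -v -s "$1"
  else
    hive --service cleardanglingscratchdir -r -v
  fi
}

# Example: scratch_dry_run /tmp/hive
```

Only once the dry-run list looks right would it make sense to run the same command without -r.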
Created 06-16-2019 11:02 PM
I cannot think of any specific item to check at this point. As long as you are able to run your Hive queries without any issue and the Hive service checks are also running fine, I think we should be good.
Created 06-16-2019 11:21 PM
Dear KJay,
So finally, let's summarize.
When we set the following:
hive.server2.clear.dangling.scratchdir=true
hive.start.cleanup.scratchdir=true
and then restart the Hive service from Ambari,
do you think this configuration will be able to delete the old folders under /tmp/hive/hive, even though there are millions of folders?
Created 06-26-2019 01:14 PM
Yes, you can delete /tmp/hive/hive if it is taking up HDFS space. It is better to schedule a script that cleans up the directory every 15 days and to enable e-mail notifications so that you get alerts and warnings accordingly. I did the same in my org. due to a storage crisis.
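A rough sketch of what such a scheduled script could look like, not the exact one I run. It assumes the standard "hdfs dfs -ls" column layout (date in column 6, path in column 8), paths without spaces, and GNU date; the helper name older_than is made up. Note that it goes by age only and does not check whether a session is still using a directory.

```shell
#!/usr/bin/env bash
# Sketch: pick HDFS scratch subdirectories older than a cutoff date.

# Reads `hdfs dfs -ls` output on stdin and prints the path of every
# directory whose modification date is strictly before the cutoff
# (YYYY-MM-DD). Dates are compared as strings, which sorts correctly
# for this format. `older_than` is a hypothetical helper name.
older_than() {
  awk -v cutoff="$1" \
    '$1 ~ /^d/ && NF >= 8 && ($6 "") < (cutoff "") { print $8 }'
}

# Example wiring (commented out because it needs a live cluster):
# cutoff=$(date -d '15 days ago' +%F)        # GNU date assumed
# hdfs dfs -ls /tmp/hive/hive | older_than "$cutoff" |
#   while read -r dir; do
#     echo "would remove: $dir"              # swap echo for: hdfs dfs -rm -r -skipTrash "$dir"
#   done
```

Running it with echo first gives a list to review before anything is actually removed; the 15-day schedule itself would come from cron or a similar scheduler.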
Thank you.