Created 06-16-2019 08:56 PM
hi all
we have an Ambari cluster (HDP version 2.5.4)
in the Spark Thrift Server log we can see this error: "The directory item limit of /tmp/hive/hive is exceeded: limit=1048576 items=1048576"
we tried to delete the old files under /tmp/hive/hive, but there are about a million files and we can't delete them because
hdfs dfs -ls /tmp/hive/hive
doesn't return any output
any suggestion? how can we delete the old files even though there are a million of them?
or any other solution?
* for now the Spark Thrift Server doesn't start successfully because of this error, and HiveServer2 doesn't start either
Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): The directory item limit of /tmp/hive/hive is exceeded: limit=1048576 items=1048576
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
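For background, the limit in that stack trace comes from the HDFS NameNode setting dfs.namenode.fs-limits.max-directory-items, whose default is exactly 1048576. Raising it is only a stop-gap while the scratch directory is cleaned up, not a fix; a hedged sketch of what the hdfs-site.xml override could look like (the value 3200000 is purely illustrative; HDFS caps this property at 6400000):

```xml
<!-- hdfs-site.xml: temporarily raise the per-directory item limit so the
     NameNode accepts new entries while old scratch dirs are being purged.
     Requires a NameNode restart to take effect. -->
<property>
  <name>dfs.namenode.fs-limits.max-directory-items</name>
  <value>3200000</value>
</property>
```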
second
can we purge the files? by cron or something else?
hdfs dfs -ls /tmp/hive/hive
Found 4 items
drwx------   - hive hdfs          0 2019-06-16 21:58 /tmp/hive/hive/2f95f6a5-76ad-487e-968c-1873264a3a9c
drwx------   - hive hdfs          0 2019-06-16 21:45 /tmp/hive/hive/368d201c-cedf-48dc-bbad-f13d6aed7016
drwx------   - hive hdfs          0 2019-06-16 21:58 /tmp/hive/hive/717fb013-535b-4279-a12e-4fc4261c4d68
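On the cron question: one manual approach is to filter the `hdfs dfs -ls` output by modification date and delete old sub-directories in a loop. This is a hedged sketch, not a tested procedure — it assumes the standard `hdfs dfs -ls` field layout, and with a million entries the listing itself may be too slow to return (as seen above), in which case the documented cleaner tool may be the only option. The cutoff date and paths are illustrative:

```shell
# Sketch: print scratch sub-directories whose modification date (field $6)
# is lexicographically older than a cutoff. Assumed ls field layout:
#   drwx------   - hive hdfs   0 2019-06-16 21:58 /tmp/hive/hive/<uuid>
list_old_dirs() {
  awk -v cutoff="$1" '$6 != "" && $6 < cutoff { print $8 }'
}

# On the cluster (commented out here; -skipTrash avoids filling the trash
# directory with another million entries):
#   hdfs dfs -ls /tmp/hive/hive | list_old_dirs 2019-06-01 |
#     while read -r d; do hdfs dfs -rm -r -skipTrash "$d"; done

# Local demonstration of the date filter on sample listing lines:
printf '%s\n' \
  'drwx------   - hive hdfs   0 2019-05-01 10:00 /tmp/hive/hive/old-session' \
  'drwx------   - hive hdfs   0 2019-06-16 21:58 /tmp/hive/hive/new-session' |
  list_old_dirs 2019-06-01
# -> /tmp/hive/hive/old-session
```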
Created 06-16-2019 11:11 PM
so is this setting ( hive.server2.clear.dangling.scratchdir=true ) supported by Hive version 1.2.1.2.6?
Created 06-16-2019 11:19 PM
As per this JIRA: https://jira.apache.org/jira/browse/HIVE-15068
The parameters "hive.server2.clear.dangling.scratchdir" and "hive.server2.clear.dangling.scratchdir.interval" were added to HiveConf.java in Hive 1.3.0 and 2.2.0.
So for safe cleaning of the scratch dir you might want to refer to : https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-Scratch...
# hive --service cleardanglingscratchdir [-r] [-v] [-s scratchdir]
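Regarding the earlier question about purging by cron: the documented command above can in principle be scheduled. A hypothetical crontab entry for the hive user (the binary path and log path are illustrative and should be adjusted for the cluster):

```
# crontab -e -u hive : run the documented scratch-dir cleaner nightly at 02:00
0 2 * * * /usr/bin/hive --service cleardanglingscratchdir >> /var/log/hive/cleardanglingscratchdir.log 2>&1
```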
Created 06-16-2019 11:16 PM
@Jay
so finally, to summarize:
when we set the following
hive.server2.clear.dangling.scratchdir=true
hive.start.cleanup.scratchdir=true
and then restart the Hive service from Ambari,
do you think this configuration will be able to delete the old folders under /tmp/hive/hive even though there are millions of them?
Created 06-16-2019 11:31 PM
As mentioned earlier, the parameters "hive.server2.clear.dangling.scratchdir" and "hive.server2.clear.dangling.scratchdir.interval" were added to HiveConf.java in Hive 1.3.0 and 2.2.0.
But you are using a lower version, Hive 1.2.1.2.6 (HDP 2.5): https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.6/bk_release-notes/content/comp_versions.html
Hence those parameters may not take effect, because they only exist from Hive 1.3.0 and 2.2.0 onwards (see: https://jira.apache.org/jira/browse/HIVE-15068). You will have to rely on tools like "cleardanglingscratchdir".
Created 06-16-2019 11:30 PM
@dear Jay, ok, our Hive version is lower, so we need to run the following (as the hive user):
hive --service cleardanglingscratchdir
do you think this CLI will be able to remove all the old folders under /tmp/hive/hive?
Created 06-16-2019 11:38 PM
@Michael Bronson
Without testing, I cannot say for sure whether something will work or not.
But at this point I trust the documentation. If something is written in the doc, like the following, then ideally it should work: https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-Scratch...
Unless there is a bug reported somewhere for that tool.
I do not find any bug reported for that tool, so I believe in it until I find one... or if you find a bug with that tool, then please report it.
Created 06-16-2019 11:36 PM
@Jay when I run it on a test lab we get:
[hive@master01 hive]$ hive --service cleardanglingscratchdir
Cannot find any scratch directory to clear
why?
Created 06-16-2019 11:40 PM
@Michael Bronson
Some third-party doc reference might give you some ideas on that.
https://blogs.msdn.microsoft.com/bigdatasupport/2016/08/15/hdfs-gets-full-in-azure-hdinsight-with-ma...
Created 06-16-2019 11:38 PM
and we do have folders under /tmp/hive/hive
so why does the CLI return
Cannot find any scratch directory to clear
?
[hdfs@master01 hive]$ hdfs dfs -ls /tmp/hive/hive
Found 4 items
drwx------   - hive hdfs          0 2019-06-16 21:58 /tmp/hive/hive/2f95f6a5-76ad-487e-968c-1873264a3a9c
drwx------   - hive hdfs          0 2019-06-16 21:45 /tmp/hive/hive/368d201c-cedf-48dc-bbad-f13d6aed7016
drwx------   - hive hdfs          0 2019-06-16 21:58 /tmp/hive/hive/717fb013-535b-4279-a12e-4fc4261c4d68
drwx------   - hive hdfs          0 2019-06-16 21:46 /tmp/hive/hive/a58a19fe-2fc1-4b71-82ec-3307de8e2d56
Created 06-16-2019 11:53 PM
@Jay - nice
I see there the option:
hadoop fs -rm -r -skipTrash hdfs://mycluster/tmp/hive/hive/
this option will remove all folders under /tmp/hive/hive
but what is the value "mycluster"? (what do I need to put there instead?)
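For reference, "mycluster" in that example is the HDFS nameservice ID, i.e. the authority part of fs.defaultFS (in an HA setup it is the logical name of the NameNode pair, not a hostname). A hedged sketch of how it relates to plain paths — the getconf lookup is shown as a comment since it needs a cluster; the last lines are a pure-shell illustration, runnable anywhere:

```shell
# On the cluster, look up the default filesystem instead of hard-coding it:
#   hdfs getconf -confKey fs.defaultFS     # e.g. prints hdfs://mycluster
#
# Paths without a scheme resolve against that default filesystem, so
# "hdfs://mycluster/tmp/hive/hive" and "/tmp/hive/hive" name the same
# directory. Pure-shell illustration of stripping the scheme + authority:
uri="hdfs://mycluster/tmp/hive/hive"
path="/${uri#hdfs://*/}"
echo "$path"    # -> /tmp/hive/hive
```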