Support Questions
Find answers, ask questions, and share your expertise

how to resolve the error about /tmp/hive/hive is exceeded: limit=1048576 items=1048576


hi all

we have ambari cluster ( HDP version - 2.5.4 )

in the spark thrift log we can see the error about - /tmp/hive/hive is exceeded: limit=1048576 items=1048576

we try to delete the old files under /tmp/hive/hive , but there are a million of files and we cant delete them because

hdfs dfs -ls /tmp/hive/hive   

isn't return any output


any suggestion ? how to delete the old files in spite there are a million of files?

or any other solution/?



* for now spark thrift server isn't started successfully because this error , also hiveserver2 not started also

Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): The directory item limit of /tmp/hive/hive is exceeded: limit=1048576 items=1048576       at org.apache.hadoop.ipc.Server$Han
dler.run(Server.java:2347)


second

can we purge the files? by cron or other?


hdfs dfs -ls /tmp/hive/hive
Found 4 items
drwx------   - hive hdfs          0 2019-06-16 21:58 /tmp/hive/hive/2f95f6a5-76ad-487e-968c-1873264a3a9c
drwx------   - hive hdfs          0 2019-06-16 21:45 /tmp/hive/hive/368d201c-cedf-48dc-bbad-f13d6aed7016
drwx------   - hive hdfs          0 2019-06-16 21:58 /tmp/hive/hive/717fb013-535b-4279-a12e-4fc4261c4d68


Michael-Bronson
31 REPLIES 31

Mentor

@Michael Bronson

Parameter dfs.namenode.fs-limits.max-directory-items determines the maximum number of folders or files (not recursive) in one directory. The value range of this parameter is 1 to 6400000, and the default value is 1048576. Increase the value of parameter dfs.namenode.fs-limits.max-directory-items, and then restart the Ambari so that the new value takes effect.


Workaround

Go to Ambari -> HDFS -> Configs -> Advanced -> Custom hdfs-site and add the key (dfs.namenode.fs-limits.max-directory-items) to i.e double 1048576 to 2097152 you cannot set dfs.namenode.fs-limits.max-directory-items to a value less than 1 or greater than 6400000

After an Ambari restart the config should be pushed to the whole cluster this will allow you to work


HTH

Dear Geoffrey Shelton Okot , since this is production cluster we need to get approve to restart the HDFS and Yarn and map reduce ( as you know this settings require to restart all 3 services )

Michael-Bronson

Super Mentor

@Michael Bronson

As an immediate turnaround may be you would like to double / increase the value set for the following HDFS parameter "dfs.namenode.fs-limits.max-directory-items". It Defines the maximum number of items that a directory may contain. Cannot set the property to a value less than 1 or more than 6400000. (default value : 1048576)

Go to Ambari -> HDFS -> Configs -> Advanced -> Custom hdfs-site and add the key (dfs.namenode.fs-limits.max-directory-item


As per the apache hive docs there seems to be some parameters and tools available to deal with such issue. Although i have not personally tested those tools. But looks like they were introduced to deal with similar issue long back as part of https://issues.apache.org/jira/browse/HIVE-13429


For example i see that the Hive Config "hive.exec.scratchdir" points to the "/tmp/hive" dir.

Can you please check and let us know what is the value set for the following parameter "hive.scratchdir.lock". (if not set then default value will be "false"? Additionally you might want to refer about "hive.server2.clear.dangling.scratchdir" and "hive.start.cleanup.scratchdir" parameters of Hive Server config.


Please refer to [1] the following link to know more about those parameters.

There is a tool "cleardanglingscratchdir" mentioned as part of the link [2] may be you would like to read more about it.

# hive --service cleardanglingscratchdir [-r] [-v] [-s scratchdir]
    -r      dry-run mode, which produces a list on console
    -v      verbose mode, which prints extra debugging information
    -s      if you are using non-standard scratch directory

.

[1] https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hi....

[2] https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-ClearDa...

@dear jay ( hive.server2.clear.dangling.scratchdir and hive.start.cleanup.scratchdir ) are not configured in ambari from HIVE --> CONFIG , do you recommended to add them? , if yes then under advanced on which section we need to add them and what is the value for both parameters?

Michael-Bronson

Super Mentor

@Michael Bronson

You can add those scratchdir parameters inside the in custom hive-site.xml via Ambari UI by clicking the Add Property option.

@dear Jay ok but what are the values that need to set for both parameters?

Michael-Bronson

by the way in ambari hive version is - 1.2.1.2.6

Michael-Bronson

Super Mentor

@Michael Bronson

For now we can enable "hive.server2.clear.dangling.scratchdir=true" for HiveServer2 via Custom hive-site.


Super Mentor

And

hive.start.cleanup.scratchdir=true 

so this settings ( hive.server2.clear.dangling.scratchdir=true ) supported by the hive version - 1.2.1.2.6 ?

Michael-Bronson

Super Mentor

@Michael Bronson

As per this JIRA: https://jira.apache.org/jira/browse/HIVE-15068

This adds "hive.server2.clear.dangling.scratchdir" and "hive.server2.clear.dangling.scratchdir.interval" to HiveConf.java are added from hive 1.3.0 and 2.2.0.

So for safe cleaning of the scratch dir you might want to refer to : https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-Scratch...

# hive --service cleardanglingscratchdir [-r] [-v] [-s scratchdir]


@ear Jay

so finally lets summary

when we set the following


hive.server2.clear.dangling.scratchdir=true
hive.start.cleanup.scratchdir=true 


and then we restart the hive service from ambari


do you think this configuration will be able to delete the old folders under /tmp/hive/hive in spite the folder are a millions folders ?


Michael-Bronson

Super Mentor

@Michael Bronson

As mentioned earlier that the parameters "hive.server2.clear.dangling.scratchdir" and "hive.server2.clear.dangling.scratchdir.interval" to HiveConf.java are added from hive 1.3.0 and 2.2.0.


But as you are using lower version of Hive-1.2.1.2.6 (HDP 2.5) https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.6/bk_release-notes/content/comp_versions.html

Hence those parameters may not take effect because they will be present from "hive 1.3.0 and 2.2.0" version (See: https://jira.apache.org/jira/browse/HIVE-15068) and above. You will have to rely on tools like "cleardanglingscratchdir"

.



@dear Jay , ok our hive version is lower so we need to run the - ( from hive user )

do you think this cli will able to remove all old folders under /tmp/hive/hive ?

 hive --service cleardanglingscratchdir 
Michael-Bronson

Super Mentor

@Michael Bronson
Without testing .. i can not say for sure if something will work or not.

But a this point i believe in the documentation. If something is written in the Doc like following then ideally it should work :https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-Scratch...

Until unless there is a BUG with the tool reported somewhere for that tool.

I do not find any bug reported for that tool ... So i Believe in that tool until i find a BUG ... or If you find a BUG with that tool then please report it.

@Jay when I run it on test lab we get


why ?


[hive@master01 hive]$ hive --service cleardanglingscratchdir
Cannot find any scratch directory to clear
Michael-Bronson

Super Mentor

and we have folder under /tmp/hive/hive

so why cli return -

Cannot find any scratch directory to clear


[hdfs@master01 hive]$ hdfs dfs -ls /tmp/hive/hive
Found 4 items
drwx------   - hive hdfs          0 2019-06-16 21:58 /tmp/hive/hive/2f95f6a5-76ad-487e-968c-1873264a3a9c
drwx------   - hive hdfs          0 2019-06-16 21:45 /tmp/hive/hive/368d201c-cedf-48dc-bbad-f13d6aed7016
drwx------   - hive hdfs          0 2019-06-16 21:58 /tmp/hive/hive/717fb013-535b-4279-a12e-4fc4261c4d68
drwx------   - hive hdfs          0 2019-06-16 21:46 /tmp/hive/hive/a58a19fe-2fc1-4b71-82ec-3307de8e2d56
Michael-Bronson

@jay - nice


I see there the option:


hadoop fs -rm -r -skipTrash hdfs://mycluster/tmp/hive/hive/ 


this option will remove all folders under /tmp/hive/hive


but what is the value - mycluster ? ( what I need to replace instead that ?

Michael-Bronson

Super Mentor

@Michael Bronson

"Mycluster" needs to be replaced with the "fs.defaultFS" parameter of your HDFS config.


; ;