In the Spark Thrift Server log we see this error: the directory item limit of /tmp/hive/hive is exceeded: limit=1048576 items=1048576
We tried to delete the old files under /tmp/hive/hive, but there are about a million files and we can't delete them, because
hdfs dfs -ls /tmp/hive/hive
doesn't return any output.
Any suggestions? How can we delete the old files even though there are a million of them?
Or is there any other solution?
* For now the Spark Thrift Server fails to start because of this error, and HiveServer2 does not start either.
Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): The directory item limit of /tmp/hive/hive is exceeded: limit=1048576 items=1048576 at org.apache.hadoop.ipc.Server$Han
The parameter dfs.namenode.fs-limits.max-directory-items sets the maximum number of items (files or folders, counted non-recursively) that a single directory may contain. The valid range is 1 to 6400000 and the default value is 1048576. Increase the value of dfs.namenode.fs-limits.max-directory-items, then restart the affected HDFS services from Ambari so that the new value takes effect.
Go to Ambari -> HDFS -> Configs -> Advanced -> Custom hdfs-site and add the key dfs.namenode.fs-limits.max-directory-items with a higher value, for example double the current one (2097152 instead of 1048576). You cannot set dfs.namenode.fs-limits.max-directory-items to a value less than 1 or greater than 6400000.
After the restart the config should be pushed to the whole cluster, and this will allow you to work again.
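To check which limit the local HDFS client configuration reports and how full the directory is without listing it, something like the following could be used. This is only a sketch: the `run` helper and the `DRY_RUN` and `SCRATCH_DIR` variables are illustrative names, not part of Hadoop, and by default the script just prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of Hadoop): prints each command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Path taken from the error message; override if yours differs.
SCRATCH_DIR="${SCRATCH_DIR:-/tmp/hive/hive}"

# Limit as read from the local HDFS configuration files on this host.
run hdfs getconf -confKey dfs.namenode.fs-limits.max-directory-items

# Recursive totals (DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME). The limit itself
# applies to direct children only, so treat this as an upper bound.
run hdfs dfs -count "$SCRATCH_DIR"
```

Run it once as-is to review the commands, then with `DRY_RUN=0` on a host that has the HDFS client installed.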
As an immediate workaround, you may want to double or otherwise increase the value of the HDFS parameter "dfs.namenode.fs-limits.max-directory-items". It defines the maximum number of items that a directory may contain. The property cannot be set to a value less than 1 or more than 6400000 (default value: 1048576).
Go to Ambari -> HDFS -> Configs -> Advanced -> Custom hdfs-site and add the key (dfs.namenode.fs-limits.max-directory-item
According to the Apache Hive docs, there are some parameters and tools available for dealing with this kind of issue. I have not personally tested them, but it looks like they were introduced to deal with a similar issue some time ago as part of https://issues.apache.org/jira/browse/HIVE-13429
For example, I see that the Hive config "hive.exec.scratchdir" points to the "/tmp/hive" dir.
Can you please check and let us know what value is set for the parameter "hive.scratchdir.lock"? (If it is not set, the default value is "false".) Additionally, you might want to look at the "hive.server2.clear.dangling.scratchdir" and "hive.start.cleanup.scratchdir" parameters of the Hive Server config.
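One way to check these values without starting any Hive service is to read them straight from hive-site.xml. The sketch below is an assumption-heavy illustration: `show_prop` is a hypothetical helper, `/etc/hive/conf/hive-site.xml` is only a guess at where the client config lives on your nodes, and it relies on the `<value>` line directly following the `<name>` line in the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> line that follows a property name
# in a Hadoop-style XML config file, or say that the property is not set.
show_prop() {
  local name="$1" file="$2"
  grep -F -A1 "<name>${name}</name>" "$file" 2>/dev/null | grep '<value>' \
    || echo "${name}: not set in ${file}"
}

# Assumed location of the Hive client configuration; adjust for your cluster.
HIVE_SITE="${HIVE_SITE:-/etc/hive/conf/hive-site.xml}"

for prop in hive.exec.scratchdir \
            hive.scratchdir.lock \
            hive.server2.clear.dangling.scratchdir \
            hive.start.cleanup.scratchdir; do
  show_prop "$prop" "$HIVE_SITE"
done
```

A property reported as "not set" falls back to the default of your Hive version.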
Please refer to the following link to learn more about those parameters.
There is a tool, "cleardanglingscratchdir", mentioned in that link; you may want to read more about it.
# hive --service cleardanglingscratchdir [-r] [-v] [-s scratchdir]
-r dry-run mode, which produces a list on console
-v verbose mode, which prints extra debugging information
-s if you are using non-standard scratch directory
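A cautious way to use the tool is to run it in dry-run mode first and only then for real. The following is a minimal sketch that uses only the flags listed above; the `run` wrapper and `DRY_RUN` variable are illustrative names and not part of Hive, and by default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of Hive): prints each command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: dry-run with verbose output, to see which scratch dirs would be removed.
run hive --service cleardanglingscratchdir -r -v

# Step 2: the actual cleanup, once the list from step 1 looks right.
# Add "-s <scratchdir>" if a non-standard scratch directory is configured.
run hive --service cleardanglingscratchdir -v
```

Set `DRY_RUN=0` to execute the commands; note that step 1 stays a Hive-level dry run because of the `-r` flag.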
Dear Jay, hive.server2.clear.dangling.scratchdir and hive.start.cleanup.scratchdir are not configured in Ambari under Hive -> Configs. Do you recommend adding them? If yes, in which section under Advanced should we add them, and what value should each parameter have?
Those parameters may not take effect on your cluster, because they are only present in Hive versions 1.3.0, 2.2.0 and above (see: https://jira.apache.org/jira/browse/HIVE-15068). You will have to rely on tools like "cleardanglingscratchdir".