Created 06-16-2019 08:56 PM
Hi all,
we have an Ambari cluster (HDP version 2.5.4).
In the Spark Thrift Server log we can see this error: the directory item limit of /tmp/hive/hive is exceeded: limit=1048576 items=1048576.
We tried to delete the old files under /tmp/hive/hive, but there are about a million files and we can't delete them, because
hdfs dfs -ls /tmp/hive/hive
doesn't return any output.
Any suggestion? How can we delete the old files even though there are about a million of them?
Or any other solution?
* For now the Spark Thrift Server doesn't start successfully because of this error, and HiveServer2 doesn't start either.
Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): The directory item limit of /tmp/hive/hive is exceeded: limit=1048576 items=1048576
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
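(As a side note, to get the actual number of entries without a full listing - a plain -ls on about a million entries just hangs for us - something like this should work:
hdfs dfs -count /tmp/hive/hive
The output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATH, which gives a rough idea of how close the directory is to the 1048576 limit.)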
Second question:
can we purge the files periodically, e.g. by cron or some other mechanism?
hdfs dfs -ls /tmp/hive/hive
Found 4 items
drwx------   - hive hdfs          0 2019-06-16 21:58 /tmp/hive/hive/2f95f6a5-76ad-487e-968c-1873264a3a9c
drwx------   - hive hdfs          0 2019-06-16 21:45 /tmp/hive/hive/368d201c-cedf-48dc-bbad-f13d6aed7016
drwx------   - hive hdfs          0 2019-06-16 21:58 /tmp/hive/hive/717fb013-535b-4279-a12e-4fc4261c4d68
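For the purge idea, something like this rough sketch is what I have in mind, to be run from the hive user's crontab once the directory is back under the limit (the 7-day retention and the script path are just assumptions):
#!/bin/bash
# Rough sketch: delete session dirs under /tmp/hive/hive older than RETENTION_DAYS.
# Column 6 of "hdfs dfs -ls" output is the modification date (yyyy-MM-dd), column 8 is the path.
RETENTION_DAYS=7
CUTOFF=$(date -d "-${RETENTION_DAYS} days" +%Y-%m-%d)
hdfs dfs -ls /tmp/hive/hive | awk -v cutoff="$CUTOFF" 'NF>=8 && $6 < cutoff {print $8}' |
while read -r dir; do
  hdfs dfs -rm -r -skipTrash "$dir"
done
and a crontab entry such as 0 2 * * * /usr/local/bin/clean_hive_tmp.sh (the script path is hypothetical).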
Created 06-17-2019 12:31 AM
"Mycluster" needs to be replaced with the "fs.defaultFS" parameter of your HDFS config.
Created 06-17-2019 12:34 AM
In case of a NameNode HA enabled cluster the "dfs.nameservices" property is defined, and the "fs.defaultFS" value is derived from it.
For example, if "dfs.nameservices=mycluster" then "fs.defaultFS" will ideally be "hdfs://mycluster".
If NameNode HA is not enabled, then "fs.defaultFS" will point to the NameNode host/port.
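A quick way to check what your cluster actually resolves these to (assuming the client configs on the node you run this from are up to date) is:
hdfs getconf -confKey dfs.nameservices
hdfs getconf -confKey fs.defaultFS
Then use the fs.defaultFS value (for example hdfs://mycluster) as the prefix in the rm command.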
Created 06-17-2019 04:53 AM
@Jay, in my cluster (HDFS --> Configs) I see dfs.nameservices=hdfsha,
so should it be like this?
hadoop fs -rm -r -skipTrash hdfs://hdfsha/tmp/hive/hive/
Created 06-17-2019 05:00 AM
@Jay, actually it should be like this:
hadoop fs -rm -r -skipTrash hdfs://hdfsha/tmp/hive/hive/*
We need to add the "*" after the slash in order to delete only the folders under /tmp/hive/hive and not the parent folder itself (/tmp/hive/hive).
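As a sanity check after running it (just a sketch, same path as above):
hdfs dfs -ls hdfs://hdfsha/tmp/hive/hive
should now come back quickly and show that the parent directory still exists but is (almost) empty; new session sub-directories will be created again as sessions are opened.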
Created 06-17-2019 05:05 AM
Looks good. Yes, in your command "mycluster" needs to be replaced with "hdfsha".
Created 06-16-2019 11:48 PM
@jay - nice
I see there is the option:
hadoop fs -rm -r -skipTrash hdfs://mycluster/tmp/hive/hive/
This option will remove all the folders under /tmp/hive/hive,
but what is the value "mycluster"? (What do I need to put there instead?)
Created 06-17-2019 02:43 AM
Whenever you change this config parameter, the cluster needs to be made aware of the change. When you start Ambari, the underlying components don't get started unless you explicitly start those components!
So you can start Ambari without starting YARN or HDFS.
Created 06-17-2019 05:11 AM
@Geoffrey Shelton Okot - do you mean to restart the Ambari server (as in "ambari-server restart") instead of restarting the HDFS and YARN services? (after we set dfs.namenode.fs-limits.max-directory-items)
Created 06-17-2019 05:21 AM
the parameter "dfs.namenode.fs-limits.max-directory-items " is HDFS specific hence the & HDFS dependent services and HDFS Dependent service components needs to be restarted. In Ambari UI it will show the required service components that needs to be restarted.
No need to restart Ambari Server.
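After the restart you can confirm the new value with something like this (run it on the NameNode host, since getconf reads the local hdfs-site.xml):
hdfs getconf -confKey dfs.namenode.fs-limits.max-directory-items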
Created on 06-17-2019 05:22 AM - edited 08-17-2019 02:54 PM
Second, when I saved the parameter in Ambari (Ambari -> HDFS -> Configs -> Advanced -> Custom hdfs-site):
dfs.namenode.fs-limits.max-directory-items=2097152
I get:
The configuration changes could not be validated for consistency due to an unknown error. Your changes have not been saved yet. Would you like to proceed and save the changes?
Is this parameter supported in HDP version 2.6.4?
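For reference, after the save and a restart I would expect the property to show up in the generated config on the NameNode host, e.g. (standard HDP config path, adjust if yours differs):
grep -A1 'dfs.namenode.fs-limits.max-directory-items' /etc/hadoop/conf/hdfs-site.xml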