Looking for assistance with a NiFi-to-Hive process.
I am pushing data to HDFS (version 2.7.3) using the NiFi PutHDFS processor (NiFi version 1.2.0), so that I can access the HDFS data in Hive through an external table.
I am getting the following error:
Failed to write to HDFS due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutHDFS[id=6a3e3aa1-ed4a-19b3-bfd8-75796c673298]:
The directory item limit of /apps/hive/warehouse/hdf_stg_table_nifi is exceeded: limit=1048576 items=1048576
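For context, how full the directory currently is can be checked with `hdfs dfs -count` (the second column of its output is the file count that is hitting the limit):

```shell
# Check how many children the staging directory holds.
# `hdfs dfs -count` prints: DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
hdfs dfs -count /apps/hive/warehouse/hdf_stg_table_nifi
```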
I checked a few topics on the same error and made the following changes, but they have not resolved the issue yet:
hive.tez.dynamic.partition.pruning.max.event.size, changed from 1048576 to 2097152
hive.vectorized.groupby.checkinterval, changed from 4096 to 8192
HiveServer Interactive Heap Size, changed from 512 to 1024
hive.tez.dynamic.partition.pruning.max.data.size, changed from 104857600 to 209715200
<property> <name>dfs.namenode.fs-limits.max-directory-items</name> <value>4194304</value> </property>
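Note that dfs.namenode.fs-limits.max-directory-items is an HDFS NameNode setting (hdfs-site.xml), so the new value only takes effect after a NameNode restart. The value carried by the local configuration can be printed like this (it reads the client-side config files, so the NameNode's copy should be checked separately):

```shell
# Print the directory-item limit from the local HDFS configuration.
hdfs getconf -confKey dfs.namenode.fs-limits.max-directory-items
```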
You have reached the maximum number of files for one folder, and even an ls on that folder may not work.
Your process is probably creating too many small files; it is worth checking why that is happening.
For a quick workaround, you can try the following:
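One way to see whether small files are the culprit (a sketch, using the staging path from the error message): list the directory and count entries below 1 MB, using the size in column 5 of the ls output. Note that on a directory already at the item cap the ls itself may struggle, so a larger client heap can help.

```shell
# Count files under 1 MB in the staging directory; field 5 of
# `hdfs dfs -ls` output is the file size in bytes.
hdfs dfs -ls /apps/hive/warehouse/hdf_stg_table_nifi \
  | awk '$5 ~ /^[0-9]+$/ && $5 < 1048576 { n++ } END { print n+0 " files under 1 MB" }'
```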
1. Get the total row count of the table.
2. Get the creation script and make sure the table is partitioned accordingly.
3. Take a copy of the table: create table tablecopy as select * from table;
4. Check the count on the new table: select count(*) from tablecopy;
5. Check the number of HDFS files: hdfs dfs -ls /apps/hive/warehouse//table
6. Take a copy of the HDFS folder for further investigation: export HADOOP_HEAPSIZE="8096" then hdfs dfs -cp /apps/hive/warehouse//table /tmp => without the larger heap you may hit OutOfMemoryError: GC overhead limit exceeded
7. Truncate the original table: truncate table table;
8. Drop the table: drop table table;
9. Make sure the HDFS folder is removed.
10. Create the table again.
11. Put the data back with an insert.
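Steps 3 through 11 above can be sketched as a shell script. Everything here is illustrative, not a drop-in fix: the table name is taken from the error message, and the (id int, dt string) schema in the final recreate is a placeholder for your real columns and partition key.

```shell
#!/bin/sh
# Sketch of the rebuild steps; all names below are placeholders.
TABLE=hdf_stg_table_nifi
WAREHOUSE=/apps/hive/warehouse

# Steps 3-4: copy the table, then compare row counts
hive -e "create table ${TABLE}_copy as select * from ${TABLE};"
hive -e "select count(*) from ${TABLE}; select count(*) from ${TABLE}_copy;"

# Steps 5-6: inspect the files, then back up the folder
# (raise the client heap first -- the cp may otherwise die with
#  "OutOfMemoryError: GC overhead limit exceeded" on a huge directory)
export HADOOP_HEAPSIZE="8096"
hdfs dfs -count "${WAREHOUSE}/${TABLE}"
hdfs dfs -cp "${WAREHOUSE}/${TABLE}" /tmp

# Steps 7-9: truncate, drop, and confirm the directory is gone
hive -e "truncate table ${TABLE}; drop table ${TABLE};"
hdfs dfs -test -d "${WAREHOUSE}/${TABLE}" && echo "WARNING: directory still present"

# Steps 10-11: recreate partitioned (example schema!) and reload from the copy
hive -e "create table ${TABLE} (id int) partitioned by (dt string);"
hive -e "set hive.exec.dynamic.partition.mode=nonstrict;
         insert into ${TABLE} partition (dt) select id, dt from ${TABLE}_copy;"
```

Partitioning the recreated table is the part that actually prevents a recurrence, since each partition gets its own HDFS subdirectory and its own 1048576-item budget.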