I am using the Cloudera HDFS Sink Connector (`com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector`) with a Parquet writer.
It appears to flush the data from the Kafka topic to HDFS every minute.
One of my Kafka topics has 64 partitions, so 64 (Kafka partitions) × 60 (minutes per hour) × 24 (hours per day) = 92,160 files are sinked into a single directory every day.
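The file-count arithmetic above can be checked directly (this assumes, as observed, one output file per partition per flush and one flush per minute):

```python
# Files produced per day: one file per Kafka partition per flush,
# with one flush per minute (assumed from the observed connector behavior).
partitions = 64
flushes_per_hour = 60   # one flush per minute
hours_per_day = 24

files_per_day = partitions * flushes_per_hour * hours_per_day
print(files_per_day)  # 92160
```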
To clean up, I created a job that deletes files older than n days, but it runs too slowly because of the sheer number of files in the directory where the HDFS Sink Connector writes the Parquet files.
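For context, here is a minimal sketch of the connector configuration I am running, in standalone `.properties` form. The Cloudera-specific property names below are reproduced from memory and may differ in your connector release; the HDFS URI, output path, and topic name are placeholders:

```properties
# Standard Kafka Connect properties
name=hdfs-parquet-sink
connector.class=com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector
tasks.max=4
# Placeholder topic name; the real topic has 64 partitions
topics=my-topic

# Cloudera-specific properties (names from memory; verify against your release's docs)
hdfs.uri=hdfs://namenode:8020
hdfs.output=/blah/my-topic
output.writer=com.cloudera.dim.kafka.connect.hdfs.parquet.ParquetPartitionWriter
```

With this setup, every flush writes one Parquet file per topic partition directly under `hdfs.output`, which is what produces the flat directory described above.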
I have two questions about the Cloudera HDFS Sink Connector:

1. Is it possible to sink the files into daily partition directories, e.g. /blah/{topicName}/{yyyyMMdd}?
2. Is there a way to change the flush interval so that it does not flush every minute?