Member since
01-27-2022
03-25-2022 05:50 AM
I am using the Cloudera HDFS sink connector with the Parquet writer:
"com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector"
It seems to flush the data from the Kafka topic every minute.
One of my Kafka topics has 64 partitions.
64 (Kafka partitions) * 60 (minutes per hour) * 24 (hours per day) = 92160 files are written to a single directory every day.
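The arithmetic above can be checked with a small sketch. It assumes one file per partition per flush, which is what the observed behaviour suggests; the function name `files_per_day` and the alternative intervals are only illustrative.

```python
def files_per_day(partitions: int, flush_interval_seconds: int) -> int:
    """Estimate files written per day, assuming one file per partition per flush."""
    seconds_per_day = 24 * 60 * 60
    flushes_per_day = seconds_per_day // flush_interval_seconds
    return partitions * flushes_per_day


if __name__ == "__main__":
    # Compare the observed 1-minute flush with hypothetical longer intervals.
    for interval in (60, 600, 3600):
        print(f"{interval:>5}s flush -> {files_per_day(64, interval)} files/day")
```

With 64 partitions, a 1-minute flush gives 92160 files per day, while an hourly flush would give 1536, which is why the flush interval matters so much here.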
So I created a job to delete files older than n days, but this job is too slow because of the number of files in the directory where the HDFS sink connector writes the Parquet files.
I have two questions about the Cloudera HDFS sink connector:
1. Is it possible to write the files into a daily partition directory, e.g. /blah/{topicName}/{yyyyMMdd}?
2. Is there a way to change the flush interval so that it is not every minute?
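To illustrate why the daily layout in question 1 would help the cleanup job: with one directory per day, the job only has to name a handful of directories instead of scanning tens of thousands of files. The sketch below only builds paths; it does not assume the connector can produce this layout, and `daily_dir`, `expired_dirs`, and the 30-day lookback are hypothetical names and values.

```python
from datetime import date, timedelta


def daily_dir(base: str, topic: str, day: date) -> str:
    """Build a path of the form {base}/{topicName}/{yyyyMMdd}."""
    return f"{base.rstrip('/')}/{topic}/{day.strftime('%Y%m%d')}"


def expired_dirs(base: str, topic: str, today: date,
                 retention_days: int, lookback_days: int = 30) -> list[str]:
    """List day directories older than the retention window.

    Only looks back `lookback_days` days past the cutoff, newest first.
    """
    cutoff = today - timedelta(days=retention_days)
    return [
        daily_dir(base, topic, cutoff - timedelta(days=i))
        for i in range(1, lookback_days + 1)
    ]


if __name__ == "__main__":
    # Each printed path could be removed with a single recursive delete.
    for path in expired_dirs("/blah", "my_topic", date(2022, 3, 25), 7, 3):
        print(path)
```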
Labels: Apache Kafka, HDFS
02-02-2022 10:02 PM
Thank you!! That works for me.
01-27-2022 05:47 AM
Hello. I would like to use Kafka Connect to connect the Kafka brokers provided by CDP with on-premises Kafka brokers. To do that, I need to set the configuration property "connector.client.config.override.policy", but I could not find a menu for setting custom Kafka Connect configuration. Could you let me know how to set custom Kafka Connect configuration, in particular "connector.client.config.override.policy"?
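For context, in Apache Kafka Connect this property is a worker-level setting (values `None`, `Principal`, `All`), and once it permits overrides, a connector can carry per-client settings under the `producer.override.`, `consumer.override.`, or `admin.override.` prefixes. The sketch below shows how such a connector configuration might be assembled; the connector name, topic, and broker address are placeholders, `with_client_overrides` is a hypothetical helper, and where the worker property is set in CDP is exactly the open question.

```python
# Worker-level property (placeholder for wherever the worker config is managed).
WORKER_PROPERTY = "connector.client.config.override.policy=All"


def with_client_overrides(connector_config: dict, client: str, overrides: dict) -> dict:
    """Return a copy of the connector config with prefixed client overrides added."""
    if client not in ("producer", "consumer", "admin"):
        raise ValueError(f"unknown client type: {client}")
    merged = dict(connector_config)
    for key, value in overrides.items():
        merged[f"{client}.override.{key}"] = value
    return merged


# A sink connector reads through a consumer, so the consumer prefix applies.
sink_config = with_client_overrides(
    {"name": "example-sink", "topics": "example-topic"},  # placeholder values
    "consumer",
    {"bootstrap.servers": "onprem-broker-1.example.com:9092"},  # hypothetical host
)

if __name__ == "__main__":
    for key, value in sorted(sink_config.items()):
        print(f"{key}={value}")
```

If the worker policy does not allow the override, Kafka Connect rejects the connector configuration at validation time, so the worker property has to be in place first.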
Labels: Apache Kafka