Created 07-05-2022 03:57 AM
Hello,
We are using CDP7 Streams Messaging and we are evaluating the Kafka -> Kafka Connect -> HDFS feature.
From the CDP7 Streams Messaging UI, the configuration options seem very limited, and so does the documentation. So we have the following questions:
1- Is it possible to configure a schema to use when storing data in HDFS as Parquet?
2- Is it possible to tune the partitioning of the Parquet output in HDFS, or does it rely on the Kafka topic partitioning?
Does anyone have examples?
Created 07-05-2022 05:17 AM
The following connector configuration worked for me. My schema was stored in Schema Registry and the connector fetched it from there.
{
  "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
  "hdfs.output": "/tmp/topics_output/",
  "hdfs.uri": "hdfs://nn1:8020",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "name": "asd",
  "output.avro.passthrough.enabled": "true",
  "output.storage": "com.cloudera.dim.kafka.connect.hdfs.HdfsPartitionStorage",
  "output.writer": "com.cloudera.dim.kafka.connect.hdfs.parquet.ParquetPartitionWriter",
  "tasks.max": "1",
  "topics": "avro-topic",
  "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
  "value.converter.passthrough.enabled": "false",
  "value.converter.schema.registry.url": "http://sr-1:7788/api/v1"
}
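If you prefer to deploy the connector outside the SMM UI, here is a minimal sketch that submits the same configuration through the standard Kafka Connect REST API. The host and port (connect-1:8083) and the absence of REST authentication are assumptions; adjust them for your cluster.
import json
import urllib.request

# Hypothetical Connect REST endpoint; replace with your worker's host/port.
connect_url = "http://connect-1:8083/connectors"

connector_config = {
    "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
    "hdfs.output": "/tmp/topics_output/",
    "hdfs.uri": "hdfs://nn1:8020",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "output.avro.passthrough.enabled": "true",
    "output.storage": "com.cloudera.dim.kafka.connect.hdfs.HdfsPartitionStorage",
    "output.writer": "com.cloudera.dim.kafka.connect.hdfs.parquet.ParquetPartitionWriter",
    "tasks.max": "1",
    "topics": "avro-topic",
    "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
    "value.converter.passthrough.enabled": "false",
    "value.converter.schema.registry.url": "http://sr-1:7788/api/v1",
}

# The Connect REST API expects {"name": ..., "config": {...}} on POST /connectors.
payload = json.dumps({"name": "asd", "config": connector_config}).encode("utf-8")
request = urllib.request.Request(
    connect_url,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    print(response.status, response.read().decode())
A successful call returns 201 Created with the connector's configuration echoed back, and the connector should then appear in the SMM UI as well.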
Cheers,
André