CDP7 Streams Messaging Kafka Connect to HDFS



We are using CDP7 Streams Messaging and are evaluating the Kafka -> Kafka Connect -> HDFS pipeline.

From the CDP7 Streams Messaging UI, the configuration options seem very limited, and so does the documentation. So we have the following questions:
1- Is it possible to configure the schema used when storing data in HDFS as Parquet?
2- Is it possible to tune the partitioning of the Parquet output in HDFS, or does it rely on the Kafka topic partitioning?

Does anyone have examples?


The following connector configuration worked for me. My schema was stored in Schema Registry and the connector fetched it from there.



{
 "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
 "hdfs.output": "/tmp/topics_output/",
 "hdfs.uri": "hdfs://nn1:8020",
 "key.converter": "",
 "name": "asd",
 "output.avro.passthrough.enabled": "true",
 "": "com.cloudera.dim.kafka.connect.hdfs.HdfsPartitionStorage",
 "output.writer": "com.cloudera.dim.kafka.connect.hdfs.parquet.ParquetPartitionWriter",
 "tasks.max": "1",
 "topics": "avro-topic",
 "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
 "value.converter.passthrough.enabled": "false",
 "value.converter.schema.registry.url": "http://sr-1:7788/api/v1"
}
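
If you would rather deploy this outside the UI, a configuration like the one above can be sent to the Kafka Connect REST API. The following is only a rough sketch, not something I have run against a CDP cluster: the worker address in CONNECT_URL is an assumption, the helper names build_put_request and submit are made up for illustration, and a secured cluster will most likely need authentication on top of this. The two entries that are blank in the configuration above (the key.converter value and the key for HdfsPartitionStorage) are deliberately left out here and have to be filled in from your own working setup.

```python
import json
import urllib.request

# Assumption: address of your Kafka Connect worker; adjust host and port.
CONNECT_URL = "http://localhost:28083"

# Settings copied from the configuration above. The two entries that are
# blank in the post are omitted and must be supplied from your own setup.
HDFS_SINK_CONFIG = {
    "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
    "hdfs.output": "/tmp/topics_output/",
    "hdfs.uri": "hdfs://nn1:8020",
    "name": "asd",
    "output.avro.passthrough.enabled": "true",
    "output.writer": "com.cloudera.dim.kafka.connect.hdfs.parquet.ParquetPartitionWriter",
    "tasks.max": "1",
    "topics": "avro-topic",
    "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
    "value.converter.passthrough.enabled": "false",
    "value.converter.schema.registry.url": "http://sr-1:7788/api/v1",
}


def build_put_request(base_url, config):
    """Build a PUT /connectors/{name}/config request (create or update)."""
    name = config["name"]
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/connectors/{name}/config",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def submit(request):
    """Send the request and return the parsed JSON response from Connect."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_put_request(CONNECT_URL, HDFS_SINK_CONFIG)
    # Print what would be sent; call submit(req) once the URL is correct.
    print(req.get_method(), req.full_url)
```

PUT on /connectors/{name}/config creates the connector if it does not exist and updates it otherwise, so the same call can be repeated while you adjust settings.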





