Support Questions

Find answers, ask questions, and share your expertise

CDP7 Streams Messaging Kafka Connect to HDFS



We are using CDP7 Streams Messaging and we are studying the feature Kafka -> Kafka Connect -> HDFS .

From the CDP7 Streams Messaging UI, it seems that the configuration is very limited; and documentation also. So, we have the following questions:
1- Is it possible to configure a schema to use for a storage in HDFS parquet ?
2- Is it possible to tune the partitioning in HDFS parquet (or does it rely on the kafka topic partitioning ?)

Has anyone have examples ?


Master Collaborator



The following connector configuration worked for me. My schema was stored in Schema Registry and the connector fetched it from there.



 "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
 "hdfs.output": "/tmp/topics_output/",
 "hdfs.uri": "hdfs://nn1:8020",
 "key.converter": "",
 "name": "asd",
 "output.avro.passthrough.enabled": "true",
 "": "com.cloudera.dim.kafka.connect.hdfs.HdfsPartitionStorage",
 "output.writer": "com.cloudera.dim.kafka.connect.hdfs.parquet.ParquetPartitionWriter",
 "tasks.max": "1",
 "topics": "avro-topic",
 "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
 "value.converter.passthrough.enabled": "false",
 "value.converter.schema.registry.url": "http://sr-1:7788/api/v1"





Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.