
CDP7 Streams Messaging Kafka Connect to HDFS



We are using CDP7 Streams Messaging and are evaluating the Kafka -> Kafka Connect -> HDFS pipeline.

From the CDP7 Streams Messaging UI, the configuration options seem very limited, and so does the documentation. So we have the following questions:
1- Is it possible to configure a schema to use when storing data in HDFS as Parquet?
2- Is it possible to tune the partitioning of the Parquet output in HDFS, or does it rely on the Kafka topic partitioning?

Does anyone have examples?


Master Collaborator



The following connector configuration worked for me. My schema was stored in Schema Registry and the connector fetched it from there.



 "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
 "hdfs.output": "/tmp/topics_output/",
 "hdfs.uri": "hdfs://nn1:8020",
 "key.converter": "",
 "name": "asd",
 "output.avro.passthrough.enabled": "true",
 "": "com.cloudera.dim.kafka.connect.hdfs.HdfsPartitionStorage",
 "output.writer": "com.cloudera.dim.kafka.connect.hdfs.parquet.ParquetPartitionWriter",
 "tasks.max": "1",
 "topics": "avro-topic",
 "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
 "value.converter.passthrough.enabled": "false",
 "value.converter.schema.registry.url": "http://sr-1:7788/api/v1"
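
In case it helps, here is a rough Python sketch of how a configuration like the one above could be checked and wrapped into a request for the standard Kafka Connect REST endpoint `PUT /connectors/{name}/config`. The helper names (`blank_entries`, `build_request`) and the Connect URL `http://connect-host:28083` are made up for illustration, so adjust them to your cluster. Two entries appear blank in the configuration as posted (the value of `key.converter` and one property name); the sketch leaves them blank and only flags them rather than guessing what they were.

```python
import json
import urllib.request

# Connector settings copied from the post above. Two entries are blank
# as posted; they are kept blank here on purpose, not filled in by guesswork.
CONFIG = {
    "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
    "hdfs.output": "/tmp/topics_output/",
    "hdfs.uri": "hdfs://nn1:8020",
    "key.converter": "",
    "name": "asd",
    "output.avro.passthrough.enabled": "true",
    "": "com.cloudera.dim.kafka.connect.hdfs.HdfsPartitionStorage",
    "output.writer": "com.cloudera.dim.kafka.connect.hdfs.parquet.ParquetPartitionWriter",
    "tasks.max": "1",
    "topics": "avro-topic",
    "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
    "value.converter.passthrough.enabled": "false",
    "value.converter.schema.registry.url": "http://sr-1:7788/api/v1",
}


def blank_entries(config):
    """Return the (key, value) pairs whose key or value is empty."""
    return [(k, v) for k, v in config.items() if not k.strip() or not v.strip()]


def build_request(config, connect_url):
    """Build (but do not send) a PUT /connectors/{name}/config request."""
    url = f"{connect_url.rstrip('/')}/connectors/{config['name']}/config"
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Report entries that still need a real value before submitting.
    for key, value in blank_entries(CONFIG):
        print(f"blank entry: key={key!r} value={value!r}")

    # Hypothetical Connect REST address; nothing is sent over the network here.
    req = build_request(CONFIG, "http://connect-host:28083")
    print(req.get_method(), req.full_url)
```

Once the blank entries are filled in, the request could be sent with `urllib.request.urlopen(req)`; a secured cluster would additionally need TLS and authentication, which this sketch does not cover.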




