CDP7 Streams Messaging Kafka Connect to HDFS

Hello,

We are using CDP7 Streams Messaging and are evaluating the Kafka -> Kafka Connect -> HDFS pipeline.

In the CDP7 Streams Messaging UI, the configuration options seem very limited, and so is the documentation. So we have the following questions:
1- Is it possible to configure the schema used when storing data in HDFS as Parquet?
2- Is it possible to tune the partitioning of the Parquet output in HDFS, or does it rely on the Kafka topic partitioning?

Does anyone have examples?

1 REPLY

@dida

The following connector configuration worked for me. My schema was stored in Schema Registry, and the connector fetched it from there.

{
 "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
 "hdfs.output": "/tmp/topics_output/",
 "hdfs.uri": "hdfs://nn1:8020",
 "key.converter": "org.apache.kafka.connect.storage.StringConverter",
 "name": "asd",
 "output.avro.passthrough.enabled": "true",
 "output.storage": "com.cloudera.dim.kafka.connect.hdfs.HdfsPartitionStorage",
 "output.writer": "com.cloudera.dim.kafka.connect.hdfs.parquet.ParquetPartitionWriter",
 "tasks.max": "1",
 "topics": "avro-topic",
 "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
 "value.converter.passthrough.enabled": "false",
 "value.converter.schema.registry.url": "http://sr-1:7788/api/v1"
}
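
If the UI feels too limited, a configuration like the one above can also be submitted through the standard Kafka Connect REST API (`PUT /connectors/{name}/config`, which creates or updates a connector). The snippet below is only a minimal sketch of that: the worker address `http://connect-1:28083` and the helper `build_connector_request` are hypothetical names introduced for illustration, and a secured cluster (TLS, Kerberos) would need authentication that is not shown here.

```python
import json
import urllib.request
from urllib.parse import quote


def build_connector_request(connect_url: str, name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT /connectors/{name}/config request."""
    url = f"{connect_url.rstrip('/')}/connectors/{quote(name, safe='')}/config"
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical Connect worker address: adjust host, port and scheme for your cluster.
CONNECT_URL = "http://connect-1:28083"

# Same properties as the connector configuration shown above.
config = {
    "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
    "hdfs.output": "/tmp/topics_output/",
    "hdfs.uri": "hdfs://nn1:8020",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "name": "asd",
    "output.avro.passthrough.enabled": "true",
    "output.storage": "com.cloudera.dim.kafka.connect.hdfs.HdfsPartitionStorage",
    "output.writer": "com.cloudera.dim.kafka.connect.hdfs.parquet.ParquetPartitionWriter",
    "tasks.max": "1",
    "topics": "avro-topic",
    "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
    "value.converter.passthrough.enabled": "false",
    "value.converter.schema.registry.url": "http://sr-1:7788/api/v1",
}

request = build_connector_request(CONNECT_URL, config["name"], config)

# Nothing is sent until you explicitly open the request, for example:
#   with urllib.request.urlopen(request) as response:
#       print(response.status)
```

The request is built separately from sending it so the URL and payload can be inspected first; the connector name in the URL has to match the `name` property in the payload.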

 

Cheers,

André

 
