Folks: We are in the process of migrating away from Confluent Kafka to HDF Kafka. One of the key features we require is landing the Kafka data in HDFS. All data in Kafka is in the Avro format, and we are planning to use the HDF Schema Registry to manage the Avro schemas.

While getting the Connect environment up, we have noticed a few things:

- As far as we can tell, no additional connectors are included in the HDF distribution (like, say, one that could write to HDP). Odd, but that's OK, we'll go with the open source ones.
- The Apache JSON connector included works as expected in standalone and distributed mode.
- Both the HDFS and JDBC open source connectors from Confluent work as expected with JSON documents in standalone and distributed mode.
- When we add Confluent's open source Avro converter, we get this error message:

[2019-01-15 00:34:33,705] INFO Kafka version : 22.214.171.124.3.0.0-165 (org.apache.kafka.common.utils.AppInfoParser:109)
[2019-01-15 00:34:33,705] INFO Kafka commitId : bd037de41b621a69 (org.apache.kafka.common.utils.AppInfoParser:110)
[2019-01-15 00:34:33,848] INFO Kafka cluster ID: IH7zs9ZPQoaoCeNcBQZQxw (org.apache.kafka.connect.util.ConnectUtils:59)
[2019-01-15 00:34:33,862] INFO Logging initialized @7446ms to org.eclipse.jetty.util.log.Slf4jLog (org.eclipse.jetty.util.log:193)
[2019-01-15 00:34:33,898] INFO Added connector for http://:8083 (org.apache.kafka.connect.runtime.rest.RestServer:119)
[2019-01-15 00:34:33,917] INFO Advertised URI: http://IP-ADDRESS:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:267)
[2019-01-15 00:34:33,926] INFO Kafka version : 126.96.36.199.3.0.0-165 (org.apache.kafka.common.utils.AppInfoParser:109)
[2019-01-15 00:34:33,926] INFO Kafka commitId : bd037de41b621a69 (org.apache.kafka.common.utils.AppInfoParser:110)
[2019-01-15 00:34:33,936] ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectDistributed:117)
io.confluent.common.config.ConfigException: Missing required configuration "schema.registry.url" which has no default value.
    at io.confluent.common.config.ConfigDef.parse(ConfigDef.java:251)
    at io.confluent.common.config.AbstractConfig.<init>(AbstractConfig.java:78)
    at io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig.<init>(AbstractKafkaAvroSerDeConfig.java:105)
    at io.confluent.connect.avro.AvroConverterConfig.<init>(AvroConverterConfig.java:27)
    at io.confluent.connect.avro.AvroConverter.configure(AvroConverter.java:60)
    at org.apache.kafka.connect.runtime.isolation.Plugins.newConverter(Plugins.java:266)
    at org.apache.kafka.connect.runtime.Worker.<init>(Worker.java:115)
    at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:88)

The schema.registry.url setting is in the connect-distributed.properties file for the connector, and everything works without incident so long as the internal.key.converter/internal.value.converter values are set to JsonConverter rather than AvroConverter. As soon as we comment out the Avro converter lines, everything goes back to the expected behaviour.

I suspect that the problem is somewhere in the Confluent configuration, as the fifth INFO statement ("Added connector for http://:8083") indicates adding a connector for a NULL server. I'm posting this in case other people have seen similar issues while we work through the opaque Confluent configuration documentation.
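One thing that may be worth checking here: as far as I understand Kafka Connect, each converter only receives the worker properties that sit under its own prefix (for example internal.key.converter.*), so a bare schema.registry.url line would not reach an AvroConverter. The stack trace is at least consistent with that, since the failure comes from Plugins.newConverter called in the Worker constructor. Below is a minimal Python sketch, under that assumption, that flags converters set to AvroConverter without a prefixed schema.registry.url. The function names, the SAMPLE contents, and the registry-host URL are all hypothetical placeholders, not our real configuration.

```python
# Sketch only: checks Connect worker properties for Avro converters that
# have no schema.registry.url under their own prefix. Assumes Connect
# hands each converter only the properties beneath its prefix.

CONVERTER_KEYS = (
    "key.converter",
    "value.converter",
    "internal.key.converter",
    "internal.value.converter",
)
AVRO_CONVERTER = "io.confluent.connect.avro.AvroConverter"


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_registry_urls(props):
    """Return converter prefixes using AvroConverter with no prefixed URL."""
    missing = []
    for prefix in CONVERTER_KEYS:
        if props.get(prefix) == AVRO_CONVERTER:
            if not props.get(prefix + ".schema.registry.url"):
                missing.append(prefix)
    return missing


# Hypothetical worker file; registry-host is a placeholder, not a real host.
SAMPLE = """
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://registry-host:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://registry-host:8081
internal.key.converter=io.confluent.connect.avro.AvroConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
schema.registry.url=http://registry-host:8081
"""

if __name__ == "__main__":
    # The bare schema.registry.url line is ignored by the check on purpose.
    print(missing_registry_urls(parse_properties(SAMPLE)))
    # prints ['internal.key.converter']
```

In the sample, only internal.key.converter is reported, because it is the one Avro converter with no URL under its own prefix. Whether the HDF build of Connect behaves the same way is something I have not confirmed.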
This may be a noob question, so please forgive me, but I am having trouble finding this in the documentation. I am aware that HDP can work with S3 buckets on AWS. However, we have data that cannot leave the data center, and Hadoop is the right tool to solve the problem. Can HDP work with data on a local SAN if that data is presented through an S3 interface?