Support Questions


How to implement kafka-connect-hdfs with HDP-provided Kafka

I have a scenario where my Kafka cluster will be receiving events from a remote Kafka cluster (source), and I want those events pushed to Hadoop HDFS (sink). I searched Google and found that we need the kafka-connect-hdfs connector, which sends messages or events from Kafka to HDFS directly. I am not sure how to get this connector and make it work with HDP, or how else we could handle this situation effectively.

5 Replies

New Contributor

Ranjit Nagi - Do you have any working implementation for the above scenario? If so, what does the implementation look like?

Super Collaborator

From a non-Hadoop machine, install Java+Maven+Git

git clone https://github.com/confluentinc/kafka-connect-hdfs
cd kafka-connect-hdfs
git fetch --all --tags --prune
git checkout tags/v4.1.2  # This is a Confluent Release number, which corresponds to a Kafka release number
mvn clean install -DskipTests

This should generate some files under the target folder in that directory.

So, using the 4.1.2 example, I would zip up the "target/kafka-connect-hdfs-4.1.2-package/share/java/" folder that was built, then copy that archive to every HDP server I want to run Kafka Connect on and extract it there, for example into /opt/kafka-connect-hdfs/share/java
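The package-and-copy step above could be sketched like this (I use tar here, but zip works the same way; the node names and target paths are just examples):

```shell
# Bundle the built share/java folder into a single archive for copying.
package_plugin() {
  # $1 = built plugin folder, $2 = output archive
  tar -czf "$2" -C "$1" .
}

# Example usage, assuming the v4.1.2 build above:
# package_plugin target/kafka-connect-hdfs-4.1.2-package/share/java /tmp/kafka-connect-hdfs.tgz
# scp /tmp/kafka-connect-hdfs.tgz node1:/tmp/
# ssh node1 'sudo mkdir -p /opt/kafka-connect-hdfs/share/java &&
#            sudo tar -xzf /tmp/kafka-connect-hdfs.tgz -C /opt/kafka-connect-hdfs/share/java'
```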

From there, you would find your "connect-distributed.properties" file and add a line:

plugin.path=/opt/kafka-connect-hdfs/share/java
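For context, a minimal connect-distributed.properties might look like the following sketch; the broker address, group id, and topic names are placeholders (HDP brokers default to port 6667), and only the plugin.path line is the addition described above:

```properties
bootstrap.servers=broker1:6667
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
plugin.path=/opt/kafka-connect-hdfs/share/java
```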

Now, run something like this (I don't know the full location of the property files):

connect-distributed /usr/hdp/current/kafka/.../connect-distributed.properties

Once that starts, you can attempt to hit http://connect-server:8083/connector-plugins , and you should see an item for "io.confluent.connect.hdfs.HdfsSinkConnector"
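That check could be scripted roughly like this (the hostname and default port 8083 are assumptions; adjust to your Connect server):

```shell
# Query the Connect REST API for the installed connector plugins.
list_plugins() {
  curl -s --max-time 5 "http://$1:8083/connector-plugins"
}

# Expect to see the HDFS sink connector in the response.
list_plugins connect-server | grep -o 'io.confluent.connect.hdfs.HdfsSinkConnector' \
  || echo "plugin not listed (is Connect running and plugin.path correct?)"
```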

If successful, continue to read the HDFS Connector documentation, then POST the JSON configuration body to the Connect Server endpoint. (or use Landoop's Connect UI tool)
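As a sketch of that POST step: the connector name, topic, and HDFS URL below are placeholders, but connector.class, tasks.max, topics, hdfs.url, and flush.size are standard HdfsSinkConnector settings (see the HDFS Connector documentation for the full list):

```shell
# Write a minimal HdfsSinkConnector config to a file.
cat > hdfs-sink.json <<'EOF'
{
  "name": "hdfs-sink",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "1",
    "topics": "test_hdfs",
    "hdfs.url": "hdfs://namenode:8020",
    "flush.size": "3"
  }
}
EOF

# POST it to the Connect REST API (host/port assumed).
curl -s --max-time 5 -X POST -H "Content-Type: application/json" \
  --data @hdfs-sink.json http://connect-server:8083/connectors \
  || echo "request failed (is Connect running?)"
```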

New Contributor

Will this work for Apache Kafka in a Hortonworks cluster?

Super Collaborator

@Vamshi Reddy

Yes. "Confluent" is not some custom version of Apache Kafka.

In fact, this process is very repeatable for all other Kafka Connect plugins.

  1. Download the code
  2. Build it against the Kafka version you run
  3. Move the package to the Connect server
  4. Extract the JAR files onto the Connect server CLASSPATH
  5. Run/Restart Connect

New Contributor

@JordanMoore I am getting this error when trying to add the connector using the REST API:

java.lang.ClassNotFoundException: io.confluent.connect.storage.StorageSinkConnectorConfig

I am following this documentation.
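For reference, that class lives in the kafka-connect-storage-common jars that the HDFS connector depends on, so a ClassNotFoundException there usually means those jars did not end up on the plugin path. A quick check, assuming the /opt layout used above:

```shell
# Succeeds if any storage-common jar exists under the given plugin directory.
check_storage_jars() {
  ls "$1"/kafka-connect-storage-common*.jar >/dev/null 2>&1
}

if check_storage_jars /opt/kafka-connect-hdfs/share/java; then
  echo "storage-common jars present"
else
  echo "storage-common jars missing - re-copy the full share/java folder"
fi
```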
