How to push data (CSV/XLS) into Kafka?

Contributor

I want to push data into HDFS through Kafka, but I don't understand how to get the data into Kafka in the first place. If the data is in CSV/XLS format, what is the procedure for getting it into Kafka and then pushing it on to HDFS?

Hey @Rahul Kumar!
First you will need to create a Kafka topic. Then you have a few options for inserting data into that topic with a Kafka producer:

- Using other tools to put data directly into Kafka, e.g. NiFi, Kafka Connect, Spark, Storm, Flume and so on.

- Programming a Kafka producer: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_kafka-component-guide/content/ch_kafka-d...
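
For a quick test before choosing one of these options, the console producer that ships with Kafka can read rows from standard input, one message per line. The sketch below assumes a CSV file with a single header row; the `csv_rows` helper and the file path are hypothetical, and the broker and topic are the ones used in the manual example in this reply. An Excel file would have to be exported to CSV first, since the console producer works line by line on text.

```shell
# Sketch only: the CSV path and the header-row assumption are illustrative,
# not something stated in this thread.

# Print the data rows of a CSV file: skip the first (header) line and blank lines.
csv_rows() {
  tail -n +2 "$1" | grep -v '^[[:space:]]*$'
}

# Usage against a running broker (flags as in Kafka's console producer):
#   csv_rows /path/to/data.csv |
#     kafka-console-producer.sh --broker-list node1:6667 --topic vin-hcc-nifi
```

Each CSV row becomes one Kafka message, so a consumer can later parse the rows independently.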

After Kafka receives your data, you can read it with a Kafka consumer and write it into HDFS. To create a Kafka consumer, you have the same kinds of options as above:

- Using other tools to take data out of Kafka and write it to HDFS, e.g. NiFi, Kafka Connect, Spark, Storm, Flume and so on.

- Programming a Kafka consumer: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_kafka-component-guide/content/ch_kafka-d...
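
As a rough illustration of the consume-and-store step without any of those tools, the console consumer's output can be piped into `hdfs dfs -put -`, which reads from standard input. This is a sketch under assumptions: the `hdfs_target` helper and the `/data/kafka` prefix are made up for the example, and `--timeout-ms` (which makes the consumer exit once no new message arrives within the interval) should be checked against your Kafka version.

```shell
# Sketch only: the /data/kafka prefix is an assumed HDFS layout.

# Build the HDFS destination path for one dump of a topic: <prefix>/<topic>/<date>.csv
hdfs_target() {
  printf '/data/kafka/%s/%s.csv\n' "$1" "$2"
}

# Usage (needs a running broker and an HDFS client on the same host):
#   kafka-console-consumer.sh --bootstrap-server node2:6667 --topic vin-hcc-nifi \
#       --from-beginning --timeout-ms 10000 |
#     hdfs dfs -put - "$(hdfs_target vin-hcc-nifi "$(date +%F)")"
```

This is only suitable for one-off dumps; for continuous delivery, one of the tools listed above is the better fit.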

Here's an example doing this manually:

[root@node1 ~]# kafka-topics.sh --zookeeper $ZKENSEMBLE --create --topic vin-hcc-nifi --partitions 3 --replication-factor 3
Created topic "vin-hcc-nifi".
[root@node1 ~]# kafka-console-producer.sh --broker-list node1:6667 --topic vin-hcc-nifi
>Im brazilian and im testing this topic
>please work!
>HCC
[root@node2 ~]# kafka-console-consumer.sh --bootstrap-server node2:6667 --topic vin-hcc-nifi --from-beginning
Im brazilian and im testing this topic
please work!

Hope this helps!

Contributor

I understood the concept of `Producer` and `Consumer` from the manual example you gave above, and also from steps 3, 4 and 5 in https://kafka.apache.org/quickstart

But referring to

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_kafka-component-guide/content/ch_kafka-d...

Do I need to copy the `Producer` and `Consumer` code from this link into a file in `gedit` so that data can be sent from Kafka to HDFS?


Hey @Rahul Kumar!
It's up to you, actually 🙂
If you prefer to program a producer/consumer, you can use the mentioned link as an example.
But in my humble opinion, it will be much faster and easier to take these CSV/XLS files, send them to Kafka and put them into HDFS using NiFi, because you don't need to worry about code: you just drag and drop processors and fill in some parameters to build a NiFi flow.
E.g.: https://hortonworks.com/tutorial/nifi-in-trucking-iot-on-hdf/section/3/

One question: where are these files coming from?
I'm asking because, if your files come from a DB, Kafka Connect can move data between the DB and Kafka over JDBC. Or you can use Flume/Spark/Storm if you are looking for fast delivery.

Hope this helps!

Contributor

@Vinicius Higa Murakami
Actually, the data is in InfluxDB. So how can the data get from InfluxDB into Kafka and then finally into HDFS?

Contributor

Okay, I will work on it and let you know the progress 🙂

New Contributor

If you are interested in using Confluent Kafka, there are many plugins (sources and sinks). Here is the plugin to push data to HDFS - https://www.confluent.io/connector/kafka-connect-hdfs/

But to use these plugins, it is better to use the Kafka bundle from the Confluent site itself - https://www.confluent.io/download/

To the actual question, "Loading the data from CSV/Excel into a Kafka topic":

Here are the commands that can help you.

Note: Either start the consumer before starting the producer, or use the consumer option "--from-beginning".

From Producer:

/usr/hdp/2.4.2.0-258/kafka/bin$ cat /tmp/a.txt

Am,sending, this , information,for,testing,purpose

This,is,for,helping,others

/usr/hdp/2.4.2.0-258/kafka/bin$ ./kafka-console-producer.sh --topic test --broker-list c3des330:6667 < /tmp/a.txt

From Consumer:

/usr/hdp/2.4.2.0-258/kafka/bin$ ./kafka-console-consumer.sh --zookeeper localhost:2181 --topic test

{metadata.broker.list=c3des266:6667,c3des330:6667, request.timeout.ms=30000, client.id=console-consumer-64407, security.protocol=PLAINTEXT}

Am,sending, this , information,for,testing,purpose

This,is,for,helping,others

Hope this helps.

Contributor

I am using plain Kafka on Ubuntu. I created /home/kafka/bin/tmp/a.txt

Then I tried to produce it with

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < /tmp/a.txt

But it gives me this error:

bash: /tmp/a.txt: No such file or directory
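
A note on this error: the message is printed by bash, not by Kafka. The redirect reads /tmp/a.txt, while the file was created at /home/kafka/bin/tmp/a.txt, so the shell cannot open it before the producer even starts. A minimal sketch of a guard, assuming the file really is at the path mentioned in the post (the `require_file` helper is hypothetical):

```shell
# Hypothetical helper: print the path if the file is readable, otherwise fail.
require_file() {
  if [ -r "$1" ]; then
    printf '%s\n' "$1"
  else
    printf 'cannot read %s\n' "$1" >&2
    return 1
  fi
}

# Usage, with the path taken from the post above:
#   input=$(require_file /home/kafka/bin/tmp/a.txt) &&
#     bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < "$input"
```

Using the full path of the file in the redirect should be enough on its own; the helper only makes the failure explicit.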