Created 06-08-2018 01:05 PM
I want to push data into HDFS through Kafka, but I'm not sure how to get the data into Kafka in the first place. If the data format is CSV/XLS, what is the procedure to get that data into Kafka and then push it on to HDFS?
Created 06-08-2018 04:59 PM
Hey @Rahul Kumar!
First you will need to create a Kafka topic, and then you have a few options for inserting data into it with a Kafka producer:
- Using other tools to put data directly into Kafka: e.g. NiFi, Kafka Connect, Spark, Storm, Flume and so on.
- Programming a kafka producer: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_kafka-component-guide/content/ch_kafka-d...
And after Kafka receives your data, you can consume it with a Kafka consumer and put it into HDFS. To create a Kafka consumer, you have the same options as above:
- Using other tools to read data directly from Kafka: e.g. NiFi, Kafka Connect, Spark, Storm, Flume and so on.
- Programming a kafka consumer: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_kafka-component-guide/content/ch_kafka-d...
Here's an example doing this manually:
[root@node1 ~]# kafka-topics.sh --zookeeper $ZKENSEMBLE --create --topic vin-hcc-nifi --partitions 3 --replication-factor 3
Created topic "vin-hcc-nifi".
[root@node1 ~]# kafka-console-producer.sh --broker-list node1:6667 --topic vin-hcc-nifi
>Im brazilian and im testing this topic
>please work!
>HCC
[root@node2 ~]# kafka-console-consumer.sh --bootstrap-server node2:6667 --topic vin-hcc-nifi --from-beginning
Im brazilian and im testing this topic
please work!
Hope this helps!
Created 06-09-2018 06:35 AM
I understood the concept of `Producer` and `Consumer` done by you manually above, and also by referring to Steps 3, 4 and 5 in https://kafka.apache.org/quickstart
But referring to that link:
Do I need to put the `Producer` and `Consumer` code from it into a file (e.g. with `gedit`), so that data can be sent from Kafka to HDFS?
Created 06-09-2018 06:58 AM
Hey @Rahul Kumar!
It's up to you, actually 🙂
If you prefer to program a producer/consumer, you can use the mentioned link as an example.
But in my humble opinion, it will be much faster/easier to take these CSV/XLS files, send them to Kafka, and put them into HDFS using NiFi. That way you don't need to worry about code; you just drag and drop processors and fill in some parameters to build a NiFi flow.
E.g.: https://hortonworks.com/tutorial/nifi-in-trucking-iot-on-hdf/section/3/
One question: where are these files coming from?
I'm asking because, if your files are coming from a DB, you have Kafka Connect, which can move data from/to Kafka using JDBC. Or you can use Flume/Spark/Storm if you are looking for fast delivery.
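For example, a minimal Kafka Connect JDBC source configuration could look like the sketch below. This assumes the Confluent JDBC source connector; the connection URL, column and topic names are purely illustrative placeholders:

```properties
name=jdbc-source-example
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
# Illustrative connection string -- replace with your own DB details
connection.url=jdbc:mysql://dbhost:3306/mydb?user=myuser&password=mypass
# Poll for new rows using an auto-incrementing id column
mode=incrementing
incrementing.column.name=id
# Each table becomes a topic named <prefix><table>
topic.prefix=jdbc-
```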
Hope this helps!
Created 06-09-2018 10:16 AM
@Vinicius Higa Murakami
Actually, the data is in InfluxDB... so how can the data come from InfluxDB to Kafka and then finally into HDFS?
Created 06-09-2018 06:13 PM
Hi @Rahul Kumar!
Oh, I see - so in this case you can use Telegraf to take the data from InfluxDB -> Kafka.
https://github.com/influxdata/telegraf
https://www.influxdata.com/blog/using-telegraf-to-send-metrics-to-influxdb-and-kafka/
And from Kafka -> HDFS, you can use kafka connect.
https://kafka.apache.org/documentation/#connect
https://docs.confluent.io/current/connect/connect-hdfs/docs/hdfs_connector.htm
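As a rough sketch, the Kafka side of a Telegraf config could look like the fragment below (this assumes Telegraf's `outputs.kafka` plugin; the broker and topic names are illustrative):

```toml
# Telegraf output plugin: write collected metrics to a Kafka topic.
[[outputs.kafka]]
  ## Kafka brokers (illustrative host/port)
  brokers = ["node1:6667"]
  ## Topic to publish metrics to
  topic = "influx-metrics"
  ## Serialize metrics in InfluxDB line protocol
  data_format = "influx"
```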
Created 06-11-2018 04:05 AM
Okay... I will work on it. I will let you know the progress 🙂
Created 06-11-2018 01:22 PM
If you are interested in using Confluent Kafka, there are many plugins (source & sink). Here is the plugin to push data to HDFS - https://www.confluent.io/connector/kafka-connect-hdfs/
But to use these plugins it is better to use the Kafka bundle from the Confluent site itself - https://www.confluent.io/download/
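For reference, a minimal HDFS sink configuration could look like the sketch below (property names follow the Confluent HDFS connector documentation; the topic name and NameNode URL are illustrative):

```properties
name=hdfs-sink-example
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
# Topic(s) to drain into HDFS (illustrative name)
topics=test
# NameNode URL of the target HDFS cluster (illustrative host/port)
hdfs.url=hdfs://namenode:8020
# Number of records to write before rolling a file in HDFS
flush.size=1000
```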
To the actual question - "Loading the data from CSV/Excel into a Kafka topic":
Here are the commands that can help you.
Note: Either start the consumer before starting the producer, or use the consumer option "--from-beginning".
From Producer:
/usr/hdp/2.4.2.0-258/kafka/bin$ cat /tmp/a.txt
Am,sending, this , information,for,testing,purpose
This,is,for,helping,others
/usr/hdp/2.4.2.0-258/kafka/bin$ ./kafka-console-producer.sh --topic test --broker-list c3des330:6667 < /tmp/a.txt
From Consumer:
/usr/hdp/2.4.2.0-258/kafka/bin$ ./kafka-console-consumer.sh --zookeeper localhost:2181 --topic test
{metadata.broker.list=c3des266:6667,c3des330:6667, request.timeout.ms=30000, client.id=console-consumer-64407, security.protocol=PLAINTEXT}
Am,sending, this , information,for,testing,purpose
This,is,for,helping,others
Hope this helps.
Created 06-14-2018 05:09 AM
I am using plain Kafka on Ubuntu. I created /home/kafka/bin/tmp/a.txt
Then tried to produce it with
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < /tmp/a.txt
But it gives me the same error:
bash: /tmp/a.txt: No such file or directory
Created 06-11-2018 02:35 PM