How to push data (CSV/XLS) into Kafka?
Labels:
- Apache Hadoop
- Apache Kafka
Created ‎06-08-2018 01:05 PM
I want to push data into HDFS through Kafka, but I'm not sure how to get the data into Kafka in the first place. If the data is in CSV/XLS format, what is the procedure for getting it into Kafka and then pushing it on to HDFS?
Created ‎06-08-2018 04:59 PM
Hey @Rahul Kumar!
First you will need to create a Kafka topic, and then you have a few options for getting data into that topic with a Kafka producer:
- Using other tools to put data directly into Kafka, e.g. NiFi, Kafka Connect, Spark, Storm, Flume and so on.
- Programming a Kafka producer: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_kafka-component-guide/content/ch_kafka-d...
Once Kafka has received your data, you can consume it with a Kafka consumer and write it into HDFS. To build the consumer you have the same options as above:
- Using other tools to read data from Kafka and push it into HDFS, e.g. NiFi, Kafka Connect, Spark, Storm, Flume and so on.
- Programming a Kafka consumer: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_kafka-component-guide/content/ch_kafka-d...
Here's an example doing this manually:
[root@node1 ~]# kafka-topics.sh --zookeeper $ZKENSEMBLE --create --topic vin-hcc-nifi --partitions 3 --replication-factor 3
Created topic "vin-hcc-nifi".
[root@node1 ~]# kafka-console-producer.sh --broker-list node1:6667 --topic vin-hcc-nifi
>Im brazilian and im testing this topic
>please work!
>HCC
[root@node2 ~]# kafka-console-consumer.sh --bootstrap-server node2:6667 --topic vin-hcc-nifi --from-beginning
Im brazilian and im testing this topic
please work!
Hope this helps!
Created ‎06-09-2018 06:35 AM
I understood the `Producer` and `Consumer` concept from your manual example above and from Steps 3, 4 and 5 in https://kafka.apache.org/quickstart.
But, referring to that link, do I need to put the `Producer` and `Consumer` code into a `gedit` file so that data can be sent from Kafka to HDFS?
Created ‎06-09-2018 06:58 AM
Hey @Rahul Kumar!
It's up to you, actually 🙂
If you prefer to program a producer/consumer, you can use the mentioned link as an example.
But in my humble opinion, it will be much faster/easier to take these CSV/XLS files, push them to Kafka, and land them in HDFS using NiFi, because you don't need to worry about code; you just drag and drop processors and fill in a few parameters to build a NiFi flow.
E.g.: https://hortonworks.com/tutorial/nifi-in-trucking-iot-on-hdf/section/3/
One question: where are these files coming from?
I'm asking because, if your files are coming from a database, you have Kafka Connect, which can pull/push data from/to Kafka over JDBC (rough sketch below). Or you can use Flume/Spark/Storm if you are looking for fast delivery.
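Just for illustration, a standalone Kafka Connect JDBC source configuration could look roughly like this. The connector name, database URL, credentials, column and topic prefix here are all made-up placeholders, and the exact worker properties file and script path depend on your Kafka/Confluent distribution; the exact format settings depend on how you want the data serialized.
# hypothetical JDBC source config; every value below is a placeholder
cat > jdbc-source.properties <<'EOF'
name=jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:postgresql://dbhost:5432/mydb
connection.user=myuser
connection.password=mypassword
mode=incrementing
incrementing.column.name=id
topic.prefix=jdbc-
EOF
# start a standalone Connect worker with it (the worker properties file ships with Kafka/Confluent)
connect-standalone.sh config/connect-standalone.properties jdbc-source.properties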
Hope this helps!
Created ‎06-09-2018 10:16 AM
@Vinicius Higa Murakami
Actually, the data is in InfluxDB... so how can the data go from InfluxDB to Kafka and then finally into HDFS?
Created ‎06-09-2018 06:13 PM
Hi @Rahul Kumar!
Oh, I see. In that case you can use Telegraf to take the data from InfluxDB -> Kafka.
https://github.com/influxdata/telegraf
https://www.influxdata.com/blog/using-telegraf-to-send-metrics-to-influxdb-and-kafka/
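Just to give an idea of the shape of it, here's a minimal sketch of the Kafka output side of a Telegraf config. The input plugins depend entirely on what you're collecting (the CPU input below is only a placeholder), and the broker and topic names are made up.
# write a minimal telegraf.conf (placeholder values throughout)
cat > telegraf.conf <<'EOF'
[agent]
  interval = "10s"

# placeholder input; swap in whatever inputs match your data source
[[inputs.cpu]]

# send collected metrics to a Kafka topic
[[outputs.kafka]]
  brokers = ["node1:6667"]
  topic = "metrics"
EOF
telegraf --config telegraf.conf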
And from Kafka -> HDFS, you can use Kafka Connect.
https://kafka.apache.org/documentation/#connect
https://docs.confluent.io/current/connect/connect-hdfs/docs/hdfs_connector.htm
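For a rough idea, a standalone HDFS sink connector config could look something like this. The topic name, NameNode address and flush size are placeholders, the connector class comes from the Confluent HDFS connector linked above, and the converter/format settings you'll actually need depend on how your data is serialized.
# hypothetical HDFS sink config; adjust topic, hdfs.url and flush.size for your cluster
cat > hdfs-sink.properties <<'EOF'
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=vin-hcc-nifi
hdfs.url=hdfs://namenode:8020
flush.size=3
EOF
# start a standalone Connect worker with it (the worker properties file ships with Kafka/Confluent)
connect-standalone.sh config/connect-standalone.properties hdfs-sink.properties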
Created ‎06-11-2018 04:05 AM
Okay... I will work on it. I will let you know the progress 🙂
Created ‎06-11-2018 01:22 PM
If you are interested in using Confluent Kafka, there are many plugins (source & sink). Here is the plugin to push the data to HDFS: https://www.confluent.io/connector/kafka-connect-hdfs/
But to use these plugins it is better to use the Kafka bundle from the Confluent site itself: https://www.confluent.io/download/
As for the actual question, "loading the data from CSV/Excel into a Kafka topic", here are the commands that can help you.
Note: either start the consumer before the producer, or use the consumer option "--from-beginning".
From the producer:
/usr/hdp/2.4.2.0-258/kafka/bin$ cat /tmp/a.txt
Am,sending, this , information,for,testing,purpose
This,is,for,helping,others
/usr/hdp/2.4.2.0-258/kafka/bin$ ./kafka-console-producer.sh --topic test --broker-list c3des330:6667 < /tmp/a.txt
From the consumer:
/usr/hdp/2.4.2.0-258/kafka/bin$ ./kafka-console-consumer.sh --zookeeper localhost:2181 --topic test
{metadata.broker.list=c3des266:6667,c3des330:6667, request.timeout.ms=30000, client.id=console-consumer-64407, security.protocol=PLAINTEXT}
Am,sending, this , information,for,testing,purpose
This,is,for,helping,others
Hope this helps.
Created ‎06-14-2018 05:09 AM
I am using plain Kafka on Ubuntu. I created /home/kafka/bin/tmp/a.txt and then tried to produce it with
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < /tmp/a.txt
but it keeps giving me the same error:
bash: /tmp/a.txt: No such file or directory
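The error suggests the redirect points at /tmp/a.txt while the file was created under /home/kafka/bin/tmp/. A quick check and a corrected command, assuming the file really lives at that path, could look like:
# confirm where the file actually lives
ls -l /home/kafka/bin/tmp/a.txt /tmp/a.txt
# then feed the producer from the path that exists
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < /home/kafka/bin/tmp/a.txt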