Created 09-27-2017 01:12 AM
My project needs to stream data from Kafka to MongoDB, so we set up an HDP cluster with multiple Kafka nodes in it.
After all that, I feel we could process all the data using Kafka/Zookeeper alone, with no need for the HDP cluster. Can anyone tell me whether Kafka streaming can work on its own, without being inside an HDP cluster?
If yes, is there anything I need to be aware of?
Created 09-27-2017 04:37 AM
Hi @Robin Dong
Yes, you definitely can install Kafka by itself as a cluster (you still need Zookeeper). You can check this Kafka Multi Broker Doc as a reference.
As for the HDP cluster, I think you have some misunderstanding of what a Hortonworks Data Platform (HDP) cluster is.
Kafka is already a cluster, and Zookeeper also works as a cluster.
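To make the "Kafka is already a cluster" point concrete, here is a minimal sketch of how the per-broker settings differ in a multi-broker setup. The config keys (`broker.id`, `listeners`, `log.dirs`, `zookeeper.connect`) are standard Kafka broker properties; the helper name `broker_properties`, the host names, ports, and log paths are placeholders I made up for illustration, so adjust them to your environment.

```python
def broker_properties(broker_id, zk_hosts, port=9092, log_dir_base="/tmp/kafka-logs"):
    """Render a minimal server.properties body for one Kafka broker.

    Every broker needs a unique broker.id and its own log directory;
    all brokers in the same cluster point at the same Zookeeper ensemble.
    """
    lines = [
        f"broker.id={broker_id}",
        f"listeners=PLAINTEXT://:{port}",
        f"log.dirs={log_dir_base}-{broker_id}",
        f"zookeeper.connect={','.join(zk_hosts)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical three-node Zookeeper ensemble shared by every broker.
    zk = ["zk1:2181", "zk2:2181", "zk3:2181"]
    for i in range(3):
        print(f"# server-{i}.properties")
        print(broker_properties(i, zk))
```

Each rendered block would go into its own properties file, one per broker. Nothing here depends on HDP or Ambari; those only automate writing and distributing files like these.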
HDP is a Hadoop distribution, and it uses Ambari to help you manage the different components in your cluster from a single page.
HDP can also be highly customized: you can install only Kafka and Zookeeper when you install the cluster. It's very convenient that HDP uses Ambari to install those components.
So although you can indeed install Kafka and Zookeeper manually, I suggest you install them with HDP, because it is quite easy and it automatically integrates Kafka and Zookeeper for you. With the Ambari view you can also see many different Kafka and Zookeeper metrics, which helps you check the health of your cluster.
If your emphasis is more on data streaming, I suggest you try Hortonworks Data Flow (HDF), because the main components in HDF are Kafka/Storm/Zookeeper/NiFi. You can also tailor HDF yourself.
Cheers,
Created 09-27-2017 08:15 PM
Thank you very much, Wang, for confirming that Zookeeper and Kafka can work alone.
Yes, with HDP, Kafka and Zookeeper are better administered and monitored. I did set up a Kafka cluster and MongoDB with HDP, and the steps seemed very easy.
However, I am trying to save some money for our company, which is why I came up with this question. Thank you for confirming it.
Created 09-28-2017 10:36 AM
Created 09-28-2017 11:05 AM
Yes, you are right that HDP is license-free. However, the cluster install needs master and slave infrastructure, so HDFS with its DataNodes and NameNode, and YARN, along with some mandatory components like Hive, Pig, and Tez, all need to be installed and maintained.
All of these cost a lot on AWS/EC2.
On the other hand, without an HDP cluster, monitoring, upgrades, version compatibility, and security for Kafka, Spark, and Zookeeper may cause issues in the long run, and the admin has to deal with them.
I am looking for use cases of a standalone Zookeeper/Kafka cluster with Spark to compare against, to see whether it is worth doing.
What do you think?
Created 09-28-2017 02:06 PM
That's why I suggest you use HDF, where you can install only Zookeeper and Kafka.
Created 09-28-2017 03:02 PM
Thank you so much.