Member since 11-19-2015 | 158 Posts | 25 Kudos Received | 21 Solutions
My Accepted Solutions
Views | Posted
---|---
11897 | 09-01-2018 01:27 AM
1143 | 09-01-2018 01:18 AM
3798 | 08-20-2018 09:39 PM
509 | 07-20-2018 04:51 PM
1521 | 07-16-2018 09:41 PM
07-25-2018
06:58 PM
1 Kudo
There is no such rule for Kafka brokers. The rule applies to Zookeeper. The ensemble must maintain a quorum, i.e. a majority of floor(n/2) + 1 of its n servers, that agrees on leader-election values and locks. An even-sized ensemble tolerates no more failures than the odd-sized ensemble one server smaller, so the total is normally an odd number, chosen to ride out hardware and network failures. According to "Kafka - The Definitive Guide", as well as the Apache Zookeeper site, running more than 5 or 7 Zookeeper servers in total generally has negative side effects for the applications using it. With 3 servers, you can lose one and the remaining 2 still form a majority, but a second failure loses quorum. That is why more than 3 is often recommended: with 5 servers, two can go down and the remaining 3 still form a majority, so you can take one server down for maintenance and still survive an unexpected failure. With 7, you can lose up to 3 Zookeepers and still be good.
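The quorum arithmetic above can be checked with a short Python sketch. It assumes Zookeeper's simple majority rule (a quorum is floor(n/2) + 1 servers) and shows why adding a server to reach an even count buys no extra fault tolerance. The function names are illustrative only.

```python
def quorum_size(n: int) -> int:
    """Smallest majority of an ensemble of n servers: floor(n/2) + 1."""
    if n < 1:
        raise ValueError("an ensemble needs at least one server")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return n - quorum_size(n)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum={quorum_size(n)}, can lose {tolerated_failures(n)}")
```

The printed table shows 3 servers can lose 1, 4 can still lose only 1, 5 can lose 2, and 7 can lose 3.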
07-24-2018
06:26 PM
Contrary to the answer by @Harshali Patel, exhaustion is not defined as an uneven distribution; an uneven distribution is rather one cause of it. A DataNode has a property you can set (dfs.datanode.du.reserved) that defines an amount of disk space that must be reserved for the OS and other non-HDFS use on that server. Once that limit is exceeded, the DataNode process will stop and log an error telling you to delete some files from it. HDFS will continue to function with the other DataNodes. The balancer can be run to keep storage usage healthy and even across the DataNodes.
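The HDFS balancer (hdfs balancer -threshold <percent>, default 10) considers the cluster balanced when every DataNode's utilization is within the threshold, in percentage points, of the cluster-wide utilization. The Python sketch below mimics that classification in simplified form: it ignores per-volume and per-storage-type details, and it assumes the capacity figure is the space HDFS may use after the reserved amount has been excluded. The DataNode class and classify function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataNode:
    name: str
    dfs_used: int      # bytes of HDFS data stored on the node
    dfs_capacity: int  # bytes HDFS may use (assumed: reserved space already excluded)


def classify(nodes: List[DataNode], threshold_pct: float = 10.0) -> Dict[str, str]:
    """Label each node relative to the cluster-wide average utilization."""
    cluster_util = 100.0 * sum(n.dfs_used for n in nodes) / sum(n.dfs_capacity for n in nodes)
    labels = {}
    for node in nodes:
        util = 100.0 * node.dfs_used / node.dfs_capacity
        if util > cluster_util + threshold_pct:
            labels[node.name] = "over-utilized"
        elif util < cluster_util - threshold_pct:
            labels[node.name] = "under-utilized"
        else:
            labels[node.name] = "balanced"
    return labels


nodes = [DataNode("dn1", 90, 100), DataNode("dn2", 10, 100), DataNode("dn3", 50, 100)]
labels = classify(nodes)
```

With a cluster average of 50%, dn1 (90%) is over-utilized and dn2 (10%) is under-utilized; the balancer would move blocks from the former toward the latter until both fall within the threshold.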
07-20-2018
04:52 PM
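TaskTracker and JobTracker don't exist in YARN; their roles are taken over by the ResourceManager, the NodeManagers, and per-application ApplicationMasters. The default HDFS replication factor (dfs.replication) is 3.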
07-20-2018
04:51 PM
1 Kudo
What component are you asking about, and what are you trying to achieve? The components typically call each other over a combination of separate protocols:
- HDFS and YARN interact via RPC/IPC.
- Ambari Server and Agents communicate over HTTP/REST. Ambari also needs a JDBC connection to its backing database.
- Hive, HBase, and Spark can use a Thrift server. The Hive metastore uses JDBC to reach its backing database.
- Kafka has its own binary protocol over TCP.
I would suggest starting with the specific component for the use case(s) you want. Hadoop itself comprises only HDFS, YARN, and MapReduce (plus the shared Hadoop Common libraries).
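To make "its own TCP protocol" concrete, here is a minimal Python sketch of how a Kafka client frames a request, following the public Kafka protocol guide: each request is a big-endian int32 size prefix, then a request header (api_key, api_version, correlation_id, client_id as an int16-length string), then the body. It builds an ApiVersions v0 request (API key 18, empty body). The helper name and client id are hypothetical, and newer request versions use a different header that adds tagged fields.

```python
import struct

API_VERSIONS = 18  # API key of the ApiVersions request


def encode_request(api_key: int, api_version: int, correlation_id: int,
                   client_id: str, body: bytes = b"") -> bytes:
    """Frame a request: int32 size, header v1 (int16, int16, int32, int16-length string), body."""
    cid = client_id.encode("utf-8")
    header = struct.pack(">hhih", api_key, api_version, correlation_id, len(cid)) + cid
    payload = header + body
    # The size prefix counts every byte that follows it.
    return struct.pack(">i", len(payload)) + payload


frame = encode_request(API_VERSIONS, 0, correlation_id=1, client_id="demo-client")
```

Writing these bytes to a TCP socket connected to a broker should return a response that is likewise size-prefixed and begins with the same correlation id, which is how clients match responses to requests.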
07-16-2018
09:41 PM
1 Kudo
@Sambasivam Subramanian
By definition, an edge node is just a host with only client components installed and configured. If you install no server (master or slave) components on a host in Ambari, you will end up with an edge node for the clients that you selected.
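Ambari stack definitions tag every component with a category of MASTER, SLAVE, or CLIENT, so the definition above reduces to a one-line check. In this Python sketch the input dict (component name to category) is a hypothetical shape for illustration, not an Ambari API response.

```python
from typing import Dict


def is_edge_node(host_components: Dict[str, str]) -> bool:
    """True when a host carries at least one component and every one is a client.

    host_components maps an Ambari component name to its category
    ("MASTER", "SLAVE" or "CLIENT"); this dict shape is hypothetical.
    """
    return bool(host_components) and all(
        category == "CLIENT" for category in host_components.values()
    )


edge = is_edge_node({"HDFS_CLIENT": "CLIENT", "HIVE_CLIENT": "CLIENT"})
worker = is_edge_node({"DATANODE": "SLAVE", "HDFS_CLIENT": "CLIENT"})
```

A host with a DataNode is a worker even if it also carries clients; only hosts with nothing but clients qualify as edge nodes.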
06-20-2018
06:55 PM
Please see the previous question: https://community.hortonworks.com/questions/167618/how-to-specify-more-than-one-path-for-the-storage.html
05-19-2018
05:13 AM
The topic-level config overrides are shown on the first line of the output, after "Configs:". If none are customized, the line simply ends with "Configs:".
$ kafka-topics --describe --topic $TOPIC --zookeeper $ZOOKEEPER
Topic:******** PartitionCount:20 ReplicationFactor:3 Configs:retention.ms=10800000
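If you need those overrides in a script, a small Python sketch can pull them out of the first line of the describe output. It assumes the "Configs:" field is last on the line and that no config value itself contains a comma (a value such as cleanup.policy=compact,delete would be split incorrectly). The function name is hypothetical.

```python
from typing import Dict


def parse_topic_configs(header_line: str) -> Dict[str, str]:
    """Extract the per-topic overrides listed after 'Configs:' in kafka-topics --describe output."""
    marker = "Configs:"
    start = header_line.find(marker)
    if start == -1:
        raise ValueError("line has no Configs: field")
    raw = header_line[start + len(marker):].strip()
    if not raw:
        return {}  # nothing customized: the line ends with an empty "Configs:"
    configs = {}
    for pair in raw.split(","):
        key, _, value = pair.partition("=")
        configs[key.strip()] = value.strip()
    return configs


line = "Topic:******** PartitionCount:20 ReplicationFactor:3 Configs:retention.ms=10800000"
overrides = parse_topic_configs(line)
```

Here the only override is retention.ms=10800000, i.e. 10,800,000 ms, or 3 hours.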
05-19-2018
05:10 AM
Kafka does not support renaming topics; see https://issues.apache.org/jira/browse/KAFKA-2333. If you want to clone a topic, use MirrorMaker: https://community.hortonworks.com/articles/79891/kafka-mirror-maker-best-practices.html
05-14-2018
03:31 AM
@Michael Bronson Kafka relies on the operating system's page cache: recently written messages are held in memory before they are flushed to disk, so the more memory on the host, the better. The JVM heap itself should be kept to a maximum of about 8 GB. I would assume that the heap properties can be set from Ambari rather than individually on each broker, but I don't use Kafka from HDP, so I can't say.
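The broker start script, kafka-server-start.sh, reads the KAFKA_HEAP_OPTS environment variable and falls back to "-Xmx1G -Xms1G" when it is unset. This Python sketch builds that value with the 8 GB ceiling suggested above, setting the initial and maximum heap to the same size, which is a common practice rather than a Kafka requirement. The helper name is hypothetical, and how Ambari exposes the setting is not covered here.

```python
MAX_HEAP_GB = 8  # ceiling suggested in the answer above


def kafka_heap_opts(requested_gb: int) -> str:
    """Build a KAFKA_HEAP_OPTS value, capping the heap at MAX_HEAP_GB."""
    if requested_gb < 1:
        raise ValueError("heap must be at least 1 GB")
    heap_gb = min(requested_gb, MAX_HEAP_GB)
    # Equal -Xms and -Xmx avoid resizing the heap at runtime.
    return f"-Xms{heap_gb}g -Xmx{heap_gb}g"


opts = kafka_heap_opts(16)
```

The resulting string would be exported as KAFKA_HEAP_OPTS in the broker's environment before it starts; any RAM beyond the heap stays available to the page cache.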
05-11-2018
01:16 AM
1 Kudo
The recommendation here would be to increase the heap space allocated to the Kafka process or reduce the number of other processes running on the same server. For example, in a production environment, the Kafka brokers should be standalone servers, not on the same hardware as Zookeeper or other Hadoop processes.
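Kafka relies on the operating system's page cache, so every JVM heap carved out for a co-located process is memory the broker can no longer use for caching. This Python sketch makes the trade-off explicit with simple arithmetic; the process names and heap sizes are hypothetical, and OS and off-heap overhead are ignored.

```python
from typing import Dict


def page_cache_headroom_gb(host_ram_gb: float, heaps_gb: Dict[str, float]) -> float:
    """RAM left for the OS page cache after the listed JVM heaps (ignores OS and off-heap overhead)."""
    used = sum(heaps_gb.values())
    if used > host_ram_gb:
        raise ValueError("heaps exceed physical memory")
    return host_ram_gb - used


standalone = page_cache_headroom_gb(64, {"kafka": 8})
shared = page_cache_headroom_gb(64, {"kafka": 8, "zookeeper": 4, "datanode": 4, "nodemanager": 8})
```

On the same hypothetical 64 GB host, a standalone broker leaves 56 GB for the page cache, while sharing the host with Zookeeper and Hadoop workers leaves only 40 GB.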