Member since: 09-23-2015
Posts: 81
Kudos Received: 108
Solutions: 41
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5702 | 08-17-2017 05:00 PM
 | 2263 | 06-03-2017 09:07 PM
 | 2783 | 03-29-2017 06:02 PM
 | 5278 | 03-07-2017 06:16 PM
 | 1959 | 02-26-2017 06:30 PM
12-16-2015
03:51 PM
3 Kudos
We are releasing Apache Kafka 0.9 in HDP 2.3.4. We are very close to the release.
12-15-2015
07:13 PM
4 Kudos
Spark Streaming hasn't yet enabled security for its Kafka connector.
12-14-2015
05:52 PM
1 Kudo
By default, Ambari-installed Kafka binds to the host.name of that machine. So you can access the Kafka broker by hostname from any other machine, as long as the hostname is DNS-resolvable or added to your /etc/hosts.
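For example, if DNS can't resolve the broker's hostname, a client machine's /etc/hosts can map it manually (the IP and hostname below are made-up illustrations, not values from this cluster):

```
# /etc/hosts on the client machine (example values)
10.0.0.11   kafka-broker-1.example.com   kafka-broker-1
```

After this, clients can reach the broker at kafka-broker-1.example.com:6667 even without DNS.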
12-07-2015
06:07 PM
1 Kudo
@jniemiec, which version of Storm are you using? From 0.10 onwards we relocated (shaded) the dependencies in Storm so that user topologies won't run into conflicts like the one above.
12-02-2015
08:30 PM
1 Kudo
For JMX params on the Kafka broker side, you can use the kafka-env section in Ambari and restart the brokers. For the kafka-producer-perf-test.sh script, you can pass them via the shell environment.
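A sketch of what that kafka-env addition might look like (the port and flags are illustrative assumptions; Kafka's run scripts pick up JMX_PORT and KAFKA_JMX_OPTS from the environment):

```
# Appended to the kafka-env template in Ambari (Kafka > Configs), then restart brokers
export JMX_PORT=9999
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
```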
11-06-2015
12:32 AM
No. The Kafka spout in HDP 2.2's Storm doesn't support connecting to a Kerberized Kafka.
10-26-2015
03:59 PM
Storm multi-tenancy can be enabled only when security is in use. In non-secure mode, all worker JVMs run as the single user "storm". In secure mode (using Kerberos), you can set supervisor.run.worker.as.user to true. This makes each worker JVM run as the user who submitted the topology, which gives worker isolation between topologies. More details: https://github.com/apache/storm/blob/master/SECURITY.md#run-worker-processes-as-user-who-submitted-the-topology
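The setting above can be sketched as a storm.yaml fragment (a Kerberos-secured cluster is assumed as a prerequisite, per the SECURITY.md link):

```
# storm.yaml on each supervisor node (secure/Kerberos mode assumed)
supervisor.run.worker.as.user: true
```

With this set, the supervisor launches each worker JVM under the submitting user's OS account instead of the "storm" user.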
10-17-2015
05:12 PM
2 Kudos
The Kafka producer doesn't need to know about the ZooKeeper cluster. It takes a broker list as a config option, which is then used to send topic-metadata requests to determine who is the leader of each topic partition.
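As an illustration, a minimal producer config only lists brokers and never a ZooKeeper quorum (the host names below are hypothetical; the property name is from the 0.8-era producer API):

```
# producer.properties (0.8-era producer); note there is no zookeeper.connect here
metadata.broker.list=broker1.example.com:6667,broker2.example.com:6667
```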
10-17-2015
05:10 PM
That is the old config page (http://kafka.apache.org/07/configuration.html, for version 0.7); for the newer configuration you should look at https://kafka.apache.org/08/configuration.html. With kafka-topics.sh you choose how many partitions you would like a topic to have. Usually this should be determined by how much parallelism you want on the consumer side when reading from the topic. Kafka doesn't determine how the data is distributed into the topic partitions; that depends on the producer. As you said, if the key is null it does round-robin, not random. If a key is provided, it does hash-based distribution. All of this happens on the producer side, not in the Kafka broker.
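The producer-side partition selection described above can be sketched in Python. This is a simplified stand-in, not Kafka's actual default partitioner (which hashes keys with murmur2); the class name and the byte-sum "hash" are illustrative assumptions, chosen only to show the null-key round-robin versus keyed-hash control flow:

```python
# Illustrative sketch of producer-side partition selection.
# NOT Kafka's real partitioner: real Kafka uses murmur2 on the key bytes.
from itertools import count

class SketchPartitioner:
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._rr = count()  # round-robin counter used for null keys

    def partition(self, key):
        if key is None:
            # Null key: cycle through partitions round-robin.
            return next(self._rr) % self.num_partitions
        # Keyed message: deterministic hash-based placement, so the
        # same key always lands on the same partition.
        return sum(key.encode("utf-8")) % self.num_partitions

p = SketchPartitioner(3)
print([p.partition(None) for _ in range(4)])              # round-robin: [0, 1, 2, 0]
print(p.partition("user-42") == p.partition("user-42"))   # same key, same partition: True
```

The broker never runs this logic; it just appends whatever the producer sends to the chosen partition.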
10-16-2015
11:27 PM
1 Kudo
Make sure you set the following config in the KafkaSpout's SpoutConfig: spoutConfig.startOffsetTime = kafka.api.OffsetRequest.EarliestTime(); (see https://github.com/apache/storm/tree/master/external/storm-kafka). Apart from that:
1. Make sure log.retention.hours is long enough to retain the topic data.
2. Check the Kafka topic offsets: bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list hostname:6667 --topic topic_name --time -1. This command gives you the latest offset in the Kafka topic; now you need to check whether the Storm KafkaSpout is catching up.
2.1 Log into the zookeeper shell.
2.2 ls /zkroot/id (zkroot is the one configured in SpoutConfig, and id is from SpoutConfig as well).
2.3 get /zkroot/id/topic_name/part_0 gives you a JSON structure with the key "offset". This tells you how far you have read into the topic, and also how far behind the latest data you are. If the two are too far apart and log.retention.hours has kicked in, the KafkaSpout might be requesting an older offset that has already been deleted.
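The lag check in the steps above can be sketched in Python. The offset values and the znode JSON below are made up for illustration; in practice the latest offset comes from GetOffsetShell and the committed offset from the spout's znode:

```python
# Sketch: compute KafkaSpout lag from the two numbers gathered above.
# Both inputs here are hypothetical example values.
import json

latest_offset = 125000          # from kafka.tools.GetOffsetShell --time -1
znode_json = '{"offset": 98000, "topic": "topic_name", "partition": 0}'  # from `get /zkroot/id/...`

committed = json.loads(znode_json)["offset"]
lag = latest_offset - committed
print(lag)  # 27000 messages behind
```

If this lag keeps growing, or exceeds what log.retention.hours retains, the spout will start requesting deleted offsets.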