Member since: 01-31-2017 · Posts: 10 · Kudos Received: 0 · Solutions: 0
11-17-2017
03:50 AM
I'm using directStream, and the topics are read one by one, so I thought 3 tasks would be enough. The strange thing is that I'm observing different behavior when running the same application on another cluster. The second cluster is smaller than the first: it has 3 brokers instead of 4. To reach good performance I need to run the application with 6 executors with 1 core each, and I can see that only 3 executors receive any work. Could the described scenario be related to the architecture of the cluster? Thanks again, Beniamino
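For reference, a minimal sketch of the setup I'm describing (assuming the spark-streaming-kafka-0-10 direct stream; the broker addresses, topic name and group id below are placeholders): the direct stream creates one Spark partition per Kafka partition, so a 3-partition topic can never keep more than 3 executor cores busy in the read stage unless the data is repartitioned.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object DirectStreamParallelism {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-stream-parallelism")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Hypothetical connection settings -- replace with the real ones.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092,broker3:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "streaming-app"
    )

    // The direct stream creates one Spark partition per Kafka partition,
    // so a 3-partition topic yields 3-partition batch RDDs: at most 3
    // tasks (hence 3 executors with 1 core each) can be busy at once.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("topicA"), kafkaParams))

    stream.foreachRDD { rdd =>
      println(s"Kafka partitions -> Spark partitions: ${rdd.getNumPartitions}")
      // To spread work over more executors than Kafka partitions, shuffle first:
      rdd.map(_.value).repartition(6).foreach(v => ()) // placeholder processing
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If that matches what the direct stream does here, 6 executors with 1 core each would leave 3 of them idle during the read stage, regardless of the number of brokers.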
11-17-2017
01:32 AM
I have a Spark Streaming application that reads from 4 different Kafka topics, and each topic has 3 partitions. The reads happen at different instants (I have 4 pipelines processed in sequence), so my idea was that I need just 3 Spark executors (one for each partition of each topic) with one core each. Submitting the application this way, I can see that execution is not parallelized across the executors and the processing time is very high relative to the complexity of the computation. What's wrong with this assumption? If I run the same application with 4 executors with 4 cores each, the execution is parallelized across all the executors and the processing time is low. I'm wondering whether there is a best practice for the number of executors and cores per topic/partition when consuming from a Kafka topic with Spark Streaming. The two configurations I'm comparing are sketched below. Thanks in advance, Beniamino
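The two submissions look roughly like this sketch (executor settings expressed through SparkConf instead of spark-submit flags; the application name is a placeholder and spark.executor.instances assumes YARN):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// 3 executors with 1 core each: matches the 3 partitions of a single topic,
// but caps the whole application at 3 concurrent tasks.
val fewExecutors = new SparkConf()
  .setAppName("kafka-streaming-app")          // placeholder app name
  .set("spark.executor.instances", "3")
  .set("spark.executor.cores", "1")

// 4 executors with 4 cores each: 16 task slots, so stages with more
// partitions (e.g. after a shuffle) can actually run in parallel.
val moreExecutors = new SparkConf()
  .setAppName("kafka-streaming-app")
  .set("spark.executor.instances", "4")
  .set("spark.executor.cores", "4")

val ssc = new StreamingContext(moreExecutors, Seconds(10))
```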
Labels:
- Apache Kafka
- Apache Spark
02-02-2017
03:18 AM
Hi, thank you for the response. The consumer stream is bound to the number of partitions of the topic, so increasing the number of consumers and producers will not solve the problem. From one partition you can have at most one consumer per consumer group. I was also thinking about increasing the queue size of the mirror maker, but this is still not working.
01-31-2017
01:34 AM
I'm having an issue with Kafka MirrorMaker. I stopped the mirror maker for 30 minutes due to a cluster upgrade, and after the cluster restart the mirror maker is not able to consume data from the source cluster. I see that the lag of the mirror maker's consumer group is very high, so I'm thinking about which parameters to change in order to increase the buffer size of the mirror maker. I've tried changing the consumer group for the mirror maker, and that allows it to restart consuming data from the latest messages. When I try to restart the process from the last saved offsets I see a peak of consumed data, but the mirror maker is not able to commit offsets: the log is stuck at the row "INFO kafka.tools.MirrorMaker$: Committing offsets" and no more rows are shown after this one. I think the problem is related to the huge amount of data to process. I'm running a cluster with Kafka 0.8.2.1 with this configuration: auto.offset.reset=largest, offsets.storage=zookeeper, dual.commit.enabled=false.
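For context, the relevant part of the MirrorMaker consumer configuration looks roughly like this sketch (old 0.8.x consumer properties; the ZooKeeper address and group id are placeholders, and the enlarged fetch/buffer values at the bottom are only the kind of change I'm considering, not something I've verified):

```properties
# Old (0.8.x) high-level consumer properties used by MirrorMaker -- sketch only.
zookeeper.connect=zk1:2181                 # placeholder
group.id=mirror-maker-group                # placeholder
auto.offset.reset=largest
offsets.storage=zookeeper
dual.commit.enabled=false

# Candidate changes to give the consumer more room (assumed, not verified):
fetch.message.max.bytes=10485760
socket.receive.buffer.bytes=2097152
queued.max.message.chunks=100
```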
Labels:
- Apache Kafka