Member since: 09-15-2018
Posts: 61
Kudos Received: 6
Solutions: 7

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1834 | 04-17-2020 08:40 AM
 | 9265 | 04-14-2020 04:45 AM
 | 1208 | 04-14-2020 03:12 AM
 | 957 | 10-17-2019 04:47 AM
 | 1333 | 10-17-2019 04:33 AM
11-10-2020
07:12 AM
1 Kudo
Yes, you can use an edge node; however, this is subject to the Cloudera Support terms in place between you and Cloudera.
07-31-2020
08:14 AM
I used to work at Cloudera/Hortonworks, and now I am a Hashmap Inc. consultant. This solution worked perfectly, thank you.
04-18-2020
07:43 AM
Thank you @TonyStank. This helps me.
04-18-2020
06:27 AM
Hey @kaf, Thanks for reaching out to the Cloudera community. You can use the "tail" command and pipe its output to the Kafka console producer if you want to read the whole file first and then continue tailing for subsequently appended lines:

$ tail -f -n +1 <filename> | kafka-console-producer --broker-list <Broker_Host>:9092 --topic <topic_name>

Let me know if this helps.
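For reference, here is a minimal Python sketch of the same idea using the kafka-python client (the file path, broker host, and topic name are placeholders): read the whole file, then keep tailing for newly appended lines.

import time
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="<Broker_Host>:9092")
with open("<filename>") as f:
    while True:
        line = f.readline()
        if line:
            # Send each line as one message, mirroring tail -f -n +1 | kafka-console-producer
            producer.send("<topic_name>", line.rstrip("\n").encode("utf-8"))
        else:
            producer.flush()
            time.sleep(1)  # no new data yet; wait for the file to grow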
04-17-2020
07:06 AM
Hey @rishav1412, Thanks for reaching out to the Cloudera community. There is no single way/process/configuration in Kafka to stream data from all social media platforms; each platform has its own APIs and policies for data streaming. If you want to stream data from Twitter, you can use any of the following pipelines to send data from Twitter to Kafka topics:

Twitter >> Kafka Connect (Kafka Connect Twitter) >> Kafka Topics
Twitter >> Flume (org.apache.flume.source.twitter.TwitterSource) >> Kafka Topics
Twitter >> NiFi (GetTwitter Processor) >> Kafka Topics

Let me know if this helps.
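To illustrate the idea behind the first pipeline, here is a minimal Python sketch that forwards tweets to a Kafka topic using tweepy (3.x) together with kafka-python, rather than Kafka Connect itself; all credentials, the broker host, and the topic name are placeholders.

import tweepy
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="<Broker_Host>:9092")

class TweetForwarder(tweepy.StreamListener):
    def on_data(self, raw_tweet):
        # Forward each raw JSON tweet to the Kafka topic as-is.
        producer.send("tweets", raw_tweet.encode("utf-8"))
        return True

auth = tweepy.OAuthHandler("<consumer_key>", "<consumer_secret>")
auth.set_access_token("<access_token>", "<access_token_secret>")
tweepy.Stream(auth, TweetForwarder()).filter(track=["cloudera"])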
04-17-2020
06:46 AM
Hey @Manoj690, Thanks for reaching out to the Cloudera community. You can execute a PUT request against the path "/connectors/<Connector_name>/config" to update the configuration of an existing connector, passing a JSON object with the updated parameter(s) in the request body. Example request:

PUT /connectors/<Connector_name>/config
Accept: application/json

{
  "flush.size": "100",
  "rotate.interval.ms": "1000"
}

Let me know if this helps.
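If you prefer to script it, here is a minimal Python sketch using the requests library (the Connect host, port, and connector name are placeholders). Note that PUT replaces the connector's entire configuration, so include every required key, not just the changed ones.

import requests

url = "http://<Connect_Host>:8083/connectors/<Connector_name>/config"
new_config = {
    # The full connector configuration goes here; only two keys shown.
    "flush.size": "100",
    "rotate.interval.ms": "1000",
}
resp = requests.put(url, json=new_config,
                    headers={"Accept": "application/json"})
print(resp.status_code, resp.json())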
04-15-2020
08:15 AM
Hey @saihadoop, Thanks for reaching out to the Cloudera community. After setting up the cluster infrastructure and installing CDH and CM, you can use the Cloudera Manager API[1] to back up the Cloudera Manager configuration of the existing cluster and restore it to the new cluster. [1] https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_intro_api.html#concept_dnn_cr5_mr Let me know if this helps. Cheers,
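As a rough sketch, the export/import can be scripted against the CM REST API with Python's requests library (host names, credentials, and the API version below are placeholders; match the version to your CM release):

import requests

auth = ("admin", "admin")

# Export the full deployment (clusters, services, roles, configs) from the old CM...
backup = requests.get("http://<CM_Host>:7180/api/v19/cm/deployment",
                      auth=auth).json()

# ...and restore it into the new CM instance, replacing whatever is there.
requests.put("http://<New_CM_Host>:7180/api/v19/cm/deployment"
             "?deleteCurrentDeployment=true",
             auth=auth, json=backup)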
04-14-2020
08:44 AM
Hey @AndyTech, Thanks for reaching out to the Cloudera community. The commit-id mentioned here isn't related to any Kafka usage terms such as 'commit offsets'; it refers to the Kafka source commit from which the distribution was built. It is not an error, just an info message, and it doesn't impact the Kafka client's functionality in any way. Let me know if this helps. Cheers,
04-14-2020
05:45 AM
Hey @AndyTech, Thanks for reaching out to the Cloudera community. This issue is due to the "kafka-python" module missing from your Python installation. You have to manually install the "kafka-python" module, using the command below, on the edge node and on all the hosts on which the Spark job executes:

$ pip install kafka-python
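Once installed, a quick way to confirm the module is usable (broker host and topic name are placeholders):

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="<Broker_Host>:9092")
future = producer.send("<topic_name>", b"hello from kafka-python")
record = future.get(timeout=10)  # raises a KafkaError if the send failed
print(record.topic, record.partition, record.offset)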
04-14-2020
05:31 AM
Hey @sharathkumar13, Thanks for reaching out to the Cloudera community. Can you clarify what you mean by "Do we have options to do?"? Are you looking to use Prometheus and Grafana to monitor the Kafka service?
04-14-2020
04:09 AM
@TonyStank, I appreciate your help. Stay safe.
04-14-2020
03:27 AM
Hey @ping_pong, Thanks for reaching out to the Cloudera community. Do you have TLS enabled for this CDH cluster? What steps did you follow to add the new host to this CDH cluster? After installing all the required parcels/packages, did you start the Cloudera Manager agent using the command below?

$ sudo service cloudera-scm-agent start
04-14-2020
03:08 AM
Hey, If you have an existing subscription for HDP products, try logging in with your existing HDP credentials. If not, try registering on the Cloudera portal. For learning and development purposes, you can try the Hortonworks Sandbox.
11-07-2019
09:35 PM
Thanks all. I went another way: I downloaded it onto the machine, extracted it, and did the profile setting; now it works fine.
11-01-2019
06:31 AM
1 Kudo
Hey, CSD version 2.3 and higher, I think. Regards, Ankit.
10-18-2019
05:52 AM
Hey, Thank you for sharing the outcome and the steps. Much appreciated. Regards.
10-17-2019
05:47 AM
Thanks, that put me in the right direction. For completeness: just setting SPARK_HOME was not sufficient; py4j was missing, and setting PYTHONPATH fixed that issue.

export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH

Now pyspark shows: version 2.3.0.cloudera3
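A quick sanity check from the same shell session, to confirm the exports above took effect:

# Run in a Python session started after the SPARK_HOME/PYTHONPATH exports.
import pyspark
print(pyspark.__version__)  # should print 2.3.0.cloudera3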
10-17-2019
04:53 AM
Hey, Refer to the Cloudera documentation[1] on "Configuring the Flume Properties File". [1] https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_flume_config.html Please let me know if this helps. Regards, Ankit.
10-17-2019
04:47 AM
Hey, Optimizing your Kafka cluster depends on your cluster usage and use case. Based on your main concern (throughput, CPU utilization, or memory/disk usage), you need to tune different parameters, and some changes may have an impact on other aspects. For example, if acknowledgments ("acks") is set to "all", all brokers that replicate the partitions need to acknowledge that the data was written before the send is confirmed. This ensures data consistency but increases CPU utilization and network latency. Refer to the article "Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines)"[1] by Jay Kreps (co-founder and CEO at Confluent). [1] https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines Please let me know if this helps. Regards, Ankit.
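To make the acknowledgment trade-off concrete, here is a minimal kafka-python producer sketch with acks="all" (broker host and topic name are placeholders):

from kafka import KafkaProducer

# acks="all": every in-sync replica must confirm the write before the
# send is considered successful -- stronger durability, higher latency.
producer = KafkaProducer(bootstrap_servers="<Broker_Host>:9092",
                         acks="all")
producer.send("<topic_name>", b"payload")
producer.flush()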
09-12-2019
03:22 AM
I didn't use the FQDN; instead, I just added the IP in the /etc/hosts file and used the same host IP in the Kafka config.
02-13-2019
10:07 PM
Hi Tony, Thanks for your reply; I appreciate all the help provided by you and Gzigldrum. Regards, Wert
02-12-2019
06:56 AM
Hello, thanks for your response. The issue is actually resolved; it was a JournalNode edits directory permission issue. I modified the permissions and restarted the JournalNodes successfully. The NameNodes also came back, and I am able to see all the other dependent services. Thanks again 🙂
02-07-2019
07:48 PM
Hello, Kafka Connect is included with the Cloudera Distribution of Apache Kafka 2.0.0 but is not supported at this time. Cloudera recommends using Flume and Sqoop as proven solutions for batch and real-time data loading that complement Kafka's message broker capability[2]. Kindly refer to the mentioned link[1] for more information. [1] https://www.cloudera.com/documentation/kafka/latest/topics/kafka_known_issues.html#xd_583c10bfdbd326ba-590cb1d1-149e9ca9886--6fcb__section_ens_4bf_55 [2] https://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/
02-05-2019
08:28 PM
Hello,

1. Writing streaming aggregation to a file: to use append mode with aggregations, you need to set an event-time watermark (using "withWatermark"). Otherwise, Spark doesn't know when to output an aggregation result as "final". A watermark is a threshold specifying how long the system waits for late events. For example (note that the "timestamp" column must be kept in the selection for the watermark to apply):

df2 = df1.filter("code > 300").select("agent", "timestamp").withWatermark("timestamp", "2 minutes").groupBy("agent").count()

2. Reading from Kafka (consumer) using streaming: you have to set the SPARK_KAFKA_VERSION environment variable. When running jobs that require the new Kafka integration, set SPARK_KAFKA_VERSION=0.10 in the shell before launching spark-submit:

# Set the environment variable for the duration of your shell session:
export SPARK_KAFKA_VERSION=0.10
spark-submit <arguments>

https://www.cloudera.com/documentation/spark2/latest/topics/spark2_kafka.html
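Putting both points together, here is a minimal PySpark Structured Streaming sketch (broker host, topic name, and HDFS paths are placeholders; run with SPARK_KAFKA_VERSION=0.10 exported as above). Note that append mode requires the watermarked event-time column to participate in the aggregation, hence the window in the groupBy:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("kafka-agg").getOrCreate()

# Read from Kafka and keep the record timestamp for the watermark.
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "<Broker_Host>:9092")
          .option("subscribe", "<topic_name>")
          .load()
          .selectExpr("CAST(value AS STRING) AS agent", "timestamp"))

# The watermark lets append mode emit each aggregate once no events
# older than 2 minutes can still arrive.
counts = (events.withWatermark("timestamp", "2 minutes")
          .groupBy(window(col("timestamp"), "1 minute"), col("agent"))
          .count())

(counts.writeStream.outputMode("append")
 .format("parquet")
 .option("path", "hdfs:///tmp/agg-output")
 .option("checkpointLocation", "hdfs:///tmp/agg-checkpoint")
 .start()
 .awaitTermination())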
02-04-2019
02:22 AM
Hello, Monitoring consumer group lag using Cloudera Manager seems unlikely; I tried configuring a chart to display the consumer group lag but couldn't generate the desired results. However, on further research I came across a few GitHub projects that provide additional monitoring functionality. One of them is "Kafka Manager" (Yahoo, Apache 2.0 License); I think with this tool you can monitor consumer group lag. Please refer to the mentioned link[1] for more information on Kafka Manager. [1] https://github.com/yahoo/kafka-manager
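If you only need a quick number rather than a dashboard, the lag can also be computed directly with kafka-python (broker host, group id, and topic name are placeholders): per partition, lag is the log-end offset minus the group's committed offset.

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="<Broker_Host>:9092",
                         group_id="<consumer_group>",
                         enable_auto_commit=False)
partitions = [TopicPartition("<topic_name>", p)
              for p in consumer.partitions_for_topic("<topic_name>")]
end_offsets = consumer.end_offsets(partitions)
for tp in partitions:
    committed = consumer.committed(tp) or 0
    print("partition %d lag: %d" % (tp.partition, end_offsets[tp] - committed))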
02-03-2019
05:09 AM
1 Kudo
Hello, Loading data directly to Kafka without any service seems unlikely. However, you can execute a simple Kafka console producer to send all your data to the Kafka service. But if your requirement is to save data to HDFS, you need to include a few more services along with Kafka. For example:

Crawlers >> Kafka console producer (or) Spark Streaming >> Flume >> HDFS

As your requirement is to store the data in HDFS and not to stream it, I suggest you execute a Spark job; it will store your data to HDFS. Refer to the mentioned commands to move data to HDFS: initiate a spark-shell, then execute the commands below in order.

val moveFile = sc.textFile("file:///path/to/Sample.log")
moveFile.saveAsTextFile("hdfs:///tmp/Sample.log")