Member since: 09-15-2018
Posts: 61
Kudos Received: 6
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4459 | 04-17-2020 08:40 AM |
| | 17640 | 04-14-2020 04:45 AM |
| | 3051 | 10-17-2019 04:47 AM |
| | 3623 | 10-17-2019 04:33 AM |
| | 8560 | 02-13-2019 09:17 PM |
04-17-2020
08:40 AM
Hey @sharathkumar13,

Thanks for reaching out to the Cloudera community.

>> You can refer to the Git repo[1] for information on the Kafka exporter for Prometheus used with Kafka Manager.

[1] https://github.com/danielqsj/kafka_exporter

>> I would also like to share some information on SMM[2]. Streams Messaging Manager is an operations monitoring and management tool from Cloudera that provides end-to-end visibility into an enterprise Apache Kafka environment. With SMM, you can gain clear insights about your Kafka clusters and understand the end-to-end flow of message streams from producers to topics to consumers.

[2] https://docs.cloudera.com/csp/2.0.1/smm-overview/topics/smm-overview.html

Let me know if this helps.
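For reference, a minimal sketch of how the exporter from [1] is typically started; the broker address and the metric name grepped for are assumptions for illustration, so adjust them to your environment:

```bash
# Point the exporter at one (or more) of your brokers -- the address below
# is a placeholder for your own broker host:port.
./kafka_exporter --kafka.server=broker1.example.com:9092

# Prometheus then scrapes the exporter's HTTP endpoint (port 9308 by default):
curl -s http://localhost:9308/metrics | grep kafka_topic_partitions
```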
04-14-2020
04:45 AM
1 Kudo
Hey @GTA,

Thanks for reaching out to the Cloudera community.

"Required executor memory (1024), overhead (384 MB), and PySpark memory (0 MB) is above the max threshold (1024 MB) of this cluster!"

>> This issue occurs when the total memory required to run a Spark executor in a container (Spark executor memory, spark.executor.memory, plus Spark executor memory overhead, spark.yarn.executor.memoryOverhead) exceeds the memory available for running containers on the NodeManager node (yarn.nodemanager.resource.memory-mb).

Based on the above exception, you have the default 1 GB configured for a Spark executor and the default 384 MB overhead, so the total memory required to run the container is 1024 + 384 MB = 1408 MB. Since the NodeManager was configured with too little memory to run even a single container (only 1024 MB), this is a valid exception. Increasing the NodeManager setting from 1251 MB to 2048 MB will definitely allow a single container to run on the NodeManager node.

Use the following steps to increase the "yarn.nodemanager.resource.memory-mb" parameter:

Cloudera Manager >> YARN >> Configuration >> Search "yarn.nodemanager.resource.memory-mb" >> Configure 2048 MB or higher >> Save & Restart.

Let me know if this helps.
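To make the arithmetic concrete, here is a hypothetical spark-submit showing where the two numbers come from; the application file name is a placeholder:

```bash
# Container size requested from YARN =
#   spark.executor.memory (1024 MB) + spark.yarn.executor.memoryOverhead (384 MB) = 1408 MB.
# This must fit within yarn.nodemanager.resource.memory-mb on the NodeManager.
spark-submit \
  --master yarn \
  --conf spark.executor.memory=1g \
  --conf spark.yarn.executor.memoryOverhead=384 \
  your_app.py
```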
10-17-2019
04:47 AM
Hey,

Optimizing your Kafka cluster depends on your cluster usage and use case. Depending on your main concern (throughput, CPU utilization, or memory/disk usage), you need to tune different parameters, and some changes may have an impact on other aspects. For example, if the producer's acks setting is "all", every broker that replicates the partition must acknowledge that the data was written before the next message is confirmed. This ensures data consistency but increases CPU utilization and network latency.

Refer to the "Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines)" article[1] written by Jay Kreps (co-founder and CEO at Confluent).

[1] https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

Please let me know if this helps.

Regards,
Ankit.
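As a way to measure the throughput impact of a setting like acks on your own hardware, here is a sketch using Kafka's bundled producer perf-test tool; the topic name, record counts, and broker address are placeholders, and the script may be named kafka-producer-perf-test.sh depending on your distribution:

```bash
# Benchmark with full acknowledgments (durable, slower). Rerun with acks=1
# to compare throughput/latency against leader-only acknowledgment.
kafka-producer-perf-test --topic perf-test \
  --num-records 1000000 --record-size 100 --throughput -1 \
  --producer-props bootstrap.servers=broker1:9092 acks=all
```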
10-17-2019
04:33 AM
Hey,

Can you please try setting the SPARK_HOME env variable to the Spark 2 location indicated by the readlink command? Without it, launching pyspark shows Spark 2.0 as the version. For example:

export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2

By setting SPARK_HOME to the Spark 2 lib folder instead, pyspark2 will then launch and show Spark 2.3.0.cloudera3 as the Spark version.

Please let me know if this helps.

Regards,
Ankit.
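Putting the steps together as a quick shell sketch (the parcel path is taken from the example above; adjust it to your installation):

```bash
# See which binary pyspark2 actually resolves to:
readlink -f "$(which pyspark2)"

# Point SPARK_HOME at the Spark 2 parcel's lib folder:
export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2

# The startup banner should now report 2.3.0.cloudera3:
pyspark2
```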
02-13-2019
09:17 PM
Hello Wert,

As per the information provided, you have mentioned the free space available:

16.09% (free: 3.2 GiB) in /var/lib/cloudera-host-monitor.
35.25% (free: 4.9 GiB) in /var/log/cloudera-scm-eventserver.
35.81% (free: 5.0 GiB) in /var/log/cloudera-scm-alertpublisher.

This explains the alert for low disk space. The data in "/var/lib/cloudera-[host|service]-monitor" is the sum total of the working data for the respective services: time-series metrics and health data, held in the Time-Series Storage (firehose_time_series_storage_bytes, 10 GB default, 10 GB minimum).

My suggestions:

1.) Change the default directory ("/var/lib/cloudera-[host|service]-monitor") to another location in your environment with enough space, as sketched after this list.
>> Stop the service (Service Monitor or Host Monitor).
>> Save your old data and copy the current directory to the new directory (optional, only if you need the old data).
>> Update the Storage Directory configuration option (firehose.storage.base.directory) on the corresponding role configuration page.
>> Start the Service Monitor or Host Monitor.

2.) If the data in "/var/lib/cloudera-host-monitor" is not of much importance, you can remove it manually, but this is not a recommended step. Your health statuses will be Unknown or Bad for a short time, and you will lose all charts in the UI while the time-series store is rebuilt and repopulated (because you are deleting ALL the historical metrics). This shouldn't have an impact on any service, however.

3.) Either add more disk to the cluster, or remove unused/unnecessary files on the disk to free up some space.

Regards.
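A shell sketch of suggestion 1 for the Service Monitor; /data/cloudera-service-monitor is a placeholder for whatever location has enough space in your environment:

```bash
# Stop the Service Monitor role in Cloudera Manager first, then:
mkdir -p /data/cloudera-service-monitor

# Optional: copy the existing data if you want to keep the old metrics.
cp -a /var/lib/cloudera-service-monitor/. /data/cloudera-service-monitor/

# Then set firehose.storage.base.directory to /data/cloudera-service-monitor
# on the role's configuration page in Cloudera Manager and start the role.
```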
02-11-2019
05:31 AM
Hello,

With the "firehose_time_series_storage_bytes" parameter in Cloudera Manager, we can control the approximate amount of disk space dedicated to storing time-series and health data. Once the store has reached its maximum size, older data is deleted to make room for newer data. The disk usage is approximate because data is deleted only when the limit is reached; configuring retention based on time is not supported.

However, you can write a shell script to remove the data every 7 days from the "Service Monitor Storage Directory". By default, the data is stored in /var/lib/cloudera-service-monitor/ on the Service Monitor host. You can change this by modifying the Service Monitor Storage Directory configuration (firehose.storage.base.directory). But this step is not recommended by Cloudera.
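Purely as an illustration of such a script (and again, this approach is not recommended by Cloudera), a minimal sketch assuming the default storage path:

```bash
#!/bin/sh
# Hypothetical weekly cleanup -- NOT recommended by Cloudera.
# Stop the Service Monitor role first; deleting files under a live store can
# corrupt it, and all historical charts will be lost while it repopulates.
find /var/lib/cloudera-service-monitor -type f -mtime +7 -delete
```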
02-11-2019
05:10 AM
Hello,

Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.10.10.10:8485, 10.10.10.11:8485, 10.10.10.09:8485], stream=null))

As you can see above, the IPs show up instead of the FQDNs, which means that when the JournalNode tried to resolve the IPs to FQDNs via DNS, the lookup failed. It looks like your DNS has issues resolving the IPs. Please look into it.
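To confirm, a quick sketch checking reverse resolution for each JournalNode address from the error above (run from the NameNode host):

```bash
# Each lookup should return the JournalNode's FQDN; a failure here points
# to the DNS (or /etc/hosts) misconfiguration causing the error.
for ip in 10.10.10.10 10.10.10.11 10.10.10.09; do
  host "$ip"
done
```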