Member since: 05-15-2018
Posts: 132
Kudos Received: 15
Solutions: 7
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1660 | 06-02-2020 06:22 PM
 | 20218 | 06-01-2020 09:06 PM
 | 2621 | 01-15-2019 08:17 PM
 | 4916 | 12-21-2018 05:32 AM
 | 5357 | 12-16-2018 09:39 PM
04-07-2023
09:29 PM
1 Kudo
@satz , I am not referring to writing data using Kafka Connect. Kafka partition data should be written to cloud storage after spending a certain amount of time on disk, as described here: https://docs.confluent.io/platform/current/kafka/tiered-storage.html Thanks, Uday
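For reference, a minimal sketch of the broker-side settings that Confluent tiered storage uses, following the documentation linked above; the bucket name, region, and hotset retention values below are placeholders, and exact property names may vary between Confluent Platform versions:

```bash
# Illustrative only: append tiered storage settings to the broker config.
# Property names follow the Confluent tiered storage docs linked above;
# bucket, region, and retention values are placeholders.
cat >> /etc/kafka/server.properties <<'EOF'
confluent.tier.feature=true
confluent.tier.enable=true
confluent.tier.backend=S3
confluent.tier.s3.bucket=my-tiered-storage-bucket
confluent.tier.s3.region=us-east-1
# Keep recently written segments on local disk for ~1 hour (the "hotset")
# before they are served only from object storage.
confluent.tier.local.hotset.ms=3600000
EOF
```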
06-14-2022
12:00 PM
Hey, have you found out where to add this configuration? '--conf spark.unsafe.sorter.spill.read.ahead.enabled=false' Thanks!!
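A minimal sketch of where that setting usually goes, assuming the job is launched with spark-submit; the class, jar, and master values are placeholders:

```bash
# Sketch only: pass the setting as a --conf option on the spark-submit
# command line (class, jar, and master values are placeholders).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.unsafe.sorter.spill.read.ahead.enabled=false \
  --class com.example.MyApp \
  my-app.jar
```

Alternatively, it can be set cluster-wide by adding a line such as `spark.unsafe.sorter.spill.read.ahead.enabled false` to spark-defaults.conf.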
04-01-2021
09:38 AM
@satz I have a similar issue. Is it possible to share the data cleansing (newline removal) code snippet?
06-02-2020
06:49 PM
Hello @renzhongpei , From the log4j properties file I see you are trying to write the logs to a local file path [ log4j.appender.FILE.File=/home/rzpt/logs/spark.log ].

Please note that, with the above log4j properties, the executors and the driver (in cluster mode) will try to write log files to that path on every node where a container (executor) runs.

If that is your requirement, you would need to use a command like the following (assuming the log4j.properties file is in the local /tmp path on the node where you execute spark2-submit):

spark2-submit --class com.nari.sgp.amc.measStandAssess.aurSum.AurSumMain --files /tmp/log4j.properties --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" --master yarn --deploy-mode cluster sgp-1.0.jar

Note that in the above command you can use "-Dlog4j.configuration=log4j.properties" as it is, i.e. you don't need to give an explicit local path such as file://, since the executor automatically picks up log4j.properties from the container-localised path.
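For completeness, a minimal sketch of what such a log4j.properties could contain for a file appender writing to that local path; the layout pattern and log level are illustrative assumptions, not taken from the original post:

```bash
# Illustrative only: create a minimal log4j.properties with a file appender
# pointing at the local path mentioned above (level and pattern are assumptions).
cat > /tmp/log4j.properties <<'EOF'
log4j.rootLogger=INFO, FILE
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=/home/rzpt/logs/spark.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
EOF
```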
06-02-2020
06:22 PM
Hello @mike_bronson7 , Thank you for posting your query.

You can execute 'get' on the same ZooKeeper client shell for the broker znode and you will get the hostname.

Example:

zookeeper-shell.sh zoo_server1:2181 <<< "get /brokers/ids/1018"

It returns output as follows (example from my cluster):

[zk: localhost:2181(CONNECTED) 5] get /brokers/ids/10
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://simple01.cloudera.com:9092"],"jmx_port":9393,"host":"simple01.cloudera.com","timestamp":"1590512066422","port":9092,"version":4}
cZxid = 0x1619b
ctime = Tue May 26 09:54:26 PDT 2020
mZxid = 0x1619b
mtime = Tue May 26 09:54:26 PDT 2020
pZxid = 0x1619b
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x1722ddb1e844d50
dataLength = 238
numChildren = 0

So, my broker ID 10 is mapped to the host simple01.cloudera.com.
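If you need the mapping for every broker at once, a small sketch along these lines should work, assuming the same zookeeper-shell.sh client and that jq is available for parsing the JSON (the ZooKeeper address and the jq dependency are assumptions):

```bash
#!/usr/bin/env bash
# Sketch: print "brokerId -> host" for every registered broker.
# Assumes zookeeper-shell.sh is on the PATH and jq is installed.
ZK="zoo_server1:2181"

# 'ls /brokers/ids' prints an array such as: [10, 11, 12]
ids=$(zookeeper-shell.sh "$ZK" <<< "ls /brokers/ids" | tail -1 | tr -d '[],')

for id in $ids; do
  # Each broker registration znode is a JSON document containing "host".
  host=$(zookeeper-shell.sh "$ZK" <<< "get /brokers/ids/$id" \
           | grep '"host"' | jq -r .host)
  echo "$id -> $host"
done
```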
06-02-2020
05:28 PM
Hello @Venkat451 , Thank you for posting your query.

From the error message shared (below), I see the executor failing while it is trying to attach itself to the consumer group; more specifically, it is getting an authorisation exception while attaching to the group.

ERROR org.apache.spark.executor.Executor - Exception in task 2.0 in stage 0.0 (TID 2) org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: spark-executor-<groupID>

If you have authorisation mechanisms (Sentry, Kafka ACLs, Ranger) enabled on your cluster, please grant the necessary permissions to the consumer group: https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/kafka-acl-examples.html
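As an illustration, if plain Kafka ACLs are in use, the grant could look roughly like the sketch below; the ZooKeeper address, principal, and topic name are placeholders, and the group name keeps the placeholder from the error message:

```bash
# Sketch: allow the Spark consumer principal to read the topic and to join
# the consumer group reported in the error (names are placeholders).
kafka-acls.sh --authorizer-properties zookeeper.connect=zk_host:2181 \
  --add --allow-principal User:spark \
  --operation Read \
  --topic my_topic \
  --group 'spark-executor-<groupID>'
```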
06-02-2020
04:26 AM
Hello @zanteb , Thank you for posting your query.

When you use spark-submit, you need to pass the files (JAAS & keytab) with the --files option of spark-submit, as in [1] https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/developing-spark-applications/content/running_spark_streaming_jobs_on_a_kerberos-enabled_cluster.html

That way, your JAAS and keytab files are shipped to the executors and to the Application Master / driver (in case of cluster mode).

If your external client is not Spark and is just standalone Java code (for example), then you can simply pass "-Djava.security.auth.login.config=jaas.conf" while executing the code, and the file can reside on the same client node.
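A minimal sketch of such a submission, broadly following the linked HDP example; the file paths, class name, and jar name are placeholders:

```bash
# Sketch: ship jaas.conf and the keytab to the driver and executors, and
# point both JVMs at the localised jaas.conf (paths/names are placeholders).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /tmp/kafka_client_jaas.conf,/tmp/kafka_client.keytab \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=kafka_client_jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=kafka_client_jaas.conf" \
  --class com.example.StreamingApp \
  streaming-app.jar
```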
05-25-2020
07:28 PM
Hi @satz I've checked the Spark history logs and it says that read permission is denied for the user named "spark". I've recursively changed the permissions and ownership of /user/spark to spark, but whenever a new file is created it gets its own permissions, so it can't be read by spark again.
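One possible way to keep newly created files readable is sketched below, under the assumption that HDFS ACLs are enabled (dfs.namenode.acls.enabled=true); the path and user come from the post above, everything else is an assumption:

```bash
# Sketch: grant the spark user read access to existing files, and set a
# default ACL so files created later under /user/spark inherit read access.
# Requires HDFS ACLs to be enabled (dfs.namenode.acls.enabled=true).
hdfs dfs -setfacl -R -m user:spark:r-x /user/spark
hdfs dfs -setfacl -R -m default:user:spark:r-x /user/spark
```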
05-22-2020
07:59 PM
You can also try to stop and start the [HDP] NameNode services from the command line using the commands below (the commands use hdfs as the HDFS user):

1. If you are running NameNode HA (High Availability), start the JournalNodes by executing this command on the JournalNode host machines:

su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-journalnode/../hadoop/sbin/hadoop-daemon.sh start journalnode"

2. Execute this command on the NameNode host machine(s):

su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start namenode"

3. If you are running NameNode HA, start the ZooKeeper Failover Controller (ZKFC) by executing the following command on all NameNode machines. The starting sequence of the ZKFCs determines which NameNode will become Active.

su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start zkfc"

4. If you are not running NameNode HA, execute the following command on the Secondary NameNode host machine. If you are running NameNode HA, the Standby NameNode takes on the role of the Secondary NameNode.

su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start secondarynamenode"
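Once the daemons are up, a quick way to confirm the NameNode state in an HA setup is sketched below; the service IDs nn1/nn2 are placeholders and must match dfs.ha.namenodes.<nameservice> in hdfs-site.xml:

```bash
# Sketch: check which NameNode is active/standby after startup.
# nn1/nn2 are placeholder service IDs taken from hdfs-site.xml.
su -l hdfs -c "hdfs haadmin -getServiceState nn1"
su -l hdfs -c "hdfs haadmin -getServiceState nn2"
# Basic health check of HDFS once the NameNode is up.
su -l hdfs -c "hdfs dfsadmin -report"
```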
03-04-2020
11:27 PM
Hello @lakshmipathy , Please refer to the thread below; hope this helps: https://community.cloudera.com/t5/Support-Questions/How-to-define-topic-retention-with-kafka/td-p/222671
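For quick reference, setting a per-topic retention usually comes down to something like the sketch below; the ZooKeeper address, topic name, and the 7-day retention value are placeholders, and newer Kafka versions use --bootstrap-server instead of --zookeeper:

```bash
# Sketch: set a 7-day retention override on a single topic (values are placeholders).
kafka-configs.sh --zookeeper zk_host:2181 --alter \
  --entity-type topics --entity-name my_topic \
  --add-config retention.ms=604800000

# Verify the override.
kafka-configs.sh --zookeeper zk_host:2181 --describe \
  --entity-type topics --entity-name my_topic
```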