Member since: 05-15-2018
Posts: 132
Kudos Received: 15
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1660 | 06-02-2020 06:22 PM |
| | 20218 | 06-01-2020 09:06 PM |
| | 2621 | 01-15-2019 08:17 PM |
| | 4916 | 12-21-2018 05:32 AM |
| | 5357 | 12-16-2018 09:39 PM |
06-14-2020
07:16 PM
Hello @monorels , Thank you for posting the query. Check the total size of the files present in the event logging directory; if the files are very large, you may need to increase the Spark History Server's heap memory. Also check the Spark History Server logs to see whether any errors are reported while it replays the event logs from the HDFS path. If no errors are observed, try enabling DEBUG level logging. If there are still no errors logged, you may need to compare the number of event logs against the allocated heap memory usage.
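If it helps, here is a rough sketch of how to check the event log size; /user/spark/applicationHistory is the usual default event log path, so adjust it if your spark.eventLog.dir differs:

```bash
# Total size of the Spark event log directory
hdfs dfs -du -s -h /user/spark/applicationHistory

# Largest individual event log files (top 20 by size)
hdfs dfs -du /user/spark/applicationHistory | sort -n -r | head -20
```

If the directory holds a large volume of event logs, increasing the Spark History Server heap (for example via the History Server's Java heap size setting in Cloudera Manager or Ambari) and restarting the role is usually the next step.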
06-02-2020
06:49 PM
Hello @renzhongpei , From the log4j properties file, I see you are trying to write the logs to a local file path [ log4j.appender.FILE.File=/home/rzpt/logs/spark.log ].

Please note that, with the above log4j properties, the executors and the driver (in cluster mode) will try to write log files to that path on every node where a container (executor) runs.

If that is your requirement, you would need a command like the following (assuming the log4j.properties file is in the local /tmp path on the node where you execute spark2-submit):

spark2-submit --class com.nari.sgp.amc.measStandAssess.aurSum.AurSumMain --files /tmp/log4j.properties --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties" --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties" --master yarn --deploy-mode cluster sgp-1.0.jar

Note that you can use "-Dlog4j.configuration=log4j.properties" in the above command as is, i.e. you don't need to give an explicit local path such as file://, since the executor automatically picks up log4j.properties from the container-localised path.
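For completeness, a minimal log4j.properties matching the appender name referenced above could look like this sketch; the file path and layout pattern are only illustrative, so adjust them to your environment:

```bash
# Write a minimal log4j.properties to /tmp (illustrative content; the FILE appender path
# must be writable on every node where a container runs)
cat > /tmp/log4j.properties <<'EOF'
log4j.rootCategory=INFO, FILE
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=/home/rzpt/logs/spark.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
EOF
```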
06-02-2020
06:22 PM
Hello @mike_bronson7 , Thank you for posting your query. You can execute 'get' for the znode in the same ZooKeeper client shell, and the output will include the hostname.

Example:

zookeeper-shell.sh zoo_server1:2181 <<< "get /brokers/ids/1018"

It returns output as follows (example, in my case):

[zk: localhost:2181(CONNECTED) 5] get /brokers/ids/10
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://simple01.cloudera.com:9092"],"jmx_port":9393,"host":"simple01.cloudera.com","timestamp":"1590512066422","port":9092,"version":4}
cZxid = 0x1619b
ctime = Tue May 26 09:54:26 PDT 2020
mZxid = 0x1619b
mtime = Tue May 26 09:54:26 PDT 2020
pZxid = 0x1619b
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x1722ddb1e844d50
dataLength = 238
numChildren = 0

So, my broker id 10 is mapped to the host simple01.cloudera.com.
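If you want to map every broker id to its host in one go, a small loop over the same znodes works as a best-effort sketch (zoo_server1:2181 is just the example address from above; parsing the shell output this way is not robust against extra log lines):

```bash
# List all registered broker ids, then print the "host" field from each broker's znode
for id in $(zookeeper-shell.sh zoo_server1:2181 <<< "ls /brokers/ids" | tail -1 | tr -d '[],'); do
  host=$(zookeeper-shell.sh zoo_server1:2181 <<< "get /brokers/ids/$id" | grep -o '"host":"[^"]*"')
  echo "broker $id -> $host"
done
```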
06-02-2020
05:28 PM
Hello @Venkat451 , Thank you for posting your query. From the error message shared (below), I see the executor failing while it tries to attach itself to the consumer group; more specifically, it is getting an authorisation exception while joining the group.

ERROR org.apache.spark.executor.Executor - Exception in task 2.0 in stage 0.0 (TID 2)
org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: spark-executor-<groupID>

If you have an authorisation mechanism such as Sentry, Kafka ACLs, or Ranger enabled on your cluster, please grant the necessary permissions to the consumer group:

https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/kafka-acl-examples.html
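For example, with Kafka ACLs the grant could look roughly like this sketch (the principal name and ZooKeeper address are placeholders for your environment; the group name follows the error message above):

```bash
# Allow the job's principal to read (consume) as the consumer group used by the Spark executors.
# The topic itself also needs Read permission for the same principal if not already granted.
kafka-acls.sh --authorizer-properties zookeeper.connect=zk_host:2181 \
  --add --allow-principal User:spark_user \
  --operation Read \
  --group spark-executor-<groupID>
```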
06-02-2020
04:26 AM
Hello @zanteb , Thank you for posting your query. When you use spark-submit, you need to pass the files (JAAS & keytab) with the --files option, just like in [1]:

[1] https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/developing-spark-applications/content/running_spark_streaming_jobs_on_a_kerberos-enabled_cluster.html

While doing so, your JAAS and keytab files are shipped to the executors and to the Application Master / driver (in case of cluster mode).

If your external client is not Spark but just standalone Java code (for example), then you can simply pass "-Djava.security.auth.login.config=jaas.conf" while executing the code, and the file can reside on the same client node.
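A rough shape of such a spark-submit, following the pattern in [1] (file paths, class name and jar are placeholders for your job):

```bash
# Ship jaas.conf and the keytab to the driver and executors, and point both JVMs
# at the container-localised jaas.conf
spark-submit \
  --master yarn --deploy-mode cluster \
  --files /path/to/jaas.conf,/path/to/user.keytab \
  --driver-java-options "-Djava.security.auth.login.config=jaas.conf" \
  --conf spark.executor.extraJavaOptions="-Djava.security.auth.login.config=jaas.conf" \
  --class com.example.YourStreamingApp \
  your-application.jar
```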
06-01-2020
09:06 PM
Hi @mig_aguir , Thank you for posting your query. Could you please try running the job with --conf spark.unsafe.sorter.spill.read.ahead.enabled=false ?
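For example (the class name and jar are placeholders), the flag is simply added to your existing spark-submit:

```bash
# Disable the read-ahead feature of the unsafe spill reader for this job only
spark-submit \
  --conf spark.unsafe.sorter.spill.read.ahead.enabled=false \
  --class com.example.YourMainClass \
  your-application.jar
```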
05-22-2020
05:43 PM
Hello @Mondi , Thank you for posting your query. The Spark History Server replays the logs as soon as it finds the files (event logs) in the configured HDFS path [ /user/spark/applicationHistory ]. The replay operation simply reads the event logs from the HDFS path and loads them into memory to make them available for rendering. In your case, you have already confirmed that the file is present in the HDFS event logging directory. As a next step, could you please review the Spark History Server logs and check whether the replay operation is happening? Also, if the file/directory permissions of the event logs are incorrect, the replay operation can fail silently; in such scenarios, you might need to enable DEBUG level logs to review what is wrong with the replay operation. Hope this helps.
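As a quick sketch of the permission check (assuming the default event log path /user/spark/applicationHistory; the application id below is a placeholder):

```bash
# The event log is written by the submitting user; the History Server user must be able to read it
hdfs dfs -ls -d /user/spark/applicationHistory
hdfs dfs -ls /user/spark/applicationHistory/application_1234567890_0001*
```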
05-22-2020
05:22 PM
@sahithi1 You can restart the NameNode in Ambari by selecting the NameNode process and then clicking Restart from the drop-down. Could you please share the specific issue you are facing while restarting?
03-04-2020
11:27 PM
Hello @lakshmipathy , Please refer to the thread below; hope this helps:

https://community.cloudera.com/t5/Support-Questions/How-to-define-topic-retention-with-kafka/td-p/222671
03-04-2020
09:30 PM
Hello @ravisro , I don't think there is a straightforward way to handle this; i.e., in my view, we would need to perform some data cleansing before feeding the data to Spark. The input shared contains newline characters (\n) inside the field values, which can make Spark confuse data with record boundaries. After some data cleansing (i.e. removing the embedded newlines), running the same code in Spark gave me the result below from inputDf.show(false):

+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------+
|_c0 |_c1 |_c2 |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------+
|2020-02-23 11:15:39|"Hi Craig, Please approve the standard pricing. No further amendments made "Legal System."Justification -XXX is the sole owner in China Thank you."|Approved|
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------+
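One possible cleansing sketch, assuming (as in the sample) that every real record starts with a timestamp such as 2020-02-23 11:15:39, and that the input file name is input.csv:

```bash
# Join wrapped lines back onto their record: lines that do not start with a date
# are treated as continuations of the previous record and appended with a space
awk '
  /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { if (NR > 1) printf "\n"; printf "%s", $0; next }
  { printf " %s", $0 }
  END { print "" }
' input.csv > cleaned.csv
```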