Member since: 05-15-2018
Posts: 132
Kudos Received: 15
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1660 | 06-02-2020 06:22 PM |
| | 20218 | 06-01-2020 09:06 PM |
| | 2621 | 01-15-2019 08:17 PM |
| | 4916 | 12-21-2018 05:32 AM |
| | 5357 | 12-16-2018 09:39 PM |
06-14-2020
07:16 PM
Hello @monorels , Thank you for posting the query. Check the total size of the files present in the event logging directory; if the files are very large, you may need to increase the Spark History Server's heap memory. Also check the Spark History Server logs to see whether any errors are reported while it replays the event logs from the HDFS path. If no errors are observed, try enabling DEBUG level logging. If there are still no errors logged, you may need to compare the number of event logs against the allocated heap memory usage.
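If it helps, here is a rough sketch of how to check the event log size; /user/spark/applicationHistory is the usual default event log path, so adjust it if your spark.eventLog.dir differs:

```bash
# Total size of the Spark event log directory
hdfs dfs -du -s -h /user/spark/applicationHistory

# Largest individual event log files (top 20 by size)
hdfs dfs -du /user/spark/applicationHistory | sort -n -r | head -20
```

If the directory holds a large volume of event logs, increasing the Spark History Server heap (for example via the History Server's Java heap size setting in Cloudera Manager or Ambari) and restarting the role is usually the next step.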
06-02-2020
06:49 PM
Hello @renzhongpei , From the log4j properties file, I see you are trying to write the logs to a local file path [ log4j.appender.FILE.File=/home/rzpt/logs/spark.log ].

Please note that, with the above log4j properties, the executors and the driver (in cluster mode) will try to write log files to that path on every node where a container (executor) runs.

If that is your requirement, you would need a command like the following (assuming the log4j.properties file is in the local /tmp path on the node where you execute spark2-submit):

spark2-submit --class com.nari.sgp.amc.measStandAssess.aurSum.AurSumMain --files /tmp/log4j.properties --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties" --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties" --master yarn --deploy-mode cluster sgp-1.0.jar

Note that you can use "-Dlog4j.configuration=log4j.properties" in the above command as is, i.e. you don't need to give an explicit local path such as file://, since the executor automatically picks up log4j.properties from the container-localised path.
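For completeness, a minimal log4j.properties matching the appender name referenced above could look like this sketch; the file path and layout pattern are only illustrative, so adjust them to your environment:

```bash
# Write a minimal log4j.properties to /tmp (illustrative content; the FILE appender path
# must be writable on every node where a container runs)
cat > /tmp/log4j.properties <<'EOF'
log4j.rootCategory=INFO, FILE
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=/home/rzpt/logs/spark.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
EOF
```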
06-02-2020
06:22 PM
Hello @mike_bronson7 , Thank you for posting your query. You can execute 'get' for the znode in the same ZooKeeper client shell, and the output will include the hostname.

Example:

zookeeper-shell.sh zoo_server1:2181 <<< "get /brokers/ids/1018"

It returns output as follows (example, in my case):

[zk: localhost:2181(CONNECTED) 5] get /brokers/ids/10
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://simple01.cloudera.com:9092"],"jmx_port":9393,"host":"simple01.cloudera.com","timestamp":"1590512066422","port":9092,"version":4}
cZxid = 0x1619b
ctime = Tue May 26 09:54:26 PDT 2020
mZxid = 0x1619b
mtime = Tue May 26 09:54:26 PDT 2020
pZxid = 0x1619b
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x1722ddb1e844d50
dataLength = 238
numChildren = 0

So, my broker id 10 is mapped to the host simple01.cloudera.com.
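If you want to map every broker id to its host in one go, a small loop over the same znodes works as a best-effort sketch (zoo_server1:2181 is just the example address from above; parsing the shell output this way is not robust against extra log lines):

```bash
# List all registered broker ids, then print the "host" field from each broker's znode
for id in $(zookeeper-shell.sh zoo_server1:2181 <<< "ls /brokers/ids" | tail -1 | tr -d '[],'); do
  host=$(zookeeper-shell.sh zoo_server1:2181 <<< "get /brokers/ids/$id" | grep -o '"host":"[^"]*"')
  echo "broker $id -> $host"
done
```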
06-02-2020
05:28 PM
Hello @Venkat451 , Thank you for posting your query. From the error message shared (below), I see the executor failing while it tries to attach itself to the consumer group; more specifically, it is getting an authorisation exception while joining the group.

ERROR org.apache.spark.executor.Executor - Exception in task 2.0 in stage 0.0 (TID 2)
org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: spark-executor-<groupID>

If you have an authorisation mechanism such as Sentry, Kafka ACLs, or Ranger enabled on your cluster, please grant the necessary permissions to the consumer group:

https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/kafka-acl-examples.html
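For example, with Kafka ACLs the grant could look roughly like this sketch (the principal name and ZooKeeper address are placeholders for your environment; the group name follows the error message above):

```bash
# Allow the job's principal to read (consume) as the consumer group used by the Spark executors.
# The topic itself also needs Read permission for the same principal if not already granted.
kafka-acls.sh --authorizer-properties zookeeper.connect=zk_host:2181 \
  --add --allow-principal User:spark_user \
  --operation Read \
  --group spark-executor-<groupID>
```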
06-02-2020
04:26 AM
Hello @zanteb , Thank you for posting your query. When you use spark-submit, you need to pass the files (JAAS & keytab) with the --files option, just like in [1]:

[1] https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/developing-spark-applications/content/running_spark_streaming_jobs_on_a_kerberos-enabled_cluster.html

While doing so, your JAAS and keytab files are shipped to the executors and to the Application Master / driver (in case of cluster mode).

If your external client is not Spark but just standalone Java code (for example), then you can simply pass "-Djava.security.auth.login.config=jaas.conf" while executing the code, and the file can reside on the same client node.
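A rough shape of such a spark-submit, following the pattern in [1] (file paths, class name and jar are placeholders for your job):

```bash
# Ship jaas.conf and the keytab to the driver and executors, and point both JVMs
# at the container-localised jaas.conf
spark-submit \
  --master yarn --deploy-mode cluster \
  --files /path/to/jaas.conf,/path/to/user.keytab \
  --driver-java-options "-Djava.security.auth.login.config=jaas.conf" \
  --conf spark.executor.extraJavaOptions="-Djava.security.auth.login.config=jaas.conf" \
  --class com.example.YourStreamingApp \
  your-application.jar
```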
06-01-2020
09:06 PM
Hi @mig_aguir , Thank you for posting your query. Could you please try running the job with --conf spark.unsafe.sorter.spill.read.ahead.enabled=false ?
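For example (the class name and jar are placeholders), the flag is simply added to your existing spark-submit:

```bash
# Disable the read-ahead feature of the unsafe spill reader for this job only
spark-submit \
  --conf spark.unsafe.sorter.spill.read.ahead.enabled=false \
  --class com.example.YourMainClass \
  your-application.jar
```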
05-22-2020
05:43 PM
Hello @Mondi , Thank you for posting your query. The Spark History Server replays the logs as soon as it finds the files (event logs) in the configured HDFS path [ /user/spark/applicationHistory ]. The replay operation simply reads the event logs from the HDFS path and loads them into memory to make them available for rendering. In your case, you have already confirmed that the file is present in the HDFS event logging directory. As a next step, could you please review the Spark History Server logs and check whether the replay operation is happening? Also, if the file/directory permissions of the event logs are incorrect, the replay operation can fail silently; in such scenarios, you might need to enable DEBUG level logs to review what is wrong with the replay operation. Hope this helps.
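As a quick sketch of the permission check (assuming the default event log path /user/spark/applicationHistory; the application id below is a placeholder):

```bash
# The event log is written by the submitting user; the History Server user must be able to read it
hdfs dfs -ls -d /user/spark/applicationHistory
hdfs dfs -ls /user/spark/applicationHistory/application_1234567890_0001*
```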
05-22-2020
05:22 PM
@sahithi1 You can restart the NameNode in Ambari by selecting the NameNode process and then clicking Restart from the drop-down. Could you please share the specific issue you are facing while restarting?
03-04-2020
11:27 PM
Hello @lakshmipathy , Please refer to the thread below; hope this helps:

https://community.cloudera.com/t5/Support-Questions/How-to-define-topic-retention-with-kafka/td-p/222671
03-04-2020
09:30 PM
Hello @ravisro , I don't think there is a straightforward way to handle this; i.e., in my view, we would need to perform some data cleansing before feeding the data to Spark. The input shared contains newline characters (\n) inside the field values, which can make Spark confuse data with record boundaries. After some data cleansing (i.e. removing the embedded newlines), running the same code in Spark gave me the result below from inputDf.show(false):

+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------+
|_c0 |_c1 |_c2 |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------+
|2020-02-23 11:15:39|"Hi Craig, Please approve the standard pricing. No further amendments made "Legal System."Justification -XXX is the sole owner in China Thank you."|Approved|
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------+
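One possible cleansing sketch, assuming (as in the sample) that every real record starts with a timestamp such as 2020-02-23 11:15:39, and that the input file name is input.csv:

```bash
# Join wrapped lines back onto their record: lines that do not start with a date
# are treated as continuations of the previous record and appended with a space
awk '
  /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { if (NR > 1) printf "\n"; printf "%s", $0; next }
  { printf " %s", $0 }
  END { print "" }
' input.csv > cleaned.csv
```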