Member since 09-07-2017
40 Posts
1 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 9148 | 12-08-2017 06:41 AM |
12-05-2022 01:16 PM
A good starting point is to review mistake number 1 in this SlideShare deck: https://www.slideshare.net/SparkSummit/top-5-mistakes-when-writing-spark-applications-63071421 It covers how to tune the number of cores, executor memory, and related settings.
08-30-2021 02:55 PM
You are asking how to get the per-job memory and CPU counters. Please see the recent response in: https://community.cloudera.com/t5/Support-Questions/How-to-get-the-YARN-jobs-metadata-directly-not-using-API/m-p/322711/highlight/false#M228910 In the metadata (counter) output, you will see the megabyte-milliseconds and vcore-milliseconds values for all map and reduce tasks, along with the Task Summary, Analysis, File System Counters, and other information about the specific job.
08-30-2021 01:52 PM
You have the following options to see a job's counters (metadata):

mapred job -history /user/history/done/<date>/<job>.jhist -format human
mapred job -history /user/history/done/<date>/<job>.jhist -format json
mapred job -history /user/history/done/<date>/<job>.jhist -format json | python -m json.tool

For the json format, piping the output through python -m json.tool gives cleaner output.

Note that the JobHistory Server (JHS) loads jobs from .jhist files (one .jhist file per job) that are stored in an HDFS directory, by default /user/history/done. Each job generates its .jhist file before it completes, and the commands above read the metadata directly from that file. If the AM fails to move its .jhist file to the directory that JHS reads, JHS will not know about the job at all.

No password is needed, but you might have to kinit in a kerberized environment.
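As a rough sketch of what the json.tool step does, the snippet below re-indents a JSON string with Python's standard json module. The function name and the sample payload are made up for illustration; the real .jhist output uses different keys.

```python
import json


def pretty_json(raw: str) -> str:
    """Re-indent a JSON document, similar in spirit to `python -m json.tool`."""
    # Parse first so that malformed input fails loudly instead of being echoed.
    return json.dumps(json.loads(raw), indent=4)


if __name__ == "__main__":
    # Hypothetical stand-in for the command output, not the real .jhist layout.
    sample = '{"jobId": "example_job", "counters": {"exampleCounter": 1}}'
    print(pretty_json(sample))
```

Parsing the output this way also makes it easy to pick individual counters out of the resulting dictionary instead of reading them by eye.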
12-08-2017 06:41 AM
1 Kudo
Spark applies configuration in the following order of precedence, highest first:
- Properties set on SparkConf or SparkContext in code
- Arguments passed to spark-submit, spark-shell, or pyspark at run time
- Properties set in /etc/spark/conf/spark-defaults.conf, a specified properties file or in Cloudera Manager safety valve
- Environment variables exported or set in scripts
* For properties that apply to all jobs, use spark-defaults.conf. For properties that are constant and specific to one or a few applications, use SparkConf or --properties-file. For properties that change between runs, use command-line arguments.
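The precedence order above can be modeled as a layered dictionary merge. This is only an illustrative sketch in plain Python, not Spark's own code: the function name and all property values are assumptions, and environment variables are left out because they do not use the same spark.* key names.

```python
from typing import Dict


def resolve_effective_conf(
    defaults_file: Dict[str, str],
    submit_args: Dict[str, str],
    spark_conf: Dict[str, str],
) -> Dict[str, str]:
    """Merge property sources so that higher-precedence sources win."""
    effective: Dict[str, str] = {}
    # Apply lowest precedence first; each later update overrides earlier keys.
    for source in (defaults_file, submit_args, spark_conf):
        effective.update(source)
    return effective


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the override behavior.
    defaults_file = {"spark.executor.memory": "2g", "spark.executor.cores": "2"}
    submit_args = {"spark.executor.memory": "4g"}  # e.g. passed with --conf
    spark_conf = {"spark.executor.cores": "4"}     # e.g. set in application code

    for key, value in sorted(resolve_effective_conf(
            defaults_file, submit_args, spark_conf).items()):
        print(key, "=", value)
```

In this model a value set in code hides the same key from spark-submit and spark-defaults.conf, which is why settings that change between runs are better kept out of the application code.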