Member since: 09-07-2017
Posts: 40
Kudos Received: 1
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 9029 | 12-08-2017 06:41 AM |
12-05-2022
01:16 PM
A good starting point is to review mistake number 1 in this SlideShare deck: https://www.slideshare.net/SparkSummit/top-5-mistakes-when-writing-spark-applications-63071421 It walks through how to tune the number of cores, executor memory, number of executors, and so on.
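As a rough illustration of the knobs the deck discusses (the sizing numbers, class name, and jar name below are placeholders to adjust for your own cluster and workload), they are typically passed on spark-submit like this:

```
# Placeholder sizing for illustration only; tune executors, cores, and
# memory to your node sizes and workload. com.example.MyApp / myapp.jar
# are hypothetical names.
spark-submit \
  --num-executors 10 \
  --executor-cores 5 \
  --executor-memory 8g \
  --driver-memory 4g \
  --class com.example.MyApp \
  myapp.jar
```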
08-30-2021
02:55 PM
You are asking how to get the per-job memory and CPU counters. Please see the recent response in: https://community.cloudera.com/t5/Support-Questions/How-to-get-the-YARN-jobs-metadata-directly-not-using-API/m-p/322711/highlight/false#M228910 In the metadata (counter) output, you will see the megabyte-milliseconds and vcore-milliseconds values for all map and reduce tasks, along with the Task Summary, Analysis, File System Counters for the job, and other information about the specific job.
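For example (assuming the .jhist file location described in the next post; the exact counter labels vary a bit between Hadoop versions), you can dump the counters and filter for the aggregate millisecond values:

```
# Dump the job's counters as JSON and pick out the aggregate memory/CPU
# counters; the grep pattern is intentionally loose because label spelling
# differs between Hadoop versions.
mapred job -history /user/history/done/<date>/<job>.jhist -format json \
  | python -m json.tool \
  | grep -i millis
```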
08-30-2021
01:52 PM
You have the following options to see a job's counters (metadata):

mapred job -history /user/history/done/<date>/<job>.jhist -format human
mapred job -history /user/history/done/<date>/<job>.jhist -format json
mapred job -history /user/history/done/<date>/<job>.jhist -format json | python -m json.tool

For the json format, you can pipe the output to python -m json.tool for cleaner output, as in the third example. Note that the JobHistory Server (JHS) seeds its jobs from .jhist files (one .jhist file per job) stored in HDFS, by default under /user/history/done. Each job generates its .jhist file before it completes, and you can access the metadata from those files with the commands above. If the ApplicationMaster (AM) fails to move its .jhist file into the directory that JHS watches, JHS has no knowledge of the job at all. No password is needed, but you might have to kinit in a Kerberized environment.
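If you are not sure where a particular job's .jhist file landed, one way to locate it (assuming the default done directory of /user/history/done mentioned above; check mapreduce.jobhistory.done-dir if it was changed) is:

```
# Recursively list the JobHistory done directory and pick out .jhist files.
hdfs dfs -ls -R /user/history/done | grep '\.jhist'
```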
12-08-2017
06:41 AM
1 Kudo
This is the order of precedence for configurations that Spark will use:
- Properties set on SparkConf or SparkContext in code
- Arguments passed to spark-submit, spark-shell, or pyspark at run time
- Properties set in /etc/spark/conf/spark-defaults.conf, in a properties file specified with --properties-file, or in a Cloudera Manager safety valve
- Environment variables exported or set in scripts
* For properties that apply to all jobs, use spark-defaults.conf; for properties that are constant and specific to a single application or a few applications, use SparkConf or --properties-file; for properties that change between runs, use command-line arguments. A sketch of the same property set at each level follows below.
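As an illustration of the precedence above, the same setting (spark.executor.memory is used here only as an example) can be supplied at each level; the application class and jar names are placeholders:

```
# 1) Highest precedence: set on SparkConf in application code, e.g. in Scala:
#      val conf = new SparkConf().set("spark.executor.memory", "4g")
# 2) Next: passed to spark-submit at run time (hypothetical class/jar names):
spark-submit --conf spark.executor.memory=4g --class com.example.MyApp myapp.jar
# 3) Then: spark-defaults.conf, or a file passed with --properties-file,
#    one property per line:
#      spark.executor.memory  4g
```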