How to set multiple spark configurations for single spark job

New Contributor

I am dealing with a situation where I have both small tables and big tables to process with Spark, and it all has to happen in a single Spark job.

To hit my performance targets, I need to set the property

spark.sql.shuffle.partitions = 12 for the small tables and
spark.sql.shuffle.partitions = 500 for the bigger tables

How can I change these properties dynamically in Spark?
Can I have multiple configuration files and load the appropriate one from within the program?

1 Reply

Contributor

I believe you can achieve this with the following sequence:

  1. spark.sql("SET spark.sql.shuffle.partitions=12")
  2. Execute operations on small table
  3. spark.sql("SET spark.sql.shuffle.partitions=500")
  4. Execute operations on larger table
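
For reference, here is a minimal PySpark sketch of that sequence. The table names (small_table, big_table) and the output paths are just placeholders for illustration. The key point is that spark.sql.shuffle.partitions is a runtime SQL configuration, so the value in effect when an action runs is the one that gets used.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mixed-table-sizes").getOrCreate()

# Use fewer shuffle partitions while working on the small table
spark.sql("SET spark.sql.shuffle.partitions=12")
small_counts = spark.table("small_table").groupBy("key").count()
small_counts.write.mode("overwrite").parquet("/tmp/small_counts")  # this action shuffles into 12 partitions

# Raise the setting before the heavier shuffle on the large table
spark.sql("SET spark.sql.shuffle.partitions=500")
big_counts = spark.table("big_table").groupBy("key").count()
big_counts.write.mode("overwrite").parquet("/tmp/big_counts")      # this action shuffles into 500 partitions

Equivalently, you can call spark.conf.set("spark.sql.shuffle.partitions", "500") from code instead of issuing the SET statement; both change the same runtime SQL configuration, so there is no need for separate configuration files.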