New Contributor
Posts: 3
Registered: 01-17-2018

How to set multiple Spark configurations for a single Spark job

I am dealing with an unusual situation where I have both small tables and big tables to process with Spark, and it all has to happen in a single Spark job.

To hit my performance targets, I need to set the property spark.sql.shuffle.partitions differently depending on the table:

spark.sql.shuffle.partitions = 12 for the small tables and
spark.sql.shuffle.partitions = 500 for the bigger tables

How can I change this property dynamically within a Spark job?
Can I have multiple configuration files and load them from within the program?

Cloudera Employee
Posts: 33
Registered: 04-05-2016

Re: How to set multiple Spark configurations for a single Spark job

I believe you can achieve this with the following sequence (a minimal sketch follows the list):

  1. spark.sql("SET spark.sql.shuffle.partitions=12")
  2. Run the operations on the small table
  3. spark.sql("SET spark.sql.shuffle.partitions=500")
  4. Run the operations on the larger table
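
For what it's worth, here is a minimal Scala sketch of that sequence. The table names small_table and big_table and the aggregation queries are placeholders for illustration; spark.sql.shuffle.partitions is a runtime setting, so it can be changed between actions within the same session.

    import org.apache.spark.sql.SparkSession

    object ShufflePartitionsDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ShufflePartitionsDemo")
          .getOrCreate()

        // Keep shuffle partitions low while processing the small table.
        spark.sql("SET spark.sql.shuffle.partitions=12")
        // Placeholder query: any shuffle-inducing operation works here.
        spark.sql("SELECT key, count(*) FROM small_table GROUP BY key").show()

        // Raise the partition count before processing the larger table.
        spark.sql("SET spark.sql.shuffle.partitions=500")
        spark.sql("SELECT key, count(*) FROM big_table GROUP BY key").show()

        // Equivalent runtime API, if you prefer it over SQL SET:
        // spark.conf.set("spark.sql.shuffle.partitions", "500")

        spark.stop()
      }
    }

Note that the setting only affects jobs triggered after it is changed, so make sure each SET runs before the actions it is meant to tune.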