
How to set multiple spark configurations for single spark job


New Contributor

I am dealing with an unusual situation where I have small tables and big tables to process using Spark, and it must all happen in a single Spark job.

To meet my performance targets, I need to set the property

spark.sql.shuffle.partitions = 12 for the small tables and
spark.sql.shuffle.partitions = 500 for the bigger tables

How can I change this property dynamically in Spark?
Can I have multiple configuration files and call them within the program?

1 REPLY

Re: How to set multiple spark configurations for single spark job

Contributor

I believe you can achieve this with the following sequence:

  1. spark.sql("SET spark.sql.shuffle.partitions=12")
  2. Execute the operations on the small table.
  3. spark.sql("SET spark.sql.shuffle.partitions=500")
  4. Execute the operations on the larger table.
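
For concreteness, here is a minimal Scala sketch of that sequence in a single job. The table names small_table and big_table are placeholders for your own tables, and spark.conf.set(...) is the programmatic equivalent of spark.sql("SET ..."):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("dynamic-shuffle-partitions")
      .getOrCreate()

    // Use few shuffle partitions while processing the small table.
    spark.conf.set("spark.sql.shuffle.partitions", "12")
    spark.sql("SELECT key, count(*) FROM small_table GROUP BY key").show()

    // Raise the count before shuffling the large table; the new value
    // applies to queries executed after this point in the same session.
    spark.conf.set("spark.sql.shuffle.partitions", "500")
    spark.sql("SELECT key, count(*) FROM big_table GROUP BY key").show()

    spark.stop()

This works because spark.sql.shuffle.partitions is a runtime-settable SQL config: each change takes effect for subsequent queries in the same session, so you don't need separate configuration files or separate jobs.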