Created 11-25-2024 04:39 PM
In Apache Spark, spark_shuffle and spark2_shuffle are names related to Spark's shuffle operations that can be configured to start auxiliary services within the YARN NodeManager. What is the difference between the two?
Created on 12-18-2024 12:21 PM - edited 12-18-2024 12:28 PM
@allen_chu
I may not have fully understood the question, but here are the differences, with an explanation to help you understand and configure the two options correctly.
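1. spark_shuffle
The auxiliary service name used for the Spark 1.x shuffle service:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle</value>
</property>
2. spark2_shuffle
The auxiliary service name used for the Spark 2.x shuffle service. Note that this sets the same property as above, so the two snippets are alternatives:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark2_shuffle</value>
</property>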
3. Configuration in YARN
To enable the shuffle service for both versions, configure the NodeManager to start both services:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle,spark2_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.spark2_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
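If it helps, here is a minimal sketch (not part of any Hadoop or Spark tooling) for sanity-checking a configuration like the one above: it parses the XML and reports any service listed in yarn.nodemanager.aux-services that has no matching .class property. The function names and the SAMPLE string are hypothetical; SAMPLE simply wraps the three properties above in a <configuration> root.

```python
import xml.etree.ElementTree as ET

AUX_KEY = "yarn.nodemanager.aux-services"

# Hypothetical sample: the three properties from this post, wrapped in a root element.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle,spark2_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark2_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
</configuration>"""


def load_properties(xml_text):
    """Return {name: value} for every <property> element in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = (value or "").strip()
    return props


def missing_aux_classes(props):
    """List aux-services that have no '<AUX_KEY>.<service>.class' property."""
    services = [s.strip() for s in props.get(AUX_KEY, "").split(",") if s.strip()]
    return [s for s in services if f"{AUX_KEY}.{s}.class" not in props]


if __name__ == "__main__":
    print(missing_aux_classes(load_properties(SAMPLE)))
```

On this sample the check flags only mapreduce_shuffle, because the fragment in this post does not show its .class entry; in a full yarn-site.xml that entry would normally be present as well.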
4. Key Points
In modern setups, spark2_shuffle is the primary shuffle service since Spark 1.x is largely deprecated.
Happy hadooping
Created 12-19-2024 06:44 PM
Thank you for your reply. This information is very helpful.