Member since
11-10-2023
3
Posts
3
Kudos Received
0
Solutions
02-26-2024
09:42 PM
2 Kudos
Spark version 3.2.1
... View more
11-15-2023
07:50 AM
1 Kudo
Hello @one4like , Pushing every local file of a job to HDFS will cause issues, especially in larger clusters. Local directories are used as scratch location. Spills of mappers are written there and moving that over to the network will have performance impacts. The local storage of the scratch files and shuffle files is done exactly to prevent this. It also has security impacts as the NM now pushes the keys for each application on to a network location which could be accessible for others. A far better solution is to use the fact that the value of yarn.nodemanager.local-dirs can point to multiple mount points and thus spreading the load over all mount points. So the answer is NO. local-dirs must contain a list of local paths. There's an explicit check in code which only allows local FS to be used. See here: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java#L224 Please note that an exception is thrown when a non local file system is referenced. If you found this response assisted with your query, please take a moment to log in and click on KUDOS 🙂 & ”Accept as Solution" below this post. Thank you. Bjagtap
... View more