Hello,
When writing a file to HDFS from a Spark application in Scala, I cannot find a way to control which HDFS resources are used.
I know I can set a Hadoop configuration on my Hadoop FileSystem object, which is then used for data manipulation such as deleting a file. Is there a way to tell it that, even though I have 3 datanodes and each written file only needs to be distributed to at least 2 partitions, I want to force the file to be split and distributed across 3 partitions and datanodes?
I would like to do this programmatically, rather than by changing the Hadoop cluster configuration and restarting the cluster, which would affect all Spark applications.
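For context, a minimal sketch of the per-application FileSystem setup described above. It assumes Spark and hadoop-common are on the classpath, and `/tmp/output`, `HdfsCleanup` and `deleteIfExists` are hypothetical names used only for illustration. The `Configuration` comes from the running `SparkContext`, so values set on it live only in this application's memory and do not modify the cluster's configuration files.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object HdfsCleanup {
  /** Hypothetical helper: recursively deletes `path` if it exists.
    * Returns true only if something was actually removed. */
  def deleteIfExists(fs: FileSystem, path: Path): Boolean =
    fs.exists(path) && fs.delete(path, true) // true = recursive delete

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hdfs-cleanup-sketch").getOrCreate()

    // Hadoop configuration owned by this application's SparkContext;
    // setting values here does not touch the cluster's *-site.xml files.
    val hadoopConf: Configuration = spark.sparkContext.hadoopConfiguration

    // FileSystem handle resolved from fs.defaultFS in that configuration
    val fs: FileSystem = FileSystem.get(hadoopConf)

    // Hypothetical output path, for illustration only
    deleteIfExists(fs, new Path("/tmp/output"))

    spark.stop()
  }
}
```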
Thanks in advance for your feedback 🙂