Could someone tell me the answer to the question below, and explain why and how?
Q. How many partitions will initially be created by the following command in the Spark shell?
There are 100 files in directory /user/cloudera/csvfiles and there are 10 nodes running Spark.
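textFile() creates its partitions from the input splits of the data, which by default follow the HDFS blocks each file uses. If a file occupies only 1 block, the RDD is still initialized with a minimum of 2 partitions, because the default minPartitions is min(defaultParallelism, 2). For a directory, splits are computed per file, so each of the 100 files contributes at least one partition and you should expect at least 100 partitions; the number of nodes does not enter into the calculation. If you want to increase the minimum number of partitions, pass it as the second argument, e.g. sc.textFile(path, minPartitions).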
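To make the arithmetic concrete, here is a rough sketch in plain Python (not Spark code) that mimics the split-size rule I believe Hadoop's FileInputFormat applies underneath textFile(). The function name estimate_partitions, the 128 MB block size, and the 1 MB file sizes are assumptions for illustration only; it also assumes uncompressed, splittable text files, so treat the result as an estimate rather than what your cluster will necessarily report.

```python
# Sketch only: approximates how input splits are counted for text files.
# Assumptions (not from the original post): 128 MB HDFS block size,
# uncompressed splittable files, minimum split size of 1 byte.

SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def estimate_partitions(file_sizes, min_partitions=2,
                        block_size=128 * 1024 * 1024, min_split_size=1):
    """Estimate the number of partitions for files of the given byte sizes."""
    total = sum(file_sizes)
    # Target size per split if the data were divided evenly.
    goal_size = total // max(min_partitions, 1)
    split_size = max(min_split_size, min(goal_size, block_size))

    count = 0
    for size in file_sizes:
        if size == 0:
            count += 1  # an empty file still yields one empty split
            continue
        remaining = size
        # Carve off full-size splits while more than SPLIT_SLOP remains.
        while remaining / split_size > SPLIT_SLOP:
            count += 1
            remaining -= split_size
        if remaining > 0:
            count += 1  # the final, possibly smaller, split
    return count


MB = 1024 * 1024
print(estimate_partitions([1 * MB]))                            # -> 2
print(estimate_partitions([1 * MB] * 100))                      # -> 100
print(estimate_partitions([1 * MB] * 100, min_partitions=200))  # -> 200
```

The first call shows the single small file ending up with 2 partitions, the second shows 100 small files giving one partition each, and the third shows a larger minPartitions value splitting each file further. On a real cluster you can check the actual number with rdd.getNumPartitions.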