As you know, the default behavior of Spark is to create multi-part files when using the .saveAsTextFile API. However, it is not clear to me whether questions in the HDPCD-Spark Certification Exam expect a single file or multi-part files as output. Can somebody shed some light on this? I typically build each output line by concatenating the fields separated by commas and then call .saveAsTextFile, i.e. .map(x => x._1+","+x._2+","+...).saveAsTextFile(file_name.csv). Is there something wrong with this approach?
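To make the two cases concrete, here is a minimal sketch of what I mean, not a statement of what the exam expects. The object name, sample records, local master setting and /tmp output paths are all made up for illustration. As far as I understand, the path given to saveAsTextFile always becomes a directory (even if it ends in .csv), with one part-NNNNN file per partition, and calling coalesce(1) first leaves a single part-00000 file inside that directory.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CsvOutputSketch {
  // Join the fields of a 3-tuple with commas; Scala tuple fields are _1, _2, _3.
  def toCsvLine(x: (Int, String, Double)): String = s"${x._1},${x._2},${x._3}"

  def main(args: Array[String]): Unit = {
    // Hypothetical local setup, for illustration only.
    val conf = new SparkConf().setAppName("csv-output-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data spread over 3 partitions.
      val records = sc.parallelize(Seq((1, "a", 1.5), (2, "b", 2.5), (3, "c", 3.5)), numSlices = 3)

      // Default behavior: the path is a directory holding one part file per partition.
      // saveAsTextFile fails if the output directory already exists.
      records.map(toCsvLine).saveAsTextFile("/tmp/out_multi")

      // Reducing to one partition first gives a single part-00000 in the directory.
      records.map(toCsvLine).coalesce(1).saveAsTextFile("/tmp/out_single")
    } finally {
      sc.stop()
    }
  }
}
```

Note that coalesce(1) pushes all the data through a single task, so it only seems reasonable for small outputs.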