Hi,
I have a question regarding the exam. If a question does not specify the
output format (JSON, Parquet, etc.), does that mean I can use any of the
formats available in Spark? For example, would output that my Spark code
writes to HDFS as "part0000-.....gz.parquet" be valid, assuming the data
inside meets the question's conditions/criteria?
Also, may I use DataFrames and Spark SQL to process the datasets, instead of
plain RDDs, if the question does not specify that either?
Thanks