07-06-2018 12:24 AM - edited 07-06-2018 10:15 AM
I edited this post because I was very emotional when I first wrote it, coming off a failed exam.
To make a long story short: I knew, from the testimony of two Spark developers and from my own experience in a previously failed exam, that we were able to import the spark-csv.jar via the --packages option during the certification exam. So I came prepared to handle CSV files with that library, confident it would be available. This time I couldn't import it. I ended up answering 6 out of 9 problems, because the 3 remaining problems all relied on CSV files and I simply hadn't trained on how to handle them with bare Spark. The techniques I found online afterwards are all fairly cumbersome, and most if not all of them use the map function from the RDD API; I had only really trained with the DataFrame API.
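For anyone facing the same situation, the RDD-based workaround mentioned above boils down to: read the file with textFile, parse each line with map, then convert to a DataFrame. Here is a minimal sketch. Only the parsing helper is plain Python (the stdlib csv module handles quoted fields with embedded commas, which a naive split(',') does not); the Spark pipeline is shown in comments, and the file path and column names there are hypothetical, purely for illustration:

```python
import csv
import io

def parse_csv_line(line):
    """Parse one CSV line into a list of string fields.

    Using the stdlib csv module instead of line.split(',')
    correctly handles quoted fields that contain commas.
    """
    return next(csv.reader(io.StringIO(line)))

# Quick check of the parser on a tricky line:
print(parse_csv_line('1,"Doe, John",42'))

# In an exam setting, the same helper plugs into the RDD API
# (path, column names, and types below are made-up examples):
#
# rdd = sc.textFile("/user/cert/people.csv")
# header = rdd.first()
# df = (rdd.filter(lambda l: l != header)
#          .map(parse_csv_line)
#          .map(lambda f: (int(f[0]), f[1], int(f[2])))
#          .toDF(["id", "name", "age"]))
```

The point of the helper is just to keep the quoting logic out of the lambda; everything after that is the ordinary textFile/filter/map/toDF sequence from the RDD API.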
I filed a claim with email@example.com, and they answered that they have never given anyone the ability to import third-party libraries. This is not true. Not only was I able to import it during my first certification exam, but two other certified developers I know were able to import it as well, and by that logic I'm pretty sure many others have been able to do so too.
In the end, this topic is here to raise awareness of the issue. Cloudera's official statement is that importing third-party libraries is not possible, so be prepared to handle CSV files without spark-csv.jar, or face desperation experimenting with techniques and browsing API docs in the middle of a certification exam.