spark-submit --packages com.databricks:spark-csv_2.10:1.2.0 task.py
I got an error that the class could not be found. My question is: how can I write a CSV file using DataFrames, as below?
Any suggestions on how I can solve the problem of writing a CSV or TAB-delimited file in the certification exam? I am pretty sure I failed it, since I could not write the output.
Any help is greatly appreciated. I am planning to take the exam again soon, but my question is: what are my options if I get the same error while writing a file, even though my inputs are correct?
Hi, I took the Spark certification exam and had the same issue.
We cannot download packages while launching Spark, as the environment is locked down from any outside downloads.
--packages pulls the spark-csv package from the Maven repository.
In Python, you can use the csv package that comes as part of the standard library with the Python install.
You do not have access to Pandas, scikit-learn, or any package managers such as pip and easy_install. They are locked out of the certification environment.
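Since only the standard library is available, one workable approach is to collect a small result to the driver and write it with the stdlib csv module. A minimal sketch (the rows here are illustrative stand-ins for what df.collect() would return; collecting is only safe when the result fits in driver memory):

```python
import csv
import io

# Illustrative rows, standing in for df.collect() on a small DataFrame.
rows = [(1, "alice", 3.5), (2, "bob", 4.0)]

# Write to an in-memory buffer; in the exam you would open a real file
# with open("out.csv", "w") instead.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["col1", "col2", "col3"])  # header row
writer.writerows(rows)                     # one CSV line per tuple

print(buf.getvalue())
```

The csv module handles quoting and escaping for you, which the manual string-concatenation approach below does not.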
Assuming the DataFrame df has three columns (col1: Int, col2: String, col3: Float), the CSV output can be produced in Scala as:
df.map(x => (x.getInt(0) + "," + x.getString(1) + "," + x.getFloat(2))).saveAsTextFile("out.csv")
--- OR ---
df.map(x => (x(0) + "," + x(1) + "," + x(2))).saveAsTextFile("out.csv")
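The same map-and-save approach works from PySpark. A sketch of the formatting function, with the Spark call shown as a comment since Spark itself is not available outside the exam environment (the column layout is the assumed one from above):

```python
def to_csv_line(row):
    # Join the fields of a Row (or plain tuple) with commas.
    # Note: unlike the csv module, this does no quoting or escaping,
    # so it only suits fields that contain no commas themselves.
    return ",".join(str(field) for field in row)

# In the exam environment (Spark 1.6) you would then run:
#   df.rdd.map(to_csv_line).saveAsTextFile("out.csv")
# which writes one part-file per partition under the out.csv directory.

# The formatting function itself works on plain tuples:
print(to_csv_line((1, "spark", 2.5)))
```

Note that saveAsTextFile creates a directory of part-files, not a single file, so you may need to merge them afterwards.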
I had the same problem today. How did you solve it? How do you write a CSV file from a Hive table without databricks spark-csv, which is obviously not present in the version 1.6 of Spark used in the exam? I feel discouraged; I could not complete the exam.