Support Questions
Find answers, ask questions, and share your expertise

I attempted the Spark Certification Exam today and found I could not write a CSV file, even though I submitted my .py task as below.

Explorer

spark-submit --packages com:databricks:spark-csv_2.10:1.2.0 task.py

I got an error that the class could not be found. My question is: how can I write a CSV file using DataFrames, like below?

df.repartition(1).write.format('com.databricks.spark.csv').save('filepath')

Any suggestions on how I can solve the problem of writing a CSV or TAB-delimited file in the certification exam? I am pretty sure I failed it, since I could not write the file.
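If the spark-csv package cannot be resolved at all (as apparently happened in the exam environment), one fallback in Spark 1.x is to serialize the rows yourself and write with plain saveAsTextFile, which needs no external package. This is a minimal sketch; the helper name and paths are illustrative, and it does no CSV quoting or escaping:

```python
def row_to_line(row, sep=","):
    # Serialize one Row/tuple as a delimited line; None becomes an empty field.
    # Note: no quoting or escaping, so fields must not contain `sep` themselves.
    return sep.join("" if c is None else str(c) for c in row)

# With a DataFrame `df` already in scope (as in the exam tasks), the write
# itself needs no external package:
#   df.rdd.map(row_to_line).coalesce(1).saveAsTextFile("file:///tmp/out_csv")
# For a TAB-delimited file, pass sep="\t":
#   df.rdd.map(lambda r: row_to_line(r, "\t")).saveAsTextFile("file:///tmp/out_tsv")
```

This sidesteps the dependency problem entirely, at the cost of handling quoting yourself if fields can contain the separator.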

17 REPLIES

Re: I attempted Spark Certification Exam today and found, I could not write a CSV file though I executed my .py task as below.

Cloudera Employee

Replace the colons between com and databricks with a dot:

com.databricks:spark-csv_2.10:1.2.0

It works well with Spark 1.6.3:

/opt/spark/bin/spark-submit --packages com.databricks:spark-csv_2.10:1.2.0 test.py

Where test.py is the following:

from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext(appName="Python")
sqlContext = SQLContext(sc)
q = sqlContext.read.format("com.databricks.spark.csv").load("file:///tmp/ls.txt")
q.write.format("com.databricks.spark.csv").save("file:///tmp/ls2.txt")

Sample output from the run:
16/12/21 18:51:16 INFO HadoopRDD: Input split: file:/tmp/ls.txt:0+640
16/12/21 18:51:16 INFO HadoopRDD: Input split: file:/tmp/ls.txt:640+641
16/12/21 18:51:16 INFO FileOutputCommitter: Saved output of task 'attempt_201612211851_0001_m_000000_1' to file:/tmp/ls2.txt/_temporary/0/task_201612211851_0001_m_000000
16/12/21 18:51:16 INFO SparkHadoopMapRedUtil: attempt_201612211851_0001_m_000000_1: Committed
16/12/21 18:51:16 INFO FileOutputCommitter: Saved output of task 'attempt_201612211851_0001_m_000001_2' to file:/tmp/ls2.txt/_temporary/0/task_201612211851_0001_m_000001
16/12/21 18:51:16 INFO SparkHadoopMapRedUtil: attempt_201612211851_0001_m_000001_2: Committed
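Since the original question also asks about TAB files: spark-csv 1.x understands `delimiter` and `header` options, which can be applied through `DataFrameWriter.options`. A small helper, sketched under the assumption that the package loads as in the command above (the output path is illustrative):

```python
def csv_writer_options(delimiter=",", header=True):
    # Options understood by spark-csv 1.x; the package expects string values,
    # so the boolean header flag is converted to "true"/"false".
    return {"delimiter": delimiter, "header": "true" if header else "false"}

# Usage with the DataFrame `q` from the snippet above:
#   q.write.format("com.databricks.spark.csv") \
#       .options(**csv_writer_options("\t")) \
#       .save("file:///tmp/ls_tab.txt")
```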

It also depends on your configuration (are you running it locally or on YARN?). Please post the exact exception along with your spark-defaults.conf and spark-env.sh.

If you have connectivity issues, you can also download the required jar files manually and use the --jars option of spark-submit:

/opt/spark/bin/spark-submit --jars /tmp/spark-csv_2.11-1.2.0.jar,/tmp/commons-csv-1.1.jar test.py

Where the two jar files are downloaded from the Maven Central repository:

http://search.maven.org/remotecontent?filepath=com/databricks/spark-csv_2.11/1.2.0/spark-csv_2.11-1....

http://search.maven.org/remotecontent?filepath=org/apache/commons/commons-csv/1.1/commons-csv-1.1.ja...

Re: I attempted Spark Certification Exam today and found, I could not write a CSV file though I executed my .py task as below.

Explorer

Unfortunately, I can't post the contents of the conf file or the .sh file, as the error occurred during the certification exam.

Re: I attempted Spark Certification Exam today and found, I could not write a CSV file though I executed my .py task as below.

Explorer

Thank you, much appreciated. But the issue here is how to handle the problem in the certification exam, since the usual way is not working. I tried to contact the certification officials about the issue, but in vain. I have paid the exam fee to retake it next week, but if writing the file fails again, I don't know how I will handle it then.

Re: I attempted Spark Certification Exam today and found, I could not write a CSV file though I executed my .py task as below.

Explorer

I am able to get the same code to work in the 3 environments I am working on.

Re: I attempted Spark Certification Exam today and found, I could not write a CSV file though I executed my .py task as below.

Explorer

Sorry, I had a typo here, but I did try with a "." rather than a ":" in the exam and still got a class-not-found error. I have been pretty confused since then; any help is greatly appreciated.

Re: I attempted Spark Certification Exam today and found, I could not write a CSV file though I executed my .py task as below.

Cloudera Employee

What is your Spark version?

Re: I attempted Spark Certification Exam today and found, I could not write a CSV file though I executed my .py task as below.

Explorer

It was 1.6.3, as I remember, when I took the exam...

Re: I attempted Spark Certification Exam today and found, I could not write a CSV file though I executed my .py task as below.

Explorer

I have the Hortonworks Sandbox with Spark 1.6.0, and the same code works flawlessly. I am clueless as to why it did not work in the certification exam. Thanks for taking the time to answer.

Re: I attempted Spark Certification Exam today and found, I could not write a CSV file though I executed my .py task as below.

Explorer

I think the error I got was almost exactly the one below.

-----------------

:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS

Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: com.databricks#spark-csv_2.10;1.2.0: not found]

-----------------