@prsingh
You need to include the Databricks spark-csv dependency: either let Spark download the package at launch time, or download the JARs yourself and pass them when starting the shell.
1) Download the dependency at launch time:
pyspark --packages com.databricks:spark-csv_2.10:1.2.0
df = sqlContext.read.load('file:///root/file.csv', format='com.databricks.spark.csv', header='true', inferSchema='true')
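To sanity-check that the package was picked up and the schema inferred, you can inspect the result right in the shell (a minimal sketch using the Spark 1.x API; the temp table name csv_table is arbitrary):
df.printSchema()  # shows column names and the inferred types
df.show(5)  # first 5 rows
df.registerTempTable('csv_table')  # Spark 1.x; hypothetical table name
sqlContext.sql('SELECT COUNT(*) FROM csv_table').show()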
2) Pass the JARs when starting the shell
a) Download the JARs:
wget "http://search.maven.org/remotecontent?filepath=org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar" -O commons-csv-1.1.jar
wget "http://search.maven.org/remotecontent?filepath=com/databricks/spark-csv_2.10/1.0.0/spark-csv_2.10-1.0.0.jar" -O spark-csv_2.10-1.0.0.jar
b) Then start the PySpark shell with the JARs on the classpath:
./bin/pyspark --jars "spark-csv_2.10-1.0.0.jar,commons-csv-1.1.jar"
c) Load the file as a DataFrame:
df = sqlContext.read.load('file:///root/file.csv', format='com.databricks.spark.csv', header='true', inferSchema='true')
Let me know if the above helps!
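The same approach works outside the interactive shell. Here is a minimal sketch of a standalone script, assuming Spark 1.x and the two JARs sitting in the current directory (the file name load_csv.py and the app name are my own placeholders):
# load_csv.py
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName='csv-load')  # hypothetical app name
sqlContext = SQLContext(sc)

# same read as above, written with the option() form of the reader
df = (sqlContext.read
      .format('com.databricks.spark.csv')
      .option('header', 'true')
      .option('inferSchema', 'true')
      .load('file:///root/file.csv'))
df.printSchema()
sc.stop()
Then submit it with the same JARs:
./bin/spark-submit --jars "spark-csv_2.10-1.0.0.jar,commons-csv-1.1.jar" load_csv.py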