Welcome to the Cloudera Community


Unable to load spark-csv package


I am using Cloudera Quickstart VM 5.4.2.0 for online training. For one particular task I need to load the spark-csv package so I can read CSV files into pyspark for practice. However, I am encountering problems.

 

First, I ran PYSPARK_DRIVER_PYTHON=ipython pyspark -- packages com.databricks:spark-csv_2.10:1.3.0

 

It seems to work fine, but I got a warning message saying: util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable.

 

Then I tried the following Spark code to import the CSV file:

yelp_df = sqlCtx.load(source="com.databricks.spark.csv", header='true', inferSchema='true', path='file:///usr/lib/hue/apps/search/examples/collections/solr_configs_yelp_demo/index_data.csv')

 

but I am getting an error saying "Py4JJavaError: An error occurred while calling o19.load. : java.lang.RuntimeException: Failed to load class for source: com.databricks.spark.csv"

 

Does anyone know how I can fix this? Thanks a lot!
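
One thing worth ruling out in the launch command above is the space in "-- packages". Assuming the space was typed literally and is not a formatting slip in this post, the shell would pass "--" and "packages" as two separate arguments, so no --packages option would reach pyspark. That would be consistent with the "Failed to load class for source" error, though it is not confirmed as the cause here. The sketch below only illustrates the tokenization with Python's standard shlex module; the helper name has_packages_flag is made up for illustration and is not part of Spark.

```python
import shlex

# Command as it appears in the post (space between "--" and "packages").
as_posted = (
    "PYSPARK_DRIVER_PYTHON=ipython pyspark "
    "-- packages com.databricks:spark-csv_2.10:1.3.0"
)

# Same command with the flag written as a single token (assumed intent).
single_token = (
    "PYSPARK_DRIVER_PYTHON=ipython pyspark "
    "--packages com.databricks:spark-csv_2.10:1.3.0"
)


def has_packages_flag(command):
    """Hypothetical helper: True if '--packages' is its own argument
    after POSIX shell-style splitting of the command string."""
    return "--packages" in shlex.split(command)


# Show how each variant is split into arguments.
print(shlex.split(as_posted))
print(shlex.split(single_token))

# Report whether '--packages' survives as one argument in each variant.
print(has_packages_flag(as_posted))
print(has_packages_flag(single_token))
```

If the first variant is what was actually run, retyping the flag as --packages with no space would be the first thing to try before looking further.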
