<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question PySpark issue in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pysprak-issue/m-p/227188#M63836</link>
    <description>&lt;P&gt;I am trying to load a CSV file into PySpark with the call below.&lt;/P&gt;&lt;PRE&gt;sample = sqlContext.load(source="com.databricks.spark.csv", path='/tmp/test/20170516.csv', header=True, inferSchema=True) &lt;/PRE&gt;&lt;P&gt;But I am getting an error saying &lt;/P&gt;&lt;PRE&gt;py4j.protocol.Py4JJavaError: An error occurred while calling o137.load. 
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at &lt;A href="http://spark-packages.org/" target="_blank"&gt;http://spark-packages.org&lt;/A&gt;&lt;/PRE&gt;</description>
    <pubDate>Fri, 16 Sep 2022 11:50:55 GMT</pubDate>
    <dc:creator>prsingh1</dc:creator>
    <dc:date>2022-09-16T11:50:55Z</dc:date>
    <item>
      <title>PySpark issue</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pysprak-issue/m-p/227188#M63836</link>
      <description>&lt;P&gt;I am trying to load a CSV file into PySpark with the call below.&lt;/P&gt;&lt;PRE&gt;sample = sqlContext.load(source="com.databricks.spark.csv", path='/tmp/test/20170516.csv', header=True, inferSchema=True) &lt;/PRE&gt;&lt;P&gt;But I am getting an error saying &lt;/P&gt;&lt;PRE&gt;py4j.protocol.Py4JJavaError: An error occurred while calling o137.load. 
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at &lt;A href="http://spark-packages.org/" target="_blank"&gt;http://spark-packages.org&lt;/A&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 16 Sep 2022 11:50:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pysprak-issue/m-p/227188#M63836</guid>
      <dc:creator>prsingh1</dc:creator>
      <dc:date>2022-09-16T11:50:55Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark issue</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pysprak-issue/m-p/227189#M63837</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10323/prsingh.html" nodeid="10323"&gt;@prsingh&lt;/A&gt; &lt;/P&gt;&lt;P&gt;You need to provide the Databricks spark-csv dependency: either let Spark fetch it at launch with --packages, or download the jars yourself and pass them with --jars. &lt;/P&gt;&lt;P&gt;1) Download the dependency at run time&lt;/P&gt;&lt;PRE&gt;pyspark --packages com.databricks:spark-csv_2.10:1.2.0 
df = sqlContext.read.load('file:///root/file.csv', format='com.databricks.spark.csv', header='true', inferSchema='true') &lt;/PRE&gt;&lt;P&gt;or &lt;/P&gt;&lt;P&gt;2) Pass the jars when starting the shell &lt;/P&gt;&lt;P&gt;a) Download the jars as follows: &lt;/P&gt;&lt;PRE&gt;wget &lt;A href="http://search.maven.org/remotecontent?filepath=org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar" target="_blank"&gt;http://search.maven.org/remotecontent?filepath=org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar&lt;/A&gt; -O commons-csv-1.1.jar 
wget &lt;A href="http://search.maven.org/remotecontent?filepath=com/databricks/spark-csv_2.10/1.0.0/spark-csv_2.10-1.0.0.jar" target="_blank"&gt;http://search.maven.org/remotecontent?filepath=com/databricks/spark-csv_2.10/1.0.0/spark-csv_2.10-1.0.0.jar&lt;/A&gt; -O spark-csv_2.10-1.0.0.jar &lt;/PRE&gt;&lt;P&gt;b) then start the python spark shell with the arguments: &lt;/P&gt;&lt;PRE&gt;./bin/pyspark --jars "spark-csv_2.10-1.0.0.jar,commons-csv-1.1.jar" &lt;/PRE&gt;&lt;P&gt;c) load as dataframe&lt;/P&gt;&lt;PRE&gt;df = sqlContext.read.load('file:///root/file.csv',format='com.databricks.spark.csv',header='true',inferSchema='true') &lt;/PRE&gt;&lt;P&gt;Let me know if above helps!&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jun 2017 15:45:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pysprak-issue/m-p/227189#M63837</guid>
      <dc:creator>nyadav</dc:creator>
      <dc:date>2017-06-28T15:45:49Z</dc:date>
    </item>
  </channel>
</rss>

