<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question json file input path for loading into spark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/json-file-input-path-for-loading-into-spark/m-p/149145#M28538</link>
    <description>&lt;P&gt;Hi - I am trying to load my JSON file using Spark and cannot seem to do it correctly. The path is at the end of this bit of Scala. The file is located on my sandbox in the tmp folder. I've tried:&lt;/P&gt;&lt;P&gt;val df2 = sqlContext.read.format("json").option("samplingRatio", "1.0").load("/tmp/rawpanda.json")&lt;/P&gt;&lt;P&gt;Any help would be great, thanks.&lt;/P&gt;&lt;P&gt;Mark&lt;/P&gt;</description>
    <pubDate>Tue, 17 May 2016 04:05:07 GMT</pubDate>
    <dc:creator>mesteph6</dc:creator>
    <dc:date>2016-05-17T04:05:07Z</dc:date>
    <item>
      <title>json file input path for loading into spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/json-file-input-path-for-loading-into-spark/m-p/149145#M28538</link>
      <description>&lt;P&gt;Hi - I am trying to load my JSON file using Spark and cannot seem to do it correctly. The path is at the end of this bit of Scala. The file is located on my sandbox in the tmp folder. I've tried:&lt;/P&gt;&lt;P&gt;val df2 = sqlContext.read.format("json").option("samplingRatio", "1.0").load("/tmp/rawpanda.json")&lt;/P&gt;&lt;P&gt;Any help would be great, thanks.&lt;/P&gt;&lt;P&gt;Mark&lt;/P&gt;</description>
      <pubDate>Tue, 17 May 2016 04:05:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/json-file-input-path-for-loading-into-spark/m-p/149145#M28538</guid>
      <dc:creator>mesteph6</dc:creator>
      <dc:date>2016-05-17T04:05:07Z</dc:date>
    </item>
    <item>
      <title>Re: json file input path for loading into spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/json-file-input-path-for-loading-into-spark/m-p/149146#M28539</link>
      <description>&lt;P&gt;Not sure what error you are getting (feel free to share some of the dataset and the error messages you received), but I'm wondering if you are accounting for the following warning called out in &lt;A href="http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets" target="_blank"&gt;http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets&lt;/A&gt;.&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;&lt;EM&gt;Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;I usually get something like the following when trying to use a multi-line file.&lt;/P&gt;&lt;PRE&gt;scala&amp;gt; val productsML = sqlContext.read.json("/tmp/hcc/products.json")
productsML: org.apache.spark.sql.DataFrame = [_corrupt_record: string]&lt;/PRE&gt;&lt;P&gt;The lone _corrupt_record column means Spark could not parse the lines as JSON. That said, everything works for me with a file like the following, which has one JSON object per line.&lt;/P&gt;&lt;PRE&gt;[root@sandbox ~]# hdfs dfs -cat /tmp/hcc/employees.json
{"id" : "1201", "name" : "satish", "age" : "25"}
{"id" : "1202", "name" : "krishna", "age" : "28"}
{"id" : "1203", "name" : "amith", "age" : "39"}
{"id" : "1204", "name" : "javed", "age" : "23"}
{"id" : "1205", "name" : "prudvi", "age" : "23"}&lt;/PRE&gt;&lt;P&gt;Both ways of reading the JSON file below return the same DataFrame.&lt;/P&gt;&lt;PRE&gt;SQL context available as sqlContext.
scala&amp;gt; val df1 = sqlContext.read.json("/tmp/hcc/employees.json")
df1: org.apache.spark.sql.DataFrame = [age: string, id: string, name: string]
scala&amp;gt; df1.printSchema()
root
 |-- age: string (nullable = true)
 |-- id: string (nullable = true)
 |-- name: string (nullable = true)
scala&amp;gt; df1.show()
+---+----+-------+
|age|  id|   name|
+---+----+-------+
| 25|1201| satish|
| 28|1202|krishna|
| 39|1203|  amith|
| 23|1204|  javed|
| 23|1205| prudvi|
+---+----+-------+
scala&amp;gt; val df2 = sqlContext.read.format("json").option("samplingRatio", "1.0").load("/tmp/hcc/employees.json")
df2: org.apache.spark.sql.DataFrame = [age: string, id: string, name: string]
scala&amp;gt; df2.show()
+---+----+-------+
|age|  id|   name|
+---+----+-------+
| 25|1201| satish|
| 28|1202|krishna|
| 39|1203|  amith|
| 23|1204|  javed|
| 23|1205| prudvi|
+---+----+-------+&lt;/PRE&gt;&lt;P&gt;Again, if this doesn't help, feel free to share some more details. Good luck!&lt;/P&gt;</description>
      <pubDate>Tue, 17 May 2016 07:58:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/json-file-input-path-for-loading-into-spark/m-p/149146#M28539</guid>
      <dc:creator>LesterMartin</dc:creator>
      <dc:date>2016-05-17T07:58:11Z</dc:date>
    </item>
    <item>
      <title>Re: json file input path for loading into spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/json-file-input-path-for-loading-into-spark/m-p/149147#M28540</link>
      <description>&lt;P&gt;Looks like the same question was asked over at &lt;A href="https://community.hortonworks.com/questions/33621/input-path-on-sandbox-for-loading-data-into-spark.html" target="_blank"&gt;https://community.hortonworks.com/questions/33621/input-path-on-sandbox-for-loading-data-into-spark.html&lt;/A&gt;, which &lt;A rel="user" href="https://community.cloudera.com/users/472/jwiden.html" nodeid="472"&gt;@Joe Widen&lt;/A&gt; answered. Note that Joe also pointed out what my other comment (and example) shows: each JSON object needs to be on a single line. Glad to see Joe got a "best answer" and I'd sure be appreciative of the same on this one. &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 19 May 2016 05:06:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/json-file-input-path-for-loading-into-spark/m-p/149147#M28540</guid>
      <dc:creator>LesterMartin</dc:creator>
      <dc:date>2016-05-19T05:06:49Z</dc:date>
    </item>
  </channel>
</rss>

