saveAsOrcFile is not a member of org.apache.spark.sql.DataFrame
Labels:
- Apache Oozie
- Apache Spark
Created ‎02-12-2016 05:28 PM
While following the tutorial "A Lap around Apache Spark 1.3.1 with HDP 2.3" on the sandbox, I ran into this error:
scala> peopleSchemaRDD.saveAsOrcFile("people.orc")
<console>:41: error: value saveAsOrcFile is not a member of org.apache.spark.sql.DataFrame
       peopleSchemaRDD.saveAsOrcFile("people.orc")
Created ‎02-12-2016 05:43 PM
@wei yang Are you actually running Spark 1.3.1, or just following the content of the tutorial? ORC support was added in Spark 1.4 (http://hortonworks.com/blog/bringing-orc-support-into-apache-spark/).
Try the following command instead:
myDataFrame.write.format("orc").save("some_name")
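For context, a minimal sketch of the full Spark 1.4+ flow in the spark-shell, assuming the `sc` SparkContext from the shell and a hypothetical `people.json` input file (ORC support requires a HiveContext, not the plain SQLContext):

```scala
// Spark 1.4+ replaces saveAsOrcFile with the DataFrameWriter API.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)  // ORC support lives in the Hive context
val people = hiveContext.read.json("people.json")  // hypothetical input path

// Write the DataFrame out as ORC; .mode("overwrite") may be added to replace existing data.
people.write.format("orc").save("people.orc")

// Read it back to verify the round trip.
val loaded = hiveContext.read.format("orc").load("people.orc")
loaded.show()
```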
Created ‎02-12-2016 05:39 PM
Are you sure you are using the new sandbox and that the Spark version is actually 1.3.1 or higher? It sounds like an error you would get in Spark 1.2.
Created ‎02-13-2016 08:01 PM
I'm using Spark 1.4.1, and the command peopleSchemaRDD.write.format("orc").save("people.orc") works!
Thank you very much!
Created ‎02-12-2016 08:14 PM
That method was refactored; there is a new way of writing ORC files. Convert your RDD to a DataFrame with toDF() and then write it out:
sc.parallelize(records).toDF().write.format("orc").save("people")
Try to use a later version of Spark. See:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_spark-guide/content/ch_orc-spark.html
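A fuller sketch of that RDD-to-ORC path; the Person case class and the sample records here are hypothetical, and `sc` is the spark-shell's SparkContext:

```scala
import org.apache.spark.sql.hive.HiveContext

// Hypothetical schema for the records being parallelized.
case class Person(name: String, age: Int)

val hiveContext = new HiveContext(sc)
import hiveContext.implicits._  // brings the toDF() conversion into scope

val records = Seq(Person("Ada", 36), Person("Linus", 28))

// Parallelize to an RDD, convert to a DataFrame, and write it out as ORC.
sc.parallelize(records).toDF().write.format("orc").save("people")
```

Note that the implicits import from the HiveContext (or SQLContext) is what makes toDF() available on the RDD.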
