Created 02-12-2016 05:28 PM
When I try the tutorial "A Lap around Apache Spark 1.3.1 with HDP 2.3" from the sandbox, I encounter this error:
scala> peopleSchemaRDD.saveAsOrcFile("people.orc")
<console>:41: error: value saveAsOrcFile is not a member of org.apache.spark.sql.DataFrame
       peopleSchemaRDD.saveAsOrcFile("people.orc")
                       ^
Created 02-12-2016 05:43 PM
@wei yang Are you actually running Spark 1.3.1, or just following the content of the tutorial? ORC support was added in Spark 1.4 (http://hortonworks.com/blog/bringing-orc-support-into-apache-spark/)
Try using the following command:
myDataFrame.write.format("orc").save("some_name")
Created 02-12-2016 05:39 PM
Are you sure you are using the new sandbox and that the Spark version is actually 1.3.1 or higher? It sounds like an error you would get in Spark 1.2.
Created 02-13-2016 08:01 PM
I'm using Spark 1.4.1, and the command peopleSchemaRDD.write.format("orc").save("people.orc") works!
Thank you very much!
Created 02-12-2016 08:14 PM
sc.parallelize(records).toDF().write.format("orc").save("people")
That method was refactored; saveAsOrcFile is gone. The new way of writing ORC files is to convert your RDD to a DataFrame with toDF() and then write it out as above. Try to use a later version of Spark.
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_spark-guide/content/ch_orc-spark.html
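To put the pieces together, here is a minimal spark-shell sketch of the DataFrameWriter approach. It assumes a Spark 1.4+ shell where sc and sqlContext are predefined; the Person case class and the sample records are made up for illustration, so substitute your own schema and data.

```scala
// Assumes spark-shell on Spark 1.4+; sc and sqlContext are provided by the shell.
import sqlContext.implicits._

// Hypothetical schema for illustration -- use your own case class.
case class Person(name: String, age: Int)

val records = Seq(Person("Ann", 30), Person("Bob", 25))

// Convert the RDD to a DataFrame, then write ORC via the DataFrameWriter API
// (this replaces the removed saveAsOrcFile method).
val peopleDF = sc.parallelize(records).toDF()
peopleDF.write.format("orc").save("people")

// Reading the ORC data back:
val loaded = sqlContext.read.format("orc").load("people")
loaded.show()
```

Note that save("people") writes a directory of ORC part files, not a single file, so pass the directory path back to read.format("orc").load when you reload it.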