Created on 12-08-2016 10:24 PM - edited 09-16-2022 03:50 AM
java.lang.AssertionError: assertion failed: The ORC data source can only be used with HiveContext
I tried the alternatives below, but none of them worked:
sampleData.write().mode(SaveMode.Append).format("orc").save("/tmp/my_orc");
sampleData.write().mode(SaveMode.Append).saveAsTable("MFRTable");
I am using Spark 1.6.1.2.4.2.12-1 and have the following dependency added to my project:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.1.2.4.2.12-1</version>
    <scope>provided</scope>
</dependency>
But then I am not able to find the HiveContext class. Wondering which version I should be using?
Thanks.
Created 12-09-2016 10:18 PM
Create some properties in your pom.xml:
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <scala.core>2.10</scala.core>
    <spark.version>1.6.1</spark.version>
</properties>
Include spark-hive in your project's dependencies:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.core}</artifactId>
    <version>${spark.version}</version>
</dependency>
Then in your code:
// create a new hive context from the spark context
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)

// create the data frame and write it to orc
// output will be a directory of orc files
val df = hiveContext.createDataFrame(rdd)
df.write.mode(SaveMode.Overwrite).format("orc").save("/tmp/myapp.orc/")
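Your saveAsTable attempt works through the same HiveContext, too. A minimal sketch, assuming a reachable Hive metastore and that MFRTable is the table you meant to append to:

// saveAsTable goes through the Hive metastore, which is why a HiveContext is required
// df and hiveContext are from the snippet above; MFRTable is your target table
df.write.mode(SaveMode.Append).format("orc").saveAsTable("MFRTable")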
Created 12-10-2016 04:52 AM
@Akhil Bansal, could you please try this:
https://community.hortonworks.com/repos/62212/sparkorcwriter.html
Created 04-23-2017 09:46 PM
I already have a DataFrame that I want to save in ORC format, but your solution expects an RDD. When I tried
val df = sqlContext.createDataFrame(results.rdd)
it gave me this error:
[A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$1: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame cannot be applied to (org.apache.spark.rdd.RDD[org.apache.spark.sql.Row])
Created 06-18-2018 11:39 PM
@Kit Menke isn't wrong.
Take a look at the API docs. You'll notice there are several options for creating data frames from an RDD. In your case, it looks as though you have an RDD of type Row, so you'll also need to provide a schema to the createDataFrame() method.
Scala API docs: https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.sql.SQLContext
import org.apache.spark.sql._
import org.apache.spark.sql.types._

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val schema = StructType(
  StructField("name", StringType, false) ::
  StructField("age", IntegerType, true) :: Nil)

val people = sc.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Row(p(0), p(1).trim.toInt))

val dataFrame = sqlContext.createDataFrame(people, schema)

dataFrame.printSchema
// root
// |-- name: string (nullable = false)
// |-- age: integer (nullable = true)

dataFrame.createOrReplaceTempView("people")
sqlContext.sql("select name from people").collect.foreach(println)
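To tie this back to the original question: once you have dataFrame, writing it out as ORC is the same one-liner as before. A minimal sketch, assuming the spark-hive dependency is on the classpath as described above (the output path /tmp/people.orc/ is just an example):

// SaveMode is already covered by the wildcard import above; shown here for clarity
import org.apache.spark.sql.SaveMode

// writes a directory of ORC files at the given path
dataFrame.write.mode(SaveMode.Overwrite).format("orc").save("/tmp/people.orc/")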