Created on 12-08-2016 10:24 PM - edited 09-16-2022 03:50 AM
java.lang.AssertionError: assertion failed: The ORC data source can only be used with HiveContext
I tried the alternatives below, but none of them worked.
sampleData.write().mode(SaveMode.Append).format("orc").save("/tmp/my_orc");
sampleData.write().mode(SaveMode.Append).saveAsTable("MFRTable");
I am using Spark 1.6.1.2.4.2.12-1 and have the following dependency added to my project:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.6.1.2.4.2.12-1</version>
<scope>provided</scope>
</dependency>
but then I am not able to find the HiveContext class. Which version should I be using?
Thanks.
Created 12-09-2016 10:18 PM
Create some properties in your pom.xml:
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<scala.core>2.10</scala.core>
<spark.version>1.6.1</spark.version>
</properties>
Include spark-hive in your project's dependencies:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_${scala.core}</artifactId>
<version>${spark.version}</version>
</dependency>
Then in your code:
import org.apache.spark.sql.SaveMode

// create a new hive context from the spark context
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
// create the data frame from an existing RDD and write it to orc
// output will be a directory of orc files
val df = hiveContext.createDataFrame(rdd)
df.write.mode(SaveMode.Overwrite).format("orc")
  .save("/tmp/myapp.orc/")
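To confirm the write worked, you can read the ORC directory back through the same hiveContext (a minimal sketch, reusing the path from the save() call above):
// read the orc files back into a data frame and inspect a few rows
val orcDf = hiveContext.read.format("orc").load("/tmp/myapp.orc/")
orcDf.printSchema()
orcDf.show(5)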
Created 12-10-2016 04:52 AM
@Akhil Bansal Could you please try this:
https://community.hortonworks.com/repos/62212/sparkorcwriter.html
Created 04-23-2017 09:46 PM
I already have a DataFrame that I want to save in ORC format, but your solution expects an RDD. When I tried
val df = sqlContext.createDataFrame(results.rdd)
it gave me an error saying:
[A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$1: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame cannot be applied to (org.apache.spark.rdd.RDD[org.apache.spark.sql.Row])
Created 06-18-2018 11:39 PM
@Kit Menke isn't wrong.
Take a look at the API docs. You'll notice there are several options for creating data frames from an RDD. In your case, it looks as though you have an RDD of Row objects, so you'll also need to provide a schema to the createDataFrame() method.
Scala API docs: https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.sql.SQLContext
import org.apache.spark.sql._
import org.apache.spark.sql.types._

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// define the schema explicitly since the RDD contains generic Rows
val schema =
  StructType(
    StructField("name", StringType, false) ::
    StructField("age", IntegerType, true) :: Nil)

// parse the text file into an RDD[Row] matching the schema
val people =
  sc.textFile("examples/src/main/resources/people.txt").map(
    _.split(",")).map(p => Row(p(0), p(1).trim.toInt))

val dataFrame = sqlContext.createDataFrame(people, schema)
dataFrame.printSchema
// root
// |-- name: string (nullable = false)
// |-- age: integer (nullable = true)

dataFrame.createOrReplaceTempView("people")
sqlContext.sql("select name from people").collect.foreach(println)
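Once the DataFrame exists, writing it to ORC works the same way as in the answer above. A minimal sketch, assuming a Hive-enabled context (HiveContext on Spark 1.x, or a SparkSession created with enableHiveSupport() on 2.x) and a hypothetical /tmp/people.orc output path:
import org.apache.spark.sql.SaveMode

// write the schema-backed DataFrame out as a directory of ORC files
// (/tmp/people.orc is just an example path)
dataFrame.write.mode(SaveMode.Overwrite).format("orc")
  .save("/tmp/people.orc/")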