02-26-2016
12:19 PM
Hey Craig- Spark's HiveContext requires the use of *some* metastore. In this case, since you're not specifying one, it's creating the default, file-based metastore_db.

Here are some more details:
https://github.com/apache/spark/blob/99dfcedbfd4c83c7b6a343456f03e8c6e29968c5/examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala#L42
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

A few options:
1) Make sure the location is writable by your Spark processes.
2) Configure hive-site.xml to place the metastore_db files in a different location.
3) Move to MySQL or an equivalent database for true metastore functionality (this might be needed elsewhere anyway).
05-07-2015
09:04 AM
4 Kudos
Hey Siva- This is Chris Fregly from Databricks. I just talked to my co-worker, Michael Armbrust (Spark SQL, Catalyst, DataFrame guru), and we came up with the code sample below. Hopefully, this is what you're looking for. Michael admits that this is a bit verbose, so he may implement a more condensed `explodeArray()` method on DataFrame at some point.

// Assumes a spark-shell session where `sc` and `sqlContext` are already defined
import org.apache.spark.sql.Row
import sqlContext.implicits._ // needed for toDF()

case class Employee(firstName: String, lastName: String, email: String)
case class Department(id: String, name: String)
case class DepartmentWithEmployees(department: Department, employees: Seq[Employee])
val employee1 = new Employee("michael", "armbrust", "abc123@prodigy.net")
val employee2 = new Employee("chris", "fregly", "def456@compuserve.net")
val department1 = new Department("123456", "Engineering")
val department2 = new Department("123456", "Psychology")
val departmentWithEmployees1 = new DepartmentWithEmployees(department1, Seq(employee1, employee2))
val departmentWithEmployees2 = new DepartmentWithEmployees(department2, Seq(employee1, employee2))
val departmentWithEmployeesRDD = sc.parallelize(Seq(departmentWithEmployees1, departmentWithEmployees2))
departmentWithEmployeesRDD.toDF().saveAsParquetFile("dwe.parquet")
val departmentWithEmployeesDF = sqlContext.parquetFile("dwe.parquet")
// This would be replaced by explodeArray()
val explodedDepartmentWithEmployeesDF = departmentWithEmployeesDF.explode(departmentWithEmployeesDF("employees")) {
  // Each input row carries the employees array; emit one Employee per array element
  case Row(employees: Seq[Row]) => employees.map(employee =>
    Employee(employee(0).asInstanceOf[String], employee(1).asInstanceOf[String], employee(2).asInstanceOf[String])
  )
}
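
For anyone trying to picture the result: the flattening idea is one output record per (department, employee) pair. Here is a minimal sketch of that idea using plain Scala collections only, with no Spark involved. `FlatRow` and `flatten` are hypothetical names made up for this illustration, and as far as I know the real DataFrame result also keeps the original columns, so treat this as a picture of the shape rather than the exact schema.

```scala
case class Employee(firstName: String, lastName: String, email: String)
case class Department(id: String, name: String)
case class DepartmentWithEmployees(department: Department, employees: Seq[Employee])

// Hypothetical flattened record: one per (department, employee) pair
case class FlatRow(departmentName: String, firstName: String, lastName: String, email: String)

// Hypothetical helper: nested sequence in, flat sequence out
def flatten(rows: Seq[DepartmentWithEmployees]): Seq[FlatRow] =
  for {
    row <- rows
    e   <- row.employees
  } yield FlatRow(row.department.name, e.firstName, e.lastName, e.email)

val employees = Seq(
  Employee("michael", "armbrust", "abc123@prodigy.net"),
  Employee("chris", "fregly", "def456@compuserve.net")
)
val nested = Seq(
  DepartmentWithEmployees(Department("123456", "Engineering"), employees),
  DepartmentWithEmployees(Department("123456", "Psychology"), employees)
)

val flat = flatten(nested)
flat.foreach(println)
```

Two departments with two employees each come out as four flat records.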
04-08-2015
04:39 PM
1 Kudo
You're looking for LATERAL VIEW EXPLODE: http://apache-spark-user-list.1001560.n3.nabble.com/flattening-a-list-in-spark-sql-td13300.html
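
A rough sketch of what that query can look like, assuming a Spark 1.x spark-shell session where `sc` is predefined and Hive support is on the classpath. The case classes, the table name `departments`, and the alias `e` are made up for illustration; swap in your own schema.

```scala
import org.apache.spark.sql.hive.HiveContext

// LATERAL VIEW is HiveQL, so this sketch uses a HiveContext rather than a plain SQLContext
val hiveContext = new HiveContext(sc)
import hiveContext.implicits._

// Hypothetical schema: each department row holds an array of employees
case class Employee(firstName: String, lastName: String)
case class Dept(name: String, employees: Seq[Employee])

val departments = sc.parallelize(Seq(
  Dept("Engineering", Seq(Employee("michael", "armbrust"), Employee("chris", "fregly")))
)).toDF()
departments.registerTempTable("departments")

// explode() emits one row per array element; LATERAL VIEW joins each back to its source row
val flattened = hiveContext.sql("""
  SELECT name, employee.firstName, employee.lastName
  FROM departments
  LATERAL VIEW explode(employees) e AS employee
""")
flattened.show()
```

With the single sample department above, this should produce one row per employee, each carrying the department name.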