Member since: 08-06-2013
Posts: 12
Kudos Received: 5
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 54129 | 05-07-2015 09:04 AM |
| | 28314 | 04-08-2015 04:39 PM |
02-26-2016 12:19 PM
Hey Craig- Spark's HiveContext requires the use of *some* metastore. In this case, since you're not specifying one, it creates the default, file-based metastore_db. Here are some more details:

https://github.com/apache/spark/blob/99dfcedbfd4c83c7b6a343456f03e8c6e29968c5/examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala#L42
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

A few options:
1) Make sure the location is writable by your Spark processes.
2) Configure hive-site.xml to place the file in a different location.
3) Move to MySQL or equivalent for true metastore functionality (which might be needed elsewhere anyway).
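As a sketch of option 2, a hive-site.xml fragment along these lines relocates the embedded Derby-backed metastore via the standard `javax.jdo.option.ConnectionURL` property (the path `/tmp/my-metastore` below is a placeholder, not anything from the original post):

```xml
<configuration>
  <!-- Relocate the embedded Derby metastore; the databaseName path is a placeholder. -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/tmp/my-metastore/metastore_db;create=true</value>
  </property>
</configuration>
```

Place this file on Spark's classpath (e.g. in conf/) so the HiveContext picks it up.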
05-07-2015 09:04 AM
4 Kudos
Hey Siva- This is Chris Fregly from Databricks. I just talked to my co-worker, Michael Armbrust (Spark SQL, Catalyst, DataFrame guru), and we came up with the code sample below. Hopefully, this is what you're looking for. Michael admits that this is a bit verbose, so he may implement a more concise `explodeArray()` method on DataFrame at some point.

import org.apache.spark.sql.Row

case class Employee(firstName: String, lastName: String, email: String)
case class Department(id: String, name: String)
case class DepartmentWithEmployees(department: Department, employees: Seq[Employee])

val employee1 = Employee("michael", "armbrust", "abc123@prodigy.net")
val employee2 = Employee("chris", "fregly", "def456@compuserve.net")

val department1 = Department("123456", "Engineering")
val department2 = Department("123456", "Psychology")

val departmentWithEmployees1 = DepartmentWithEmployees(department1, Seq(employee1, employee2))
val departmentWithEmployees2 = DepartmentWithEmployees(department2, Seq(employee1, employee2))

// Write the nested structure out as Parquet, then read it back as a DataFrame.
val departmentWithEmployeesRDD = sc.parallelize(Seq(departmentWithEmployees1, departmentWithEmployees2))
departmentWithEmployeesRDD.toDF().saveAsParquetFile("dwe.parquet")

val departmentWithEmployeesDF = sqlContext.parquetFile("dwe.parquet")

// This would be replaced by explodeArray()
val explodedDepartmentWithEmployeesDF = departmentWithEmployeesDF.explode(departmentWithEmployeesDF("employees")) {
  case Row(employees: Seq[Row]) => employees.map(row =>
    Employee(row(0).asInstanceOf[String], row(1).asInstanceOf[String], row(2).asInstanceOf[String])
  )
}
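For intuition, the flattening that explode does above can be sketched with plain Scala collections (an illustration only, no Spark required; this is not the DataFrame API):

```scala
// Illustration only: the effect of explode(), using plain collections.
case class Employee(firstName: String, lastName: String, email: String)
case class Department(id: String, name: String)
case class DepartmentWithEmployees(department: Department, employees: Seq[Employee])

val depts = Seq(
  DepartmentWithEmployees(
    Department("123456", "Engineering"),
    Seq(Employee("michael", "armbrust", "abc123@prodigy.net"),
        Employee("chris", "fregly", "def456@compuserve.net"))))

// One output element per (department, employee) pair -- the nested
// Seq[Employee] is flattened into top-level rows.
val exploded = depts.flatMap(d => d.employees)
```

Each nested employee becomes its own top-level record, which is exactly what the DataFrame explode produces as rows.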
04-08-2015 04:39 PM
1 Kudo
You're looking for LATERAL VIEW explode: http://apache-spark-user-list.1001560.n3.nabble.com/flattening-a-list-in-spark-sql-td13300.html
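As a hypothetical sketch (the table and column names here are assumptions, not from the original question), a HiveQL query that flattens an array column with LATERAL VIEW and explode looks like:

```sql
-- Assumed schema: departments(name STRING, employees ARRAY<STRING>)
SELECT d.name, emp
FROM departments d
LATERAL VIEW explode(d.employees) exploded_table AS emp;
```

The query returns one row per (department, employee) pair, with the array element bound to the alias emp.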