Support Questions

Find answers, ask questions, and share your expertise

spark + scala schema creattion error

avatar
Contributor

Hi All,

im trying to convert case class to StryctType schema in spark, im getting error attached in image,

please find my case class and conversion technique

case class Airlines(Airline_id: Integer, Name: String, Alias: String, IATA: String, ICAO: String, Callsign: String, Country: String, Active: String)

val AirlineSchema = ScalaReflection.schemaFor[Airlines].dataType.asInstanceOf[StructType]

Reference URL:

https://stackoverflow.com/questions/36746055/generate-a-spark-structtype-schema-from-a-case-class

Cheers,

MJ

85725-case-class-to-schema-issue.jpg

1 ACCEPTED SOLUTION

avatar

@Manikandan Jeyabal

Perhaps you can try this out:

import org.apache.spark.sql.Encoders
case class Airlines(Airline_id: Integer, Name: String, Alias: String, IATA: String, ICAO: String, Callsign: String, Country: String, Active: String)
Encoders.product[Airlines].schema

Also there are some examples of use of case class in the following example:

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/Sp...

Let me know if this helps!

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.

View solution in original post

7 REPLIES 7

avatar

@Manikandan Jeyabal

Perhaps you can try this out:

import org.apache.spark.sql.Encoders
case class Airlines(Airline_id: Integer, Name: String, Alias: String, IATA: String, ICAO: String, Callsign: String, Country: String, Active: String)
Encoders.product[Airlines].schema

Also there are some examples of use of case class in the following example:

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/Sp...

Let me know if this helps!

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.

avatar
Contributor

Hi @Felix Albani,

Tanx for your response, actully I've tried all the possible options,

please find the attached image for reference. is there any other way i can solve my issue ?

87385-case-class-to-schema-issue-1.jpg

Cheers,

MJ

avatar

@Manikandan Jeyabal

The problem perhaps could be at the project level then. Could you check your pom file and make sure you have all necessary spark depenedencies. I tried this in zeppelin ui and is working fine:

85727-screen-shot-2018-08-24-at-94033-am.png

Also make sure you clean/build and perhaps exit eclipse just in case there is something wrong with eclipse.

Finally here is a link on how to setup depenedecies eclipse:

https://community.hortonworks.com/articles/147787/how-to-setup-hortonworks-repository-for-spark-on-e...

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.

avatar
Contributor

Hi @Felix Albani

just for an clarification this will work only on Hortonworks dependencies ? please find my build.sbt dependencies and let me know whether i needs to add anything

val sparkVersion = "2.2.1"val hadoopVersion = "2.7.1"val poiVersion = "3.9"val avroVersion = "1.7.6"

libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % sparkVersion, "org.apache.spark" %% "spark-sql" % sparkVersion, "org.apache.spark" %% "spark-hive" % sparkVersion, "org.apache.poi" % "poi-ooxml" % poiVersion, "org.apache.poi" % "poi" % poiVersion, "org.apache.avro" % "avro" % avroVersion )

avatar
Contributor

Hi @Felix Albani,

Still the same issue Exists, please find my attached build.sbt and sample code attached.

import sbt._ import sbt.Keys._ name := "BackupSnippets" version := "1.0" scalaVersion := "2.11.8" val sparkVersion = "2.2.1" val hadoopVersion = "2.7.1" val poiVersion = "3.9" val avroVersion = "1.7.6" val hortonworksVersion = "2.2.0.2.6.3.79-2" conflictManager := ConflictManager.latestRevision /*libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % sparkVersion, "org.apache.spark" %% "spark-sql" % sparkVersion, "org.apache.spark" %% "spark-hive" % sparkVersion, "org.apache.poi" % "poi-ooxml" % poiVersion, "org.apache.poi" % "poi" % poiVersion, "org.apache.avro" % "avro" % avroVersion )*/ /*resolvers ++= Seq( "Typesafe repository" at "https://repo.typesafe.com/typesafe/releases/", "Typesafe Ivyrepository" at "https://repo.typesafe.com/typesafe/ivy-releases/", "Maven Central" at "https://repo1.maven.org/maven2/", "Sonatype snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/", Resolver.sonatypeRepo("releases") ) */ libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % hortonworksVersion, "org.apache.spark" %% "spark-sql" % hortonworksVersion, "org.apache.spark" %% "spark-hive" % hortonworksVersion) resolvers ++= Seq("Hortonworks Releases" at "http://repo.hortonworks.com/content/repositories/releases/", "Jetty Releases" at "http://repo.hortonworks.com/content/repositories/jetty-hadoop/")

************************************************************************************

package BigData101.ORC import ScalaUtils.SchemaUtils import org.apache.spark.sql.{Encoders, Row, SparkSession} import org.apache.spark.sql.catalyst.ScalaReflection import org.apache.spark.sql.types.StructType object ORCTesting { def main(args: Array[String]): Unit = { val spark: SparkSession = SparkSession.builder() .master("local[*]") .appName("ORC Testing") .enableHiveSupport() .getOrCreate() case class Airlines(Airline_id: Integer, Name: String, Alias: String, IATA: String, ICAO: String, Callsign: String, Country: String, Active: String) //val AirlineSchema = ScalaReflection.schemaFor[Airlines].dataType.asInstanceOf[StructType] Encoders.product[Airlines].schema sys.exit(1) } }

avatar

@Manikandan Jeyabal

I think there may be a project level problem on yours. Therefore I'm sharing a very simple dummy project that is compiling fine in my environment (this is only shared as an example and is not meant to be used in real production):

sparktest.zip

Hopefully with this project you will be able to figure out what is wrong when comparing with yours. Note: You can use command line sbt compile and sbt package to get the jar

This is the output I get when I run it:

 SPARK_MAJOR_VERSION=2 spark-submit /root/projects/sparktest/target/scala-2.11/hello_2.11-0.1.0-SNAPSHOT.jar Hello
SPARK_MAJOR_VERSION is set to 2, using Spark2
hello
root
 |-- Airline_id: integer (nullable = true)
 |-- Name: string (nullable = true)
 |-- Alias: string (nullable = true)
 |-- IATA: string (nullable = true)
 |-- ICAO: string (nullable = true)
 |-- Callsign: string (nullable = true)
 |-- Country: string (nullable = true)
 |-- Active: string (nullable = true)

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.

avatar
Contributor

Hi @Felix Albani

yes your suggestion works fine, i think i have to extend my object as app to make it work. Tanx Bro.