Posted 01-13-2018 08:51 PM
Hi, I'm getting the below error while processing an XML file using Spark. I don't know what I am doing wrong here; any suggestion to resolve this would be greatly appreciated.

spark-submit --class csvdf /CSVDF/target/scala-2.11/misc-test_2.11-1.0.jar

Error:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:594)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at csvdf$.main(csvdf.scala:45)
at csvdf.main(csvdf.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.xml.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$anonfun$25$anonfun$apply$13.apply(DataSource.scala:579)
at org.apache.spark.sql.execution.datasources.DataSource$anonfun$25$anonfun$apply$13.apply(DataSource.scala:579)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$anonfun$25.apply(DataSource.scala:579)
at org.apache.spark.sql.execution.datasources.DataSource$anonfun$25.apply(DataSource.scala:579)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:579)
... 16 more

Source Code:

import org.apache.spark.sql.SQLContext
import com.databricks.spark.xml._
...
val sConf = new SparkConf().setAppName("Hive test").setMaster("local")
val sc = new SparkContext(sConf)
val warehouseLocation = new File("spark-warehouse").getAbsolutePath
val spark = SparkSession
.builder()
.appName("Hive test")
.config("spark.sql.warehouse.dir", warehouseLocation)
.enableHiveSupport()
.getOrCreate()
import spark.implicits._
import spark.sql
//* Test XML input file
val xml_df = spark.read
.format("com.databricks.spark.xml")
.option("rowTag", "doc")
.load("file:///Downloads/sample.xml")
xml_df.printSchema()
xml_df.createOrReplaceTempView("XML_DATA")
spark.sql("SELECT * FROM XML_DATA").show()

SBT file:

name := "MISC test"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.1"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.1"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.1"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.1" % "provided"
libraryDependencies += "com.databricks" %% "spark-xml" % "0.4.1" % "provided"

I don't know what I am doing wrong; any suggestions to resolve this would be a great help.

Many thanks,
Satya
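Edit: re-reading the SBT file, I notice spark-xml is declared with the "provided" scope, which (if I understand sbt scoping correctly) means it is not bundled into the jar, so spark-submit has nothing to load for com.databricks.spark.xml.DefaultSource at runtime. One workaround I plan to try is passing the package explicitly at submit time; the coordinates below just mirror the Scala 2.11 / spark-xml 0.4.1 versions from my build file:

```shell
# spark-xml is "provided" in build.sbt, so it is not inside the jar;
# ask spark-submit to resolve it and add it to the classpath instead:
spark-submit \
  --packages com.databricks:spark-xml_2.11:0.4.1 \
  --class csvdf \
  /CSVDF/target/scala-2.11/misc-test_2.11-1.0.jar
```

Alternatively, dropping the `% "provided"` qualifier from the spark-xml line in build.sbt and rebuilding the jar should also make the class available.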
Labels:
- Apache Spark