Created 01-13-2018 08:51 PM
Hi,
I'm getting the error below while processing an XML file with Spark, and I don't know what I'm doing wrong. Any suggestions to resolve this would be greatly appreciated.
spark-submit --class csvdf /CSVDF/target/scala-2.11/misc-test_2.11-1.0.jar
Error:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:594)
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
    at csvdf$.main(csvdf.scala:45)
    at csvdf.main(csvdf.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:743)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.xml.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$25$anonfun$apply$13.apply(DataSource.scala:579)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$25$anonfun$apply$13.apply(DataSource.scala:579)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$25.apply(DataSource.scala:579)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$25.apply(DataSource.scala:579)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:579)
    ... 16 more
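The "Caused by" line shows that Spark fell back to loading the class com.databricks.spark.xml.DefaultSource and could not find it, which points at a classpath problem rather than at the XML file or the reader options. A minimal sketch for confirming that is below; XmlSourceCheck and isOnClasspath are hypothetical names, not part of Spark or spark-xml, and the assumption is that checking the thread's context class loader approximates where Spark 2.x looks up data sources.

```scala
// Hypothetical diagnostic helper (not part of Spark or spark-xml).
// Assumption: the context class loader approximates the loader Spark 2.x
// uses when it resolves a data source name such as "com.databricks.spark.xml".
object XmlSourceCheck {

  // The class named in the "Caused by" line of the stack trace.
  val XmlDefaultSource: String = "com.databricks.spark.xml.DefaultSource"

  // Returns true if the named class can be loaded, false otherwise.
  def isOnClasspath(className: String): Boolean = {
    // Fall back to this object's own loader if no context loader is set.
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    try {
      // initialize = false: only resolve the class, do not run static initializers.
      Class.forName(className, false, loader)
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }
  }

  def main(args: Array[String]): Unit = {
    if (isOnClasspath(XmlDefaultSource))
      println(s"$XmlDefaultSource found: spark-xml is on the driver classpath")
    else
      println(s"$XmlDefaultSource NOT found: spark-xml is missing from the driver classpath")
  }
}
```

Calling XmlSourceCheck.isOnClasspath(XmlSourceCheck.XmlDefaultSource) at the start of main, before spark.read, would print the same verdict for the jar as it is actually submitted.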
Source Code:
import org.apache.spark.sql.SQLContext
import com.databricks.spark.xml._
. . .
val sConf = new SparkConf().setAppName("Hive test").setMaster("local")
val sc = new SparkContext(sConf)
val warehouseLocation = new File("spark-warehouse").getAbsolutePath
val spark = SparkSession
  .builder()
  .appName("Hive test")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._
import spark.sql

// Test XML input file
val xml_df = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "doc")
  .load("file:///Downloads/sample.xml")
xml_df.printSchema()
xml_df.createOrReplaceTempView("XML_DATA")
spark.sql("SELECT * FROM XML_DATA").show()
SBT file:
name := "MISC test"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.1"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.1"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.1"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.1" % "provided"
libraryDependencies += "com.databricks" %% "spark-xml" % "0.4.1" % "provided"
I don't know what I am doing wrong. Any suggestions to resolve this will be a great help.
Many Thanks
Satya
Created 03-05-2018 02:48 PM
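Have you tried the --packages option (for example, --packages com.databricks:spark-xml_2.11:0.4.1)? In your build, spark-xml is marked "provided" (and sbt package does not bundle dependencies in any case), so com.databricks.spark.xml.DefaultSource is not in the jar you submit. --packages makes spark-submit resolve the Maven coordinates and add the jar to the driver and executor classpaths: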
spark-submit \
  --packages com.databricks:spark-xml_2.11:0.4.1 \
  --class csvdf \
  /CSVDF/target/scala-2.11/misc-test_2.11-1.0.jar
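If downloading packages at submit time is not an option, another approach is to bundle spark-xml into the application jar itself. The sketch below assumes the sbt-assembly plugin, which the original build does not use; version 0.14.6 is only an assumption for an sbt setup of this era, so pick the release that matches your sbt version. Spark modules are switched to "provided" because spark-submit already supplies them, while spark-xml moves to the default (compile) scope so it gets packaged.

```scala
// project/plugins.sbt  (assumed plugin version; check which release matches your sbt)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")

// build.sbt
name := "MISC test"
version := "1.0"
scalaVersion := "2.11.8"

// Spark itself is on the spark-submit classpath at runtime: keep it out of the fat jar.
libraryDependencies += "org.apache.spark" %% "spark-core"      % "2.1.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql"       % "2.1.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive"      % "2.1.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.1" % "provided"

// spark-xml does not ship with Spark, so leave it in compile scope to bundle it.
libraryDependencies += "com.databricks" %% "spark-xml" % "0.4.1"
```

With this setup you would run sbt assembly and pass the resulting assembly jar from target/scala-2.11/ to spark-submit instead of the sbt package output. If the assembly step reports duplicate files, an assemblyMergeStrategy setting may also be needed.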