Spark-submit to deploy the jar to Spark - java.lang.ClassNotFoundException

Hi, I'm working through the following tutorial:

https://es.hortonworks.com/tutorial/deploying-machine-learning-models-using-spark-structured-streami...

I'm using HDP-2.5.0.0-1245, with Spark 1.6.2 and Scala 2.10.5.

I have reached this step of the tutorial: "Then use spark-submit to deploy the jar to Spark."

I am trying to submit the job (a jar file in target/main/scala) with the following command:

/usr/hdp/current/spark2-client/bin/spark-submit --class "main.scala.Collect" --master local[4] ./SentimentAnalysis-assembly-2.0.0.jar
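
A variant of the same command that pulls the Kafka data source in at submit time, instead of relying on it being bundled into the assembly, would presumably look like this (the artifact coordinates are an assumption based on Scala 2.10 and Spark 2.1.0):

/usr/hdp/current/spark2-client/bin/spark-submit --class "main.scala.Collect" --master local[4] --packages org.apache.spark:spark-sql-kafka-0-10_2.10:2.1.0 ./SentimentAnalysis-assembly-2.0.0.jar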

With the original command, everything goes well until it fails with the following errors:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark-packages.org
        at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:145)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:78)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:78)
        at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:195)
        at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:79)
        at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:79)
        at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
        at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:142)
        at main.scala.Collect$.main(Collect.scala:61)
        at main.scala.Collect.main(Collect.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:130)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:130)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:130)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:130)
        at scala.util.Try.orElse(Try.scala:84)
        at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:130)
        ... 18 more

My build.sbt file is:

name := "SentimentAnalysis"

version := "2.0.0"

scalaVersion := "2.10.5"//"2.10.4"//

libraryDependencies ++= {
  val sparkVer = "2.1.0"//"1.6.1"//
  Seq(
    "org.apache.spark"     %% "spark-core"              % sparkVer % "provided" withSources(),
    "org.apache.spark"     %% "spark-mllib"             % sparkVer % "provided" withSources(),
    "org.apache.spark"     %% "spark-sql"               % sparkVer withSources(),
    "org.apache.spark"     %% "spark-streaming"         % sparkVer % "provided" withSources(),
    "org.apache.spark"     %% "spark-streaming-kafka-0-10" % sparkVer withSources(),
    "org.apache.spark"     %% "spark-sql-kafka-0-10" % sparkVer withSources(),
    "org.apache.kafka"     %% "kafka" % "0.10.0" withSources(),
    "com.typesafe" % "config" % "1.3.1",
    "com.google.code.gson" % "gson" % "2.8.0"
  )
}


assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", xs @ _*)           => MergeStrategy.first
  case PathList("javax", "xml", xs @ _*)            => MergeStrategy.first
  case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.first
  case PathList("com", "google", xs @ _*)           => MergeStrategy.first
  case PathList("META-INF", xs @ _*)                => MergeStrategy.discard
  case x                                            => MergeStrategy.first
}
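
One detail worth flagging in that merge strategy (an observation, not verified against this exact build): discarding everything under META-INF also drops META-INF/services/org.apache.spark.sql.sources.DataSourceRegister, the file Spark uses to discover the "kafka" data source, and that alone can produce exactly the ClassNotFoundException above. A sketch that preserves service registrations while still dropping signature files:

assemblyMergeStrategy in assembly := {
  // Merge service registrations (e.g. org.apache.spark.sql.sources.DataSourceRegister)
  // so Spark can still discover the kafka source inside the assembly jar.
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
  // Other META-INF metadata (signature files etc.) can be discarded.
  case PathList("META-INF", xs @ _*)             => MergeStrategy.discard
  case x                                         => MergeStrategy.first
}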
1 Reply

@David Sandoval The code from the link you shared uses Spark 2 and streams from Kafka with Structured Streaming. Structured Streaming has only been in technical preview (TP) since HDP 2.6.3. You are running HDP 2.5 with Spark 1.6, so you should follow the tutorial's steps on HDP 2.6.3 or later with Spark 2.
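
For reference, the call that fails (Collect.scala line 61 in your stack trace) is the Structured Streaming Kafka reader, an API that only exists in Spark 2.x; the broker address and topic name below are placeholders, not the tutorial's actual values:

// Structured Streaming Kafka source -- Spark 2.x only.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "sandbox.hortonworks.com:6667")
  .option("subscribe", "tweets")
  .load()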

HTH
