While running readStream to read data from a Kafka topic, I am facing issues in the Spark shell

val stramingInputDf = spark.readStream.format("kafka").
  option("kafka.bootstrap.servers", "********:218").
  option("startingOffsets", "earliest").
  option("subscribe", "Prod_Canonical_Realtime_POS").
  load()

java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:594)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:197)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:87)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:87)
  at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:125)
  ... 51 elided
Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at org.apache.spark.sql.execution.datasources.DataSource$anonfun$25$anonfun$apply$13.apply(DataSource.scala:579)
  at org.apache.spark.sql.execution.datasources.DataSource$anonfun$25$anonfun$apply$13.apply(DataSource.scala:579)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource$anonfun$25.apply(DataSource.scala:579)
  at org.apache.spark.sql.execution.datasources.DataSource$anonfun$25.apply(DataSource.scala:579)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:579)
  ... 58 more

I also tried loading the jar from inside the shell with :require, but that failed as well:

scala> :require /home/svc_hortonworks/spark-streaming-kafka_2.10-1.2.0.jar
The path '/home/svc_hortonworks/spark-streaming-kafka_2.10-1.2.0.jar' cannot be loaded, because existing classpath entries conflict.

Re: While running readStream to read data from a Kafka topic, I am facing issues in the Spark shell

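Can you try starting the shell with the spark-sql-kafka-0-10 connector jar and the matching kafka-clients jar on the classpath? Something like: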

spark-shell --master=yarn --jars /home/<>/spark-sql-kafka-0-10_2.11-2.1.1.jar,<>/libs/kafka-clients-0.10.1.2.6.2.0-205.jar
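
Once the shell is restarted with those jars, a quick way to confirm the connector was actually picked up is to try loading its provider class before calling readStream again. This is only a minimal sketch: ClasspathCheck and isOnClasspath are hypothetical helper names, and the class name in the usage comment is my assumption about which class in the spark-sql-kafka-0-10 jar registers the "kafka" format, so please verify it against the jar you are using.

```scala
import scala.util.Try

// Hypothetical helper (not part of Spark): reports whether a class can be
// loaded through the context class loader, falling back to this object's loader.
object ClasspathCheck {
  def isOnClasspath(className: String): Boolean = {
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    // initialize = false: only locate the class, do not run its static initializers
    Try(Class.forName(className, false, loader)).isSuccess
  }
}

// Usage inside spark-shell (class name is an assumption, see the note above):
// ClasspathCheck.isOnClasspath("org.apache.spark.sql.kafka010.KafkaSourceProvider")
```

If the check returns false, the jar was most likely not passed to the shell correctly (wrong path, or a jar built for a different Scala version), and the readStream call will keep failing with the same ClassNotFoundException.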