
Spark Streaming on IBM BigInsight


New Contributor

We are trying to run Spark Streaming on IBM BigInsight but are getting errors.

We have tried multiple options, and each one still fails. Details are provided below, along with the code and error for each.

Version:

HDFS 2.7.2

YARN 2.7.2

ZooKeeper 3.4.6

Kafka 0.9.0.1

Spark 1.6.1

Jars used:

1. spark-streaming_2.10-1.6.1_IBM_4.jar

2. spark-core_2.10-1.6.1_IBM_4.jar

3. spark-assembly-1.6.1_IBM_4-hadoop2.7.2-IBM-12.jar

4. spark-yarn-shuffle-1.6.1_IBM_4-hadoop2.7.2-IBM-12.jar

5. spark-streaming-kafka_2.10-1.6.1.2.4.2.12-1.jar
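One observation worth checking (this is an assumption, not a confirmed diagnosis): jar 5, spark-streaming-kafka_2.10-1.6.1.2.4.2.12-1.jar, has a Hortonworks-style (HDP 2.4.2) version string, while the other jars are IBM builds. Vendor-patched Kafka builds can carry a kafka.consumer.SimpleConsumer constructor that vanilla Kafka 0.9.0.1 does not have, which would explain a NoSuchMethodError like the ones below. A sketch of pinning matching artifacts from a single source in Maven (version strings are illustrative only):

```xml
<!-- Illustrative only: keep spark-streaming-kafka and the Kafka client
     from the SAME distribution so their binary APIs match. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
  <version>1.6.1</version> <!-- match the cluster's Spark build -->
</dependency>
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>0.9.0.1</version> <!-- the broker/client version on the cluster -->
</dependency>
```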

Option 1: SparkStreamingExample1

Code:

System.out.println("start directKafkaStreaming_messages");
// KafkaUtils.create(jssc, String.class, String.class, kafkaParams);
JavaPairInputDStream<String, String> directKafkaStreaming_messages =
    KafkaUtils.createDirectStream(
        jssc,
        String.class, String.class,
        StringDecoder.class, StringDecoder.class,
        kafkaParams, topicsSet);

Error:

Exception in thread "main" java.lang.NoSuchMethodError: kafka.consumer.SimpleConsumer.<init>(Ljava/lang/String;IIILjava/lang/String;Lorg/apache/kafka/common/protocol/SecurityProtocol;)V
    at org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaCluster.scala:56)
    at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:350)
    at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:347)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
    at org.apache.spark.streaming.kafka.KafkaCluster.org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers(KafkaCluster.scala:347)
    at org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:130)
    at org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:117)
    at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:213)
    at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:486)
    at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:609)
    at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
    at com.ey.testjar.SparkStreamingExample1.main(SparkStreamingExample1.java:114)
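A NoSuchMethodError on a constructor like this usually means the spark-streaming-kafka jar was compiled against a different Kafka build than the one on the runtime classpath. A small diagnostic sketch (the class name `CheckCtors` is ours, not part of any library): run it on the same classpath as the job, passing the class from the stack trace, and compare the printed constructor signatures against the one the error expects.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

public class CheckCtors {
    /** Returns the declared constructor signatures of the named class. */
    public static List<String> constructorSignatures(String className) throws Exception {
        List<String> sigs = new ArrayList<>();
        for (Constructor<?> c : Class.forName(className).getDeclaredConstructors()) {
            sigs.add(c.toString());
        }
        return sigs;
    }

    public static void main(String[] args) throws Exception {
        // On the job's classpath, run e.g.:
        //   java CheckCtors kafka.consumer.SimpleConsumer
        // and check whether a constructor taking SecurityProtocol is listed.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        for (String sig : constructorSignatures(name)) {
            System.out.println(sig);
        }
    }
}
```

If no constructor with the `SecurityProtocol` parameter appears, the bundled Kafka classes do not match what the spark-streaming-kafka jar was built against.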

Option 2: SparkStreamingExample2

Code:

JavaDStream<MessageAndMetadata> unionStreams = ReceiverLauncher.launch(
    jssc, kafkaprops, numberOfReceivers, StorageLevel.MEMORY_ONLY());

Error:

17/04/23 18:47:03 ERROR StreamingContext: Error starting the context, marking it as stopped
java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute
    at scala.Predef$.require(Predef.scala:233)
    at org.apache.spark.streaming.DStreamGraph.validate(DStreamGraph.scala:161)
    at org.apache.spark.streaming.StreamingContext.validate(StreamingContext.scala:542)
    at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:601)
    at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:600)
    at org.apache.spark.streaming.api.java.JavaStreamingContext.start(JavaStreamingContext.scala:624)
    at com.ey.testjar.SparkStreamingExample2.main(SparkStreamingExample2.java:111)

Option 3:

Code:

JavaPairReceiverInputDStream<String, String> messages =
    (JavaPairReceiverInputDStream<String, String>) KafkaUtils.createDirectStream(
        jssc,
        String.class, String.class,
        kafka.serializer.StringDecoder.class, kafka.serializer.StringDecoder.class,
        kafkaParams, topicsSet);

Error:

17/04/23 18:50:55 INFO VerifiableProperties: Property zookeeper.connect is overridden to
Exception in thread "main" java.lang.NoSuchMethodError: kafka.consumer.SimpleConsumer.<init>(Ljava/lang/String;IIILjava/lang/String;Lorg/apache/kafka/common/protocol/SecurityProtocol;)V
    at org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaCluster.scala:56)
    at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:350)
    at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:347)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
    at org.apache.spark.streaming.kafka.KafkaCluster.org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers(KafkaCluster.scala:347)
    at org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:130)
    at org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:117)
    at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:213)
    at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:486)
    at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:609)
    at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
    at com.ey.testjar.SparkStreaming1.main(SparkStreaming1.java:51)

4 REPLIES

Re: Spark Streaming on IBM BigInsight

Guru

Hi @Deepak Talim. Please note that your jars are IBM builds; we only support open-source Hadoop.


Re: Spark Streaming on IBM BigInsight

New Contributor

Hi @Greg Keys,

Yes, we are trying to run Spark Streaming on IBM BigInsight but are getting errors.

Who can help us with this? Could you please provide the details of a contact person?


Re: Spark Streaming on IBM BigInsight

Guru

I believe you would have to contact IBM support. (We do not support their platform).

Re: Spark Streaming on IBM BigInsight

New Contributor

I am the author of kafka-spark-consumer (https://github.com/dibbhatt/kafka-spark-consumer). I see you have tried it in Option 2. The error says you do not have any output operation registered. Could you please share your full code? You need at least one output operation in your streaming logic for the job to start.
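For reference, a minimal sketch of what that fix looks like (the names `unionStreams` and `jssc` are taken from the Option 2 snippet above; this fragment assumes the same setup):

```java
// Register at least one output operation before starting the context,
// otherwise the StreamingContext fails with "No output operations registered".
unionStreams.print();  // simplest output operation: prints a few records per batch

// Or a custom sink, e.g.:
// unionStreams.foreachRDD(rdd -> rdd.foreach(record -> process(record)));

jssc.start();             // safe to start now that an output op is registered
jssc.awaitTermination();
```

Transformations like map or filter alone are lazy; only an output operation (print, saveAsTextFiles, foreachRDD, etc.) makes Spark actually schedule the batches.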
