Created 02-12-2019 12:13 PM
@Jeremy Beard wrote:
OK, yes, the wildcard is expanding out to both jar filenames, so the second argument becomes the second jar rather than "traffic.conf". Instead, specify just the one jar filename.
Okay, thanks.
Making some progress with your help; I am able to move on to the next issue.
I am running Kafka_2.11-0.10.1.1.
C:\source\envelope\examples\target>spark-submit2 envelope-0.6.0-shaded.jar traffic.conf
Exception in thread "main" java.lang.RuntimeException: Kafka input offset management not currently compatible with stream windowing.
at com.cloudera.labs.envelope.kafka.KafkaInput.configure(KafkaInput.java:102)
at com.cloudera.labs.envelope.input.InputFactory.create(InputFactory.java:39)
at com.cloudera.labs.envelope.run.DataStep.getInput(DataStep.java:166)
at com.cloudera.labs.envelope.run.DataStep.getComponents(DataStep.java:357)
at com.cloudera.labs.envelope.run.StreamingStep.getComponents(StreamingStep.java:122)
at com.cloudera.labs.envelope.run.Runner.initializeSecurity(Runner.java:481)
at com.cloudera.labs.envelope.run.Runner.run(Runner.java:99)
at com.cloudera.labs.envelope.EnvelopeMain.main(EnvelopeMain.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
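From the stack trace it looks like KafkaInput.configure rejects the combination of Envelope-managed Kafka offsets and stream windowing, so presumably one of the two has to be turned off in traffic.conf. A minimal sketch of the first option, assuming the parameter names offsets.manage and window.* from the shipped traffic example (I have not verified these against the 0.6.0 docs):
traffic {
  input {
    type = kafka
    // brokers, topics, translator, etc. as in the shipped traffic.conf example
    // Assumption: "offsets.manage" is the flag name in this Envelope version.
    // false keeps the window and leaves offset tracking to Kafka/Spark.
    offsets.manage = false
    window {
      enabled = true
      milliseconds = 60000
    }
  }
}
The alternative would be to drop the window block and keep managed offsets; the check apparently only rejects the combination of the two.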
Created 02-12-2019 12:27 PM
Hi,
Thanks again. Some more progress, and a new issue.
C:\source\envelope\examples\target>spark-submit2 envelope-0.6.0-shaded.jar traffic.conf
Exception in thread "main" java.lang.NullPointerException: Supplied TokenProvider cannot be null
at envelope.shaded.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229)
at com.cloudera.labs.envelope.security.TokenStoreManager.addTokenProvider(TokenStoreManager.java:80)
at com.cloudera.labs.envelope.run.Runner.initializeSecurity(Runner.java:491)
at com.cloudera.labs.envelope.run.Runner.run(Runner.java:99)
at com.cloudera.labs.envelope.EnvelopeMain.main(EnvelopeMain.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Thanks
Created 02-13-2019 09:38 AM
Hi Jeremy,
Please see below the error I am getting when I run the traffic pipeline with the new jar.
C:\source\envelope-0.6.1\examples\traffic>spark-submit2 envelope-0.6.1.jar traffic.conf
Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage
0 (TID 0) had a not serializable result: org.apache.kafka.clients.consumer.ConsumerRecord
Serialization stack:
- object not serializable (class: org.apache.kafka.clients.consumer.ConsumerRecord, value: ConsumerRecord(topic = traffic, partition = 0, offset = 6294, CreateTime = 1550079184958, checksum = 1812888233, serialized key size = -1, serialized value size = 16, key = null, value = 1550079184958,74))
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at com.cloudera.labs.envelope.run.Runner.awaitAllOffMainThreadsFinished(Runner.java:380)
at com.cloudera.labs.envelope.run.Runner.runBatch(Runner.java:347)
at com.cloudera.labs.envelope.run.Runner.access$000(Runner.java:70)
at com.cloudera.labs.envelope.run.Runner$1.call(Runner.java:226)
at com.cloudera.labs.envelope.run.Runner$1.call(Runner.java:208)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 0.0 (TID 0) had a not serializable result: org.apache.kafka.clients.consumer.ConsumerRecord
Serialization stack:
- object not serializable (class: org.apache.kafka.clients.consumer.ConsumerRecord, value: ConsumerRecord(topic = traffic, partition = 0, offset = 6294, CreateTime = 1550079184958, checksum = 1812888233, serialized key size = -1, serialized value size = 16, key = null, value = 1550079184958,74))
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:924)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:924)
at org.apache.spark.api.java.JavaRDDLike$class.foreachPartition(JavaRDDLike.scala:219)
at org.apache.spark.api.java.AbstractJavaRDDLike.foreachPartition(JavaRDDLike.scala:45)
at com.cloudera.labs.envelope.kafka.KafkaOutput.applyBulkMutations(KafkaOutput.java:70)
at com.cloudera.labs.envelope.run.DataStep.writeOutput(DataStep.java:419)
at com.cloudera.labs.envelope.run.DataStep.setData(DataStep.java:129)
at com.cloudera.labs.envelope.run.BatchStep.submit(BatchStep.java:83)
at com.cloudera.labs.envelope.run.Runner$2.call(Runner.java:372)
at com.cloudera.labs.envelope.run.Runner$2.call(Runner.java:369)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
19/02/13 11:33:16 ERROR Executor: Exception in task 1.0 in stage 2.0 (TID 2)
java.io.NotSerializableException: org.apache.kafka.clients.consumer.ConsumerRecord
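One workaround I have seen suggested for this NotSerializableException is to switch Spark to Kryo serialization, since Kryo does not require ConsumerRecord to implement java.io.Serializable. I have not yet confirmed it against this Envelope version:
C:\source\envelope-0.6.1\examples\traffic>spark-submit2 --conf spark.serializer=org.apache.spark.serializer.KryoSerializer envelope-0.6.1.jar traffic.conf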
Any thoughts?
Thanks
Created 02-12-2019 12:07 PM
@shrini wrote:
Hi,
The command is:
C:\source\envelope\examples\target>spark-submit2 envelope-*.jar traffic.conf
These are the two jars I have in the folder C:\source\envelope\examples\target:
envelope-0.6.0-shaded.jar
envelope-examples-0.6.0-shaded.jar
And traffic.conf is available in the folder C:\source\envelope\examples\target.
Thanks
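For reference, if the wildcard gets expanded to both jar filenames (as Jeremy describes), the command effectively becomes:
C:\source\envelope\examples\target>spark-submit2 envelope-0.6.0-shaded.jar envelope-examples-0.6.0-shaded.jar traffic.conf
spark-submit then treats envelope-0.6.0-shaded.jar as the application jar and passes envelope-examples-0.6.0-shaded.jar, rather than traffic.conf, to Envelope as its first argument (the config path), which is why naming the one jar explicitly fixes it.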