Maven: Downloading Envelope binaries

Solved

Re: Maven: Downloading Envelope binaries

New Contributor

@Jeremy Beard wrote:
Ok, yes: the wildcard is expanding out to both jar filenames, so the second argument becomes the second jar rather than "traffic.conf". Instead, specify just the one jar filename.

Okay, thanks.

Making some progress with your help. I am able to move on to the next issue.

I am running Kafka_2.11-0.10.1.1

C:\source\envelope\examples\target>spark-submit2 envelope-0.6.0-shaded.jar traffic.conf
Exception in thread "main" java.lang.RuntimeException: Kafka input offset management not currently compatible with stream windowing.
at com.cloudera.labs.envelope.kafka.KafkaInput.configure(KafkaInput.java:102)
at com.cloudera.labs.envelope.input.InputFactory.create(InputFactory.java:39)
at com.cloudera.labs.envelope.run.DataStep.getInput(DataStep.java:166)
at com.cloudera.labs.envelope.run.DataStep.getComponents(DataStep.java:357)
at com.cloudera.labs.envelope.run.StreamingStep.getComponents(StreamingStep.java:122)
at com.cloudera.labs.envelope.run.Runner.initializeSecurity(Runner.java:481)
at com.cloudera.labs.envelope.run.Runner.run(Runner.java:99)
at com.cloudera.labs.envelope.EnvelopeMain.main(EnvelopeMain.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Re: Maven: Downloading Envelope binaries

Rising Star
That might be a bug in the example configuration. Can you try adding "offsets.manage = false" to the Kafka input (a good spot is below the line "group.id = traffic") and run it again?
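
For reference, here is a minimal sketch of where that line would sit in the Kafka input block of traffic.conf. The broker and topic values below are placeholders, and the surrounding keys are assumed from the 0.6.0 traffic example, so keep whatever your copy of the config already has:

input {
  type = kafka
  brokers = "localhost:9092"  // placeholder; keep the broker list from your config
  topics = traffic            // placeholder; keep the topic from your config
  group.id = traffic
  offsets.manage = false      // the added line: disables Kafka offset management
}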

Re: Maven: Downloading Envelope binaries

New Contributor

Hi,

Thanks again. Some more progress, and a new issue.

C:\source\envelope\examples\target>spark-submit2 envelope-0.6.0-shaded.jar traffic.conf
Exception in thread "main" java.lang.NullPointerException: Supplied TokenProvider cannot be null
at envelope.shaded.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229)
at com.cloudera.labs.envelope.security.TokenStoreManager.addTokenProvider(TokenStoreManager.java:80)
at com.cloudera.labs.envelope.run.Runner.initializeSecurity(Runner.java:491)
at com.cloudera.labs.envelope.run.Runner.run(Runner.java:99)
at com.cloudera.labs.envelope.EnvelopeMain.main(EnvelopeMain.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Thanks

Re: Maven: Downloading Envelope binaries

Rising Star
That particular issue has been resolved in Envelope 0.6.1. You can get the
jar for that from the Releases tab of the Envelope repository on GitHub.
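
For example, a hypothetical download command, assuming the release tag follows the usual v-prefix convention (verify the actual tag and asset name on https://github.com/cloudera-labs/envelope/releases first):

curl -LO https://github.com/cloudera-labs/envelope/releases/download/v0.6.1/envelope-0.6.1.jar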

Re: Maven: Downloading Envelope binaries

New Contributor

Hi Jeremy,

Please see below the error I am getting when I run the traffic pipeline with the new jar:

C:\source\envelope-0.6.1\examples\traffic>spark-submit2 envelope-0.6.1.jar traffic.conf

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 0.0 (TID 0) had a not serializable result: org.apache.kafka.clients.consumer.ConsumerRecord
Serialization stack:
- object not serializable (class: org.apache.kafka.clients.consumer.ConsumerRecord, value: ConsumerRecord(topic = traffic, partition = 0, offset = 6294, CreateTime = 1550079184958, checksum = 1812888233, serialized key size = -1, serialized value size = 16, key = null, value = 1550079184958,74))
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:924)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:924)
at org.apache.spark.api.java.JavaRDDLike$class.foreachPartition(JavaRDDLike.scala:219)
at org.apache.spark.api.java.AbstractJavaRDDLike.foreachPartition(JavaRDDLike.scala:45)
at com.cloudera.labs.envelope.kafka.KafkaOutput.applyBulkMutations(KafkaOutput.java:70)
at com.cloudera.labs.envelope.run.DataStep.writeOutput(DataStep.java:419)
at com.cloudera.labs.envelope.run.DataStep.setData(DataStep.java:129)
at com.cloudera.labs.envelope.run.BatchStep.submit(BatchStep.java:83)
at com.cloudera.labs.envelope.run.Runner$2.call(Runner.java:372)
at com.cloudera.labs.envelope.run.Runner$2.call(Runner.java:369)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 0.0 (TID 0) had a not serializable result: org.apache.kafka.clients.consumer.ConsumerRecord
Serialization stack:
- object not serializable (class: org.apache.kafka.clients.consumer.ConsumerRecord, value: ConsumerRecord(topic = traffic, partition = 0, offset = 6294, CreateTime = 1550079184958, checksum = 1812888233, serialized key size = -1, serialized value size = 16, key = null, value = 1550079184958,74))
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at com.cloudera.labs.envelope.run.Runner.awaitAllOffMainThreadsFinished(Runner.java:380)
at com.cloudera.labs.envelope.run.Runner.runBatch(Runner.java:347)
at com.cloudera.labs.envelope.run.Runner.access$000(Runner.java:70)
at com.cloudera.labs.envelope.run.Runner$1.call(Runner.java:226)
at com.cloudera.labs.envelope.run.Runner$1.call(Runner.java:208)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 0.0 (TID 0) had a not serializable result: org.apache.kafka.clients.consumer.ConsumerRecord
Serialization stack:
- object not serializable (class: org.apache.kafka.clients.consumer.ConsumerRecord, value: ConsumerRecord(topic = traffic, partition = 0, offset = 6294, CreateTime = 1550079184958, checksum = 1812888233, serialized key size = -1, serialized value size = 16, key = null, value = 1550079184958,74))
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:924)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:924)
at org.apache.spark.api.java.JavaRDDLike$class.foreachPartition(JavaRDDLike.scala:219)
at org.apache.spark.api.java.AbstractJavaRDDLike.foreachPartition(JavaRDDLike.scala:45)
at com.cloudera.labs.envelope.kafka.KafkaOutput.applyBulkMutations(KafkaOutput.java:70)
at com.cloudera.labs.envelope.run.DataStep.writeOutput(DataStep.java:419)
at com.cloudera.labs.envelope.run.DataStep.setData(DataStep.java:129)
at com.cloudera.labs.envelope.run.BatchStep.submit(BatchStep.java:83)
at com.cloudera.labs.envelope.run.Runner$2.call(Runner.java:372)
at com.cloudera.labs.envelope.run.Runner$2.call(Runner.java:369)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
19/02/13 11:33:16 ERROR Executor: Exception in task 1.0 in stage 2.0 (TID 2)
java.io.NotSerializableException: org.apache.kafka.clients.consumer.ConsumerRecord
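
For context on this failure: the Spark Streaming Kafka 0.10 integration documents that ConsumerRecord is not Java-serializable, so any operation that ships raw records across the cluster or back to the driver fails exactly like this. In the trace above it happens inside Envelope's KafkaOutput rather than in user code, so it looks like something to fix in Envelope itself rather than in traffic.conf. A minimal sketch of the pattern, outside of Envelope (class and method names below are illustrative, not Envelope's code):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.api.java.JavaInputDStream;

public class ConsumerRecordSerializationSketch {

    // 'stream' stands in for a Kafka direct stream of ConsumerRecord<String, String>.
    static void process(JavaInputDStream<ConsumerRecord<String, String>> stream) {
        stream.foreachRDD(rdd -> {
            // Fails: take() would serialize raw ConsumerRecords back to the driver,
            // producing the NotSerializableException seen above.
            // rdd.take(5);

            // Works: map each record to a serializable type (here, its String value)
            // before any operation that crosses a serialization boundary.
            JavaRDD<String> values = rdd.map(ConsumerRecord::value);
            values.take(5).forEach(System.out::println);
        });
    }
}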

Any thoughts?

Thanks


Re: Maven: Downloading Envelope binaries

New Contributor

@shrini wrote:

Hi,

The command is:

C:\source\envelope\examples\target>spark-submit2 envelope-*.jar traffic.conf

These are the two jars I have in the folder C:\source\envelope\examples\target:

envelope-0.6.0-shaded.jar
envelope-examples-0.6.0-shaded.jar

Thanks

And traffic.conf is available in the folder C:\source\envelope\examples\target.
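
As Jeremy explained at the start of the thread, with both jars present the wildcard expands to both filenames, so the second jar, not traffic.conf, ends up as the application argument. A sketch of the expansion and the fix:

C:\source\envelope\examples\target>spark-submit2 envelope-*.jar traffic.conf
rem ...effectively runs as:
C:\source\envelope\examples\target>spark-submit2 envelope-0.6.0-shaded.jar envelope-examples-0.6.0-shaded.jar traffic.conf
rem The fix is to name the one jar explicitly:
C:\source\envelope\examples\target>spark-submit2 envelope-0.6.0-shaded.jar traffic.conf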