Member since: 11-30-2015
Posts: 39
Kudos Received: 23
Solutions: 3

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1095 | 03-11-2016 07:31 PM |
|  | 6092 | 12-17-2015 12:33 AM |
|  | 1309 | 12-16-2015 10:46 PM |
03-11-2016
07:15 PM
2 Kudos
Kafka 0.9 requires that the "key.serializer" and "value.serializer" entries in ProducerConfig be Java classes, not strings containing the name of a Java class. See https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java#L280-L287

However, if I try to do that like this:

Properties props = new Properties();
props.put("bootstrap.servers", topoProperties.getProperty("bootstrap.servers"));
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, io.confluent.kafka.serializers.KafkaAvroSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, io.confluent.kafka.serializers.KafkaAvroSerializer.class);
conf.put(KafkaBolt.KAFKA_BROKER_PROPERTIES, props);

Storm fails to start with this error:

java.lang.IllegalArgumentException: Topology conf is not json-serializable

See https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/StormSubmitter.java#L192-L194

It seems like kafka-bolt's prepare method will have to transform a string into a Java class. Before I go down that path, I was wondering if anyone else has run into this problem and whether there's a workaround? Thanks! -Aaron
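For what it's worth, the transformation I have in mind would look roughly like this (a sketch only; it assumes the serializer name is kept as a plain String in the topology conf and that the class is on the worker classpath):

```java
// Sketch only: keep the serializer name as a JSON-serializable String in the topology conf,
// then turn it into the Class object Kafka 0.9 expects right before the producer
// properties are built. Assumes the class is on the worker classpath.
String serializerName = "io.confluent.kafka.serializers.KafkaAvroSerializer";
try {
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, Class.forName(serializerName));
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, Class.forName(serializerName));
} catch (ClassNotFoundException e) {
    throw new RuntimeException("Serializer class not found on the worker classpath", e);
}
```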
Labels:
- Apache Storm
02-22-2016
03:21 PM
1 Kudo
Interesting, it looks like I'm seeing a similar error in a different context (my version of Hive doesn't have any of the LLAP functionality, as I understand it).
02-22-2016
01:45 PM
Thanks @Divakar Annapureddy! I checked and that value is currently 131072 in core-site. I tried overriding it in Hive with "set io.file.buffer.size=146215" and got the same error message. In other words, it still has a buffer size of 32K and not the value in core-site or what I set through Hive.
02-22-2016
01:14 PM
2 Kudos
Running this statement:

INSERT INTO TABLE FOO PARTITION(partition_date)
SELECT DISTINCT [columns from BAR]
FROM BAR LEFT OUTER JOIN FOO ON (BAR.application.id = FOO.unique_id)
WHERE FOO.unique_id IS NULL

fails with the stack trace below. The only setting I could find that seemed relevant was hive.exec.orc.default.buffer.size, but I confirmed it is already set to the default value of 262,144. FOO has about 3.8B rows and is an ORC table; BAR is an external Avro table. I'm running HDP 2.3.4 with Hive 1.2.1. Does anyone have suggestions for addressing this?

Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 32768 needed = 146215
at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:193)
at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:238)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StringDirectTreeReader.next(TreeReaderFactory.java:1554)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StringTreeReader.next(TreeReaderFactory.java:1397)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.next(TreeReaderFactory.java:2004)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1039)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:249)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:186)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.<init>(OrcRawRecordMerger.java:226)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:437)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1269)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1151)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
Labels:
- Apache Hive
02-18-2016
10:40 PM
2 Kudos
A previous question (https://community.hortonworks.com/questions/5482/how-to-fetch-service-level-configuration-using-res.html) detailed how to navigate to particular configurations and extract certain values. Is there a way I can use the REST API to get the configuration that's currently in effect? For example, this would show me all of the possible hive-site configurations:

/api/v1/clusters/cluster/configurations?type=hive-site

How can I programmatically determine which one is currently active?
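For context, this is roughly how I'm calling that endpoint today (just a sketch; the host name, cluster name, and admin credentials below are placeholders):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.xml.bind.DatatypeConverter;

public class ListHiveSiteConfigs {
    public static void main(String[] args) throws Exception {
        // Placeholder host, cluster name, and credentials.
        String endpoint = "http://ambari-host:8080/api/v1/clusters/cluster/configurations?type=hive-site";
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        String auth = DatatypeConverter.printBase64Binary("admin:admin".getBytes("UTF-8"));
        conn.setRequestProperty("Authorization", "Basic " + auth);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Prints the JSON listing of every hive-site version (tag) Ambari knows about,
                // but nothing here tells me which one is currently in effect.
                System.out.println(line);
            }
        }
    }
}
```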
Labels:
- Apache Ambari
01-27-2016
02:38 AM
I clearly have a LOT more to learn, but adding this fixes the issue:

conf.setSkipMissingKryoRegistrations(false);
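For anyone who hits this later, the combined setup that works for me looks roughly like this (a sketch; AvroGenericSerializer is my own Kryo serializer for GenericData.Record, mentioned in the reply below):

```java
Config conf = new Config();
// Register the custom Kryo serializer for Avro generic records.
conf.registerSerialization(GenericData.Record.class, AvroGenericSerializer.class);
// Without this line the registration above seemed to have no effect; adding it fixed the issue.
conf.setSkipMissingKryoRegistrations(false);
```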
01-27-2016
02:13 AM
Great tip, @Jungtaek Lim! I might be missing something obvious, but my first attempt at registering a serializer seemed to have no effect at all:

conf.registerSerialization(GenericData.Record.class, AvroGenericSerializer.class);

I got precisely the same error message. It's possible that my serializer has a bug, but I didn't see any error related to the serializer itself.
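For reference, the serializer I'm registering follows roughly this pattern (a sketch only, not my exact code, and it may well contain the bug mentioned above; it writes the record's schema as JSON alongside the Avro-encoded bytes):

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Sketch of a Kryo serializer for GenericData.Record: writes the schema JSON followed by
// the Avro binary encoding of the record, and reverses that on read.
public class AvroGenericSerializer extends Serializer<GenericData.Record> {

    @Override
    public void write(Kryo kryo, Output output, GenericData.Record record) {
        try {
            // Schema first, so the reading side can reconstruct the record.
            output.writeString(record.getSchema().toString());
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(bos, null);
            new GenericDatumWriter<GenericData.Record>(record.getSchema()).write(record, encoder);
            encoder.flush();
            byte[] bytes = bos.toByteArray();
            output.writeInt(bytes.length, true);
            output.writeBytes(bytes);
        } catch (IOException e) {
            throw new RuntimeException("Failed to serialize Avro record", e);
        }
    }

    @Override
    public GenericData.Record read(Kryo kryo, Input input, Class<GenericData.Record> type) {
        try {
            Schema schema = new Schema.Parser().parse(input.readString());
            int length = input.readInt(true);
            byte[] bytes = input.readBytes(length);
            BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
            return new GenericDatumReader<GenericData.Record>(schema).read(null, decoder);
        } catch (IOException e) {
            throw new RuntimeException("Failed to deserialize Avro record", e);
        }
    }
}
```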
01-26-2016
09:31 PM
Thanks @tgoetz, that makes sense. I have seen other deserialization errors in single-worker topologies, which made this a bit surprising.
01-26-2016
09:13 PM
I have a Storm topology with bolts that pass Avro GenericRecord objects, i.e. the new AvroGenericRecordBolt (https://github.com/apache/storm/blob/master/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/bolt/AvroGenericRecordBolt.java) and a custom bolt which emits GenericRecords. When I run the topology in a single worker, everything is fine. When I run with multiple workers, I get the serialization errors below. I tried registering GenericData$Record with Kryo, but since Record doesn't implement Serializable that doesn't work either (as expected).

- Why does this error appear only when I have multiple workers?
- Any suggestions for getting around this, given that Record isn't Serializable?

java.lang.RuntimeException: java.lang.RuntimeException: java.io.NotSerializableException: org.apache.avro.generic.GenericData$Record
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.disruptor$consume_loop_STAR_$fn__1077.invoke(disruptor.clj:94) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.util$async_loop$fn__551.invoke(util.clj:465) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
Caused by: java.lang.RuntimeException: java.io.NotSerializableException: org.apache.avro.generic.GenericData$Record
at backtype.storm.serialization.SerializableSerializer.write(SerializableSerializer.java:41) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) ~[kryo-2.21.jar:na]
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:75) ~[kryo-2.21.jar:na]
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:18) ~[kryo-2.21.jar:na]
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:486) ~[kryo-2.21.jar:na]
at backtype.storm.serialization.KryoValuesSerializer.serializeInto(KryoValuesSerializer.java:44) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:44) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.daemon.worker$mk_transfer_fn$transfer_fn__5386.invoke(worker.clj:139) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.daemon.executor$start_batch_transfer__GT_worker_handler_BANG_$fn__5107.invoke(executor.clj:263) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.disruptor$clojure_handler$reify__1064.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
... 6 common frames omitted
Caused by: java.io.NotSerializableException: org.apache.avro.generic.GenericData$Record
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) ~[na:1.7.0_45]
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) ~[na:1.7.0_45]
at backtype.storm.serialization.SerializableSerializer.write(SerializableSerializer.java:38) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
... 16 common frames omitted
Labels:
- Apache Storm
01-18-2016
02:13 PM
You were exactly right. A particular change I made was backward compatible, but not forward compatible. When tested the right way, Hive performed exactly as expected.