Member since: 11-30-2015
Posts: 39
Kudos Received: 23
Solutions: 3

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1095 | 03-11-2016 07:31 PM |
|  | 6092 | 12-17-2015 12:33 AM |
|  | 1309 | 12-16-2015 10:46 PM |
03-11-2016
07:15 PM
2 Kudos
Kafka 0.9 requires that the "key.serializer" and "value.serializer" entries in ProducerConfig be Java classes, not strings containing the name of a Java class. See https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java#L280-L287

However, if I try to do that like this:

Properties props = new Properties();
props.put("bootstrap.servers", topoProperties.getProperty("bootstrap.servers"));
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, io.confluent.kafka.serializers.KafkaAvroSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, io.confluent.kafka.serializers.KafkaAvroSerializer.class);
conf.put(KafkaBolt.KAFKA_BROKER_PROPERTIES, props);

Storm fails to start with this error:

java.lang.IllegalArgumentException: Topology conf is not json-serializable

See https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/StormSubmitter.java#L192-L194

It seems like kafka-bolt's prepare method will have to transform a string into a Java class. Before I go down that path, I was wondering if anyone else has run into this problem and whether there's a workaround? Thanks! -Aaron
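For what it's worth, the transformation I have in mind would look roughly like this (a sketch only; it assumes the serializer name is kept as a plain String in the topology conf and that the class is on the worker classpath):

```java
// Sketch only: keep the serializer name as a JSON-serializable String in the topology conf,
// then turn it into the Class object Kafka 0.9 expects right before the producer
// properties are built. Assumes the class is on the worker classpath.
String serializerName = "io.confluent.kafka.serializers.KafkaAvroSerializer";
try {
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, Class.forName(serializerName));
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, Class.forName(serializerName));
} catch (ClassNotFoundException e) {
    throw new RuntimeException("Serializer class not found on the worker classpath", e);
}
```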
Labels:
- Apache Storm
02-22-2016
03:21 PM
1 Kudo
Interesting, it looks like I'm seeing a similar error in a different context (my version of Hive doesn't have any of the LLAP functionality, as I understand it).
02-22-2016
01:45 PM
Thanks @Divakar Annapureddy! I checked and that value is currently 131072 in core-site. I tried overriding it in Hive with "set io.file.buffer.size=146215" and got the same error message. In other words, it still has a buffer size of 32K and not the value in core-site or what I set through Hive.
02-22-2016
01:14 PM
2 Kudos
Running this statement:

INSERT INTO TABLE FOO PARTITION(partition_date)
SELECT DISTINCT [columns from BAR]
FROM BAR LEFT OUTER JOIN FOO ON (BAR.application.id = FOO.unique_id)
WHERE FOO.unique_id IS NULL

fails with the stack trace below. The only setting I could find that seemed relevant was hive.exec.orc.default.buffer.size, but I confirmed it is already set to the default value of 262,144. FOO has about 3.8B rows and is an ORC table; BAR is an external Avro table. I'm running HDP 2.3.4 with Hive 1.2.1. Does anyone have suggestions for addressing this?

Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 32768 needed = 146215
at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:193)
at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:238)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StringDirectTreeReader.next(TreeReaderFactory.java:1554)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StringTreeReader.next(TreeReaderFactory.java:1397)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.next(TreeReaderFactory.java:2004)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1039)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:249)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:186)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.<init>(OrcRawRecordMerger.java:226)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:437)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1269)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1151)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
Labels:
- Apache Hive
02-18-2016
10:40 PM
2 Kudos
A previous question (https://community.hortonworks.com/questions/5482/how-to-fetch-service-level-configuration-using-res.html) detailed how to navigate to particular configurations and extract certain values. Is there a way I can use the REST API to get the configuration that's currently in effect? For example, this would show me all of the possible hive-site configurations:

/api/v1/clusters/cluster/configurations?type=hive-site

How can I programmatically determine which one is currently active?
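For context, this is roughly how I'm calling that endpoint today (just a sketch; the host name, cluster name, and admin credentials below are placeholders):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.xml.bind.DatatypeConverter;

public class ListHiveSiteConfigs {
    public static void main(String[] args) throws Exception {
        // Placeholder host, cluster name, and credentials.
        String endpoint = "http://ambari-host:8080/api/v1/clusters/cluster/configurations?type=hive-site";
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        String auth = DatatypeConverter.printBase64Binary("admin:admin".getBytes("UTF-8"));
        conn.setRequestProperty("Authorization", "Basic " + auth);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Prints the JSON listing of every hive-site version (tag) Ambari knows about,
                // but nothing here tells me which one is currently in effect.
                System.out.println(line);
            }
        }
    }
}
```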
Labels:
- Apache Ambari
01-27-2016
02:38 AM
I clearly have a LOT more to learn, but adding this fixes the issue:

conf.setSkipMissingKryoRegistrations(false);
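For anyone who hits this later, the combined setup that works for me looks roughly like this (a sketch; AvroGenericSerializer is my own Kryo serializer for GenericData.Record, mentioned in the reply below):

```java
Config conf = new Config();
// Register the custom Kryo serializer for Avro generic records.
conf.registerSerialization(GenericData.Record.class, AvroGenericSerializer.class);
// Without this line the registration above seemed to have no effect; adding it fixed the issue.
conf.setSkipMissingKryoRegistrations(false);
```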
01-27-2016
02:13 AM
Great tip, @Jungtaek Lim! I might be missing something obvious, but my first attempt at registering a serializer seemed to have no effect at all:

conf.registerSerialization(GenericData.Record.class, AvroGenericSerializer.class);

I got precisely the same error message. It's possible that my serializer has a bug, but I didn't see any error related to the serializer itself.
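For reference, the serializer I'm registering follows roughly this pattern (a sketch only, not my exact code, and it may well contain the bug mentioned above; it writes the record's schema as JSON alongside the Avro-encoded bytes):

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Sketch of a Kryo serializer for GenericData.Record: writes the schema JSON followed by
// the Avro binary encoding of the record, and reverses that on read.
public class AvroGenericSerializer extends Serializer<GenericData.Record> {

    @Override
    public void write(Kryo kryo, Output output, GenericData.Record record) {
        try {
            // Schema first, so the reading side can reconstruct the record.
            output.writeString(record.getSchema().toString());
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(bos, null);
            new GenericDatumWriter<GenericData.Record>(record.getSchema()).write(record, encoder);
            encoder.flush();
            byte[] bytes = bos.toByteArray();
            output.writeInt(bytes.length, true);
            output.writeBytes(bytes);
        } catch (IOException e) {
            throw new RuntimeException("Failed to serialize Avro record", e);
        }
    }

    @Override
    public GenericData.Record read(Kryo kryo, Input input, Class<GenericData.Record> type) {
        try {
            Schema schema = new Schema.Parser().parse(input.readString());
            int length = input.readInt(true);
            byte[] bytes = input.readBytes(length);
            BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
            return new GenericDatumReader<GenericData.Record>(schema).read(null, decoder);
        } catch (IOException e) {
            throw new RuntimeException("Failed to deserialize Avro record", e);
        }
    }
}
```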
01-26-2016
09:31 PM
Thanks @tgoetz, that makes sense. I have seen other deserialization errors in single-worker topologies, which made this a bit surprising.
01-26-2016
09:13 PM
I have a Storm topology with bolts that pass Avro GenericRecord objects, i.e. the new AvroGenericRecordBolt (https://github.com/apache/storm/blob/master/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/bolt/AvroGenericRecordBolt.java) and a custom bolt which emits GenericRecords. When I run the topology in a single worker, everything is fine. When I run with multiple workers, I get the serialization errors below. I tried registering GenericData$Record with Kryo, but since Record doesn't implement Serializable that doesn't work either (as expected).

- Why does this error appear only when I have multiple workers?
- Any suggestions for getting around this, given that Record isn't Serializable?

java.lang.RuntimeException: java.lang.RuntimeException: java.io.NotSerializableException: org.apache.avro.generic.GenericData$Record
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.disruptor$consume_loop_STAR_$fn__1077.invoke(disruptor.clj:94) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.util$async_loop$fn__551.invoke(util.clj:465) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
Caused by: java.lang.RuntimeException: java.io.NotSerializableException: org.apache.avro.generic.GenericData$Record
at backtype.storm.serialization.SerializableSerializer.write(SerializableSerializer.java:41) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) ~[kryo-2.21.jar:na]
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:75) ~[kryo-2.21.jar:na]
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:18) ~[kryo-2.21.jar:na]
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:486) ~[kryo-2.21.jar:na]
at backtype.storm.serialization.KryoValuesSerializer.serializeInto(KryoValuesSerializer.java:44) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:44) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.daemon.worker$mk_transfer_fn$transfer_fn__5386.invoke(worker.clj:139) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.daemon.executor$start_batch_transfer__GT_worker_handler_BANG_$fn__5107.invoke(executor.clj:263) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.disruptor$clojure_handler$reify__1064.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
... 6 common frames omitted
Caused by: java.io.NotSerializableException: org.apache.avro.generic.GenericData$Record
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) ~[na:1.7.0_45]
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) ~[na:1.7.0_45]
at backtype.storm.serialization.SerializableSerializer.write(SerializableSerializer.java:38) ~[storm-core-0.9.3.2.2.4.12-1.jar:0.9.3.2.2.4.12-1]
... 16 common frames omitted
Labels:
- Apache Storm
01-18-2016
02:13 PM
You were exactly right. A particular change I made was backward compatible, but not forward compatible. When tested the right way, Hive performed exactly as expected.