Flume avro event serializer not producing avro file.

Explorer

Hi Guys,

I've built the avro serializer

https://github.com/cloudera/cdk/blob/master/cdk-flume-avro-event-serializer/src/main/java/org/apache...

and installed it in the plugins directory of Flume. I've updated my agent config so that the serializer points to AvroEventSerializer$Builder.

When I send in my events I set the schema in the header (a literal string for now), and the body is JSON. The events reach HDFS without errors, but the body is just plain text. I was expecting an Avro file.

Am I doing anything wrong?

Do you have an example agent config?

 

Thanks

 

Andrew

8 REPLIES

Re: Flume avro event serializer not producing avro file.

Rising Star

Does setting the sink serializer to avro_event generate JSON?

 

agent.sinks.svc_0_sink.type = hdfs
agent.sinks.svc_0_sink.hdfs.fileType = DataStream
agent.sinks.svc_0_sink.serializer = avro_event
agent.sinks.svc_0_sink.serializer.compressionCodec = snappy

Re: Flume avro event serializer not producing avro file.

Explorer

I've moved cdk-flume-avro-event-serializer-0.5.1-SNAPSHOT.jar into the lib directory of flume-ng. Now I get this error:

 

org.apache.flume.FlumeException: Unable to instantiate Builder from org.apache.flume.serialization.AvroEventSerializer: does not appear to implement org.apache.flume.serialization.EventSerializer$Builder

 

My agent config is:

 

## Write to HDFS
collector.sinks.HadoopOut.type = hdfs
collector.sinks.HadoopOut.hdfs.path = /flume/
collector.sinks.HadoopOut.hdfs.fileType = DataStream
collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 5000
collector.sinks.HadoopOut.hdfs.rollInterval = 5
collector.sinks.HadoopOut.hdfs.batchSize = 1000
collector.sinks.HadoopOut.serializer = org.apache.flume.serialization.AvroEventSerializer
collector.sinks.HadoopOut.serializer.compressionCodec = snappy

Re: Flume avro event serializer not producing avro file.

Explorer

I've fixed it. I forgot the $Builder suffix on the serializer class name in the agent config.

 

Works like a charm.

 

Thanks

 

Andrew
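For anyone landing here with the same error, a sketch of what the corrected sink section presumably looks like. It reuses the sink and property names from the config posted earlier in this thread and only changes the serializer line to reference the nested Builder class. The roll settings are left out for brevity. Per the Flume documentation for this serializer, the schema itself still travels in each event's headers (flume.avro.schema.literal or flume.avro.schema.url), not in the agent config.

```properties
# Assumed: same collector/HadoopOut names as the config above; only the serializer line differs.
collector.sinks.HadoopOut.type = hdfs
collector.sinks.HadoopOut.hdfs.path = /flume/
collector.sinks.HadoopOut.hdfs.fileType = DataStream
# Flume instantiates the Builder, not the serializer itself, hence the $Builder suffix.
collector.sinks.HadoopOut.serializer = org.apache.flume.serialization.AvroEventSerializer$Builder
collector.sinks.HadoopOut.serializer.compressionCodec = snappy
```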

Re: Flume avro event serializer not producing avro file.

Rising Star
Correction:
Does setting sink serializer to avro_event generate Avro?

Re: Flume avro event serializer not producing avro file.

Explorer
Yes, but I'm using this class so that my client apps can send custom schemas in the headers and have Flume serialize the JSON in the body.

If I set avro_event, the schema is the default one: the headers and body of the Flume event.

The class now creates an Avro file via Flume, but when I use avro-tools (tojson) or Hive to look at it, I get an IndexOutOfBounds error.

I assume that when using this class the body of the Flume event should be JSON?

Re: Flume avro event serializer not producing avro file.

Explorer

Can anyone help with this? I think I'm just not setting the body of the Flume event correctly.

 

Getting this to work would mean a wider adoption of Hadoop in my company.

Re: Flume avro event serializer not producing avro file.

Explorer

I spoke too soon!

 

The answer was in the Test class and the comments. The body has to be the binary-encoded Avro datum, not JSON.

 

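// Imports needed by the snippet below
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectDatumWriter;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;

// The event body is the record serialized with Avro's binary encoding
Event event = EventBuilder.withBody(serializeAvro(record, schema));

// Encodes a single datum as raw Avro binary (no container-file header)
private byte[] serializeAvro(Object datum, Schema schema) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    ReflectDatumWriter<Object> writer = new ReflectDatumWriter<Object>(schema);
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    writer.write(datum, encoder);
    encoder.flush(); // the encoder buffers, so flush before reading the bytes
    return out.toByteArray();
}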
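To make it clearer why a JSON body does not work, here is a rough, library-free sketch of what Avro's binary encoding produces for a record. It assumes a hypothetical two-field schema (name: string, age: int) that is not from this thread, and the class and method names are made up for illustration. Real code should use the Avro library as in serializeAvro above; this only shows that the binary form carries no field names or braces, so a reader that expects it will misinterpret JSON text as lengths and values, which is a likely source of the out-of-bounds error.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: hand-rolled Avro binary encoding for a
// record with the assumed schema {name: string, age: int}.
class AvroBodySketch {

    // Avro writes int and long values as zig-zag encoded variable-length integers.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);   // zig-zag: small magnitudes become small unsigned values
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z);              // final byte, continuation bit clear
    }

    // A string is its UTF-8 byte length (encoded as a long) followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A record is just its fields concatenated in schema order.
    static byte[] encodeUser(String name, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, age);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] avro = encodeUser("Andrew", 42);
        byte[] json = "{\"name\":\"Andrew\",\"age\":42}".getBytes(StandardCharsets.UTF_8);
        System.out.println("avro body bytes: " + avro.length);
        System.out.println("json body bytes: " + json.length);
    }
}
```

The two byte arrays have nothing in common structurally: the Avro body starts with the string length, while the JSON body starts with the '{' character, which a binary decoder would try to read as a length.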

Re: Flume avro event serializer not producing avro file.

New Contributor

Dr., I might be facing the same issue, and unfortunately I don't see what the real issue is: when I try to read the file using the validateAvroFile method, it fails for me on the console too. Could you please point out the real issue and guide me in the right direction?

 

Your help would mean a lot to me. Hoping for a response.

 

Thanks