
Re: Flume morphline sink to HDFS

Morphlines seems to have a problem reading my Avro data. I'm using the readAvro command, but I get an ArrayIndexOutOfBoundsException somewhere. The Avro records are sometimes nested up to 9 levels deep.



Caused by: java.lang.ArrayIndexOutOfBoundsException
2013-10-07 11:00:22,472 ERROR org.apache.flume.sink.solr.morphline.MorphlineSink: Morphline Sink solrSink: Unable to process event from channel file2. Exception follows.
com.cloudera.cdk.morphline.api.MorphlineRuntimeException: java.lang.ArrayIndexOutOfBoundsException
at com.cloudera.cdk.morphline.base.FaultTolerance.handleException(
at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.process(
at org.apache.flume.sink.solr.morphline.MorphlineSink.process(
at org.apache.flume.sink.DefaultSinkProcessor.process(
at org.apache.flume.SinkRunner$
Caused by: java.lang.ArrayIndexOutOfBoundsException


commands : [
  # Parse Avro container file and emit a record for each avro object
  readAvro {
    # Path to schema:
    # readerSchemaFile : /etc/flume-ng/Viper.avsc
    writerSchemaFile : /etc/flume-ng/Viper.avsc




Cloudera Employee

Re: Flume morphline sink to HDFS

The stack trace you provided doesn't include the root cause exception. Please include that as well for better diagnostics.

Make sure the Avro schema supplied to readAvro matches your actual Avro event. How exactly was the Avro event written by the ingesting app?

Also note: For the readAvro command to work correctly, each Avro event must have been written with the same writer schema by the ingesting app. That is, you cannot parse two Avro events with two different writer schemas A and B within the same readAvro command. The readAvroContainer command doesn't have that limitation, of course, because the writer schema comes embedded inside each Avro container, per the standard Avro container specification.
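
As a rough, untested sketch of the two options described above: the first fragment reuses the writerSchemaFile path from the question and assumes every event was serialized with exactly that schema. The second assumes the ingesting app instead sends complete Avro container files as event bodies, in which case no writer schema needs to be configured. Only one of the two commands would appear in a given morphline.

```
# Option 1: each event is a bare Avro datum, all written with the same schema.
{
  readAvro {
    # Must be the exact schema the ingesting app used to write every event
    writerSchemaFile : /etc/flume-ng/Viper.avsc
  }
}

# Option 2 (assumption: the event body is a complete Avro container file).
# The writer schema is taken from the container header, so none is set here.
{
  readAvroContainer {
  }
}
```

If the events were written with more than one schema, or the schema file differs from what the app actually used, option 1 will misread the bytes, which would be consistent with the kind of decoding error shown in the question.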