Flume morphline sink to HDFS

Re: Flume morphline sink to HDFS


Morphlines seem to have a problem reading my Avro data. I'm using the readAvro command, but I get an ArrayIndexOutOfBoundsException somewhere. The Avro records are nested, sometimes up to 9 levels deep.



2013-10-07 11:00:22,472 ERROR org.apache.flume.sink.solr.morphline.MorphlineSink: Morphline Sink solrSink: Unable to process event from channel file2. Exception follows.
com.cloudera.cdk.morphline.api.MorphlineRuntimeException: java.lang.ArrayIndexOutOfBoundsException
at com.cloudera.cdk.morphline.base.FaultTolerance.handleException(
at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.process(
at org.apache.flume.sink.solr.morphline.MorphlineSink.process(
at org.apache.flume.sink.DefaultSinkProcessor.process(
at org.apache.flume.SinkRunner$
Caused by: java.lang.ArrayIndexOutOfBoundsException


commands : [
  # Parse Avro container file and emit a record for each avro object
  readAvro {
    # Path to schema:
    # readerSchemaFile : /etc/flume-ng/Viper.avsc
    writerSchemaFile : /etc/flume-ng/Viper.avsc





Re: Flume morphline sink to HDFS

The stack trace you provided doesn't include the root cause exception: it stops at the final "Caused by" line without the frames beneath it. Please include those as well for better diagnostics.

Make sure the Avro schema supplied to readAvro matches your actual Avro event. How exactly was the Avro event written by the ingesting app?

Also note: For the readAvro command to work correctly, each Avro event must have been written with the same writer schema by the ingesting app. That is, you cannot parse two Avro events with two different writer schemas A and B within the same readAvro command. The readAvroContainer command doesn't have that limitation, of course, because the writer schema comes embedded inside each Avro container, per the standard Avro container specification.
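
To make the distinction concrete, here is a minimal sketch of the two alternatives. It assumes the /etc/flume-ng/Viper.avsc path from the question and that each command is wrapped in its own braces inside the commands list. Whether Option B applies depends on how the ingesting app wrote the events, so treat it as something to check rather than a fix.

```
commands : [
  # Option A: each event body is a bare Avro binary record with no embedded
  # schema. The writer schema has to be supplied explicitly, and every
  # event must have been written with exactly this one schema.
  { readAvro { writerSchemaFile : /etc/flume-ng/Viper.avsc } }

  # Option B (use instead of Option A, not together with it): each event
  # body is a complete Avro container file. The writer schema is read from
  # the container header, so no schema file needs to be configured.
  # { readAvroContainer { } }
]
```

If the events were written as container files but are parsed with readAvro, or were written with a schema other than the one configured, the decoder reads the bytes against the wrong layout, which is one plausible way to end up with an exception like the one above.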