Morphline: IOException Not a data file

Hi,

I'm having some problems passing an avro_event through to Morphlines.

When I skip the SolrSink in my Flume config and just write to a file (file_roll sink) using the avro_event serializer, I get a file with the complete event in it:

java -jar ~/avro-tools-1.7.4.jar tojson ../flume/1386248426733-1
{"headers":{"timestamp":"1386248331991","id":"e96dc77f-3b07-4b5d-9e2e-7b641936c0f1","hostname":"192.168.0.107","log_type":"com_job"},"body":"[2013-11-04 05:51:34,155][Thread-27][ERROR][..."}

 

When I enable the SolrSink with the most basic morphline configuration:

morphlines : [
  {
    id : morphline1
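    importCommands : ["com.cloudera.**", "org.apache.solr.**"]

    commands : [
      # Parse the event body as an Avro container file (no options set)
      { readAvroContainer {} }
      # Log each record emitted by the previous command
      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]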
  }
]

I get the following stack trace:

TRACE com.cloudera.cdk.morphline.avro.ReadAvroContainerBuilder$ReadAvroContainer: beforeProcess: {_attachment_body=[[B@4ea20232], hostname=[192.168.0.107], id=[77ae7588-b64a-41af-98e6-006730a28734], log_type=[com_job], timestamp=[1386248421968]}
2013-12-05 05:50:21,176 ERROR org.apache.flume.sink.solr.morphline.MorphlineSink: Morphline Sink SolrOut: Unable to process event from channel mc1. Exception follows.
com.cloudera.cdk.morphline.api.MorphlineRuntimeException: com.cloudera.cdk.morphline.api.MorphlineRuntimeException: java.io.IOException: Not a data file.
	at com.cloudera.cdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:76)
	at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.process(MorphlineHandlerImpl.java:110)
	at org.apache.flume.sink.solr.morphline.MorphlineSink.process(MorphlineSink.java:140)
	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
	at java.lang.Thread.run(Thread.java:662)
Caused by: com.cloudera.cdk.morphline.api.MorphlineRuntimeException: java.io.IOException: Not a data file.
	at com.cloudera.cdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:82)
	at com.cloudera.cdk.morphline.base.AbstractCommand.process(AbstractCommand.java:113)
	at com.cloudera.cdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:125)
	at com.cloudera.cdk.morphline.base.AbstractCommand.process(AbstractCommand.java:113)
	at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.process(MorphlineHandlerImpl.java:106)
	... 4 more
Caused by: java.io.IOException: Not a data file.
	at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
	at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
	at com.cloudera.cdk.morphline.avro.ReadAvroContainerBuilder$ReadAvroContainer.doProcess(ReadAvroContainerBuilder.java:118)
	at com.cloudera.cdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:80)
	... 8 more

Can somebody explain where this is coming from?
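
In case it helps with diagnosis: as far as I understand the Avro specification, a container file starts with the four magic bytes 'O', 'b', 'j', 0x01, and Avro's DataFileStream raises "Not a data file" when that header is missing. Below is a minimal Python sketch for checking whether a byte payload starts with that header. The helper name and sample payloads are my own and purely illustrative. It assumes the event body reaching the morphline is the raw log line rather than a full container file, which I have not confirmed.

```python
# Avro object container files begin with these four bytes: 'O', 'b', 'j', 0x01.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_container(payload: bytes) -> bool:
    """Return True if payload starts with the Avro container-file magic bytes.

    Hypothetical helper for illustration; it only inspects the first four
    bytes and does not validate the rest of the file.
    """
    return payload[:4] == AVRO_MAGIC


# Assumed sample: the raw log line carried in the event body (truncated as in the post).
raw_log_body = b"[2013-11-04 05:51:34,155][Thread-27][ERROR][..."

# Assumed sample: only the start of a container header, not a complete valid file.
container_like = AVRO_MAGIC + b"\x00"

print(looks_like_avro_container(raw_log_body))
print(looks_like_avro_container(container_like))
```

If the first check comes back false for the bytes in _attachment_body, that would at least be consistent with the exception above, while the file written by the file_roll sink passes avro-tools because the serializer writes a full container file.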

 

Thank you!

Kristof.
