Morphline: IOException Not a data file

Hi,

 

I'm having trouble passing an avro_event through to Morphlines.

 

When I skip the SolrSink in my Flume config and just write to a file (file-roll-sink) using the avro_event serializer, I get a file with the complete event in it:

 

java -jar ~/avro-tools-1.7.4.jar tojson ../flume/1386248426733-1
{"headers":{"timestamp":"1386248331991","id":"e96dc77f-3b07-4b5d-9e2e-7b641936c0f1","hostname":"192.168.0.107","log_type":"com_job"},"body":"[2013-11-04 05:51:34,155][Thread-27][ERROR][..."}

 

When I enable the SolrSink with the most basic morphline configuration:

 

morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.**", "org.apache.solr.**"]
    
    commands : [
      { readAvroContainer {} }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]

I get the following stack trace:

 

TRACE com.cloudera.cdk.morphline.avro.ReadAvroContainerBuilder$ReadAvroContainer: beforeProcess: {_attachment_body=[[B@4ea20232], hostname=[192.168.0.107], id=[77ae7588-b64a-41af-98e6-006730a28734], log_type=[com_job], timestamp=[1386248421968]}
2013-12-05 05:50:21,176 ERROR org.apache.flume.sink.solr.morphline.MorphlineSink: Morphline Sink SolrOut: Unable to process event from channel mc1. Exception follows.
com.cloudera.cdk.morphline.api.MorphlineRuntimeException: com.cloudera.cdk.morphline.api.MorphlineRuntimeException: java.io.IOException: Not a data file.
	at com.cloudera.cdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:76)
	at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.process(MorphlineHandlerImpl.java:110)
	at org.apache.flume.sink.solr.morphline.MorphlineSink.process(MorphlineSink.java:140)
	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
	at java.lang.Thread.run(Thread.java:662)
Caused by: com.cloudera.cdk.morphline.api.MorphlineRuntimeException: java.io.IOException: Not a data file.
	at com.cloudera.cdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:82)
	at com.cloudera.cdk.morphline.base.AbstractCommand.process(AbstractCommand.java:113)
	at com.cloudera.cdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:125)
	at com.cloudera.cdk.morphline.base.AbstractCommand.process(AbstractCommand.java:113)
	at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.process(MorphlineHandlerImpl.java:106)
	... 4 more
Caused by: java.io.IOException: Not a data file.
	at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
	at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
	at com.cloudera.cdk.morphline.avro.ReadAvroContainerBuilder$ReadAvroContainer.doProcess(ReadAvroContainerBuilder.java:118)
	at com.cloudera.cdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:80)
	... 8 more

Can somebody explain where this is coming from?

 

Thank you!

 

Kristof.
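
A possible way to narrow this down, offered as a sketch rather than a confirmed diagnosis: the innermost exception comes from Avro's DataFileStream.initialize, which expects its input to begin with the Avro object container magic bytes ('O', 'b', 'j', 0x01). The file written by file-roll-sink passes through the avro_event serializer, so it is a container file. The morphline, by contrast, receives the event body as _attachment_body, and if that body is the raw log line (as the tojson output above suggests), the magic check would fail with "Not a data file". The helper below is hypothetical and only illustrates that check; it is not part of Flume, Morphlines, or Avro.

```python
# Avro object container files start with these four bytes: 'O', 'b', 'j', 0x01.
AVRO_CONTAINER_MAGIC = b"Obj\x01"


def looks_like_avro_container(body: bytes) -> bool:
    """Return True if the bytes begin with the Avro container file magic.

    Hypothetical helper for illustration only: it mirrors the header check
    that an Avro container reader performs before decoding anything else.
    """
    return body.startswith(AVRO_CONTAINER_MAGIC)


# The event body from the post is a plain log line, not a container file.
log_line_body = b"[2013-11-04 05:51:34,155][Thread-27][ERROR][..."

# Assumed stand-in for the first bytes of a file written by file-roll-sink
# with the avro_event serializer (real files continue with metadata and blocks).
container_prefix = AVRO_CONTAINER_MAGIC + b"\x00"

print(looks_like_avro_container(log_line_body))
print(looks_like_avro_container(container_prefix))
```

If the body does turn out to be plain text, a text-oriented command such as readLine may be a closer fit than readAvroContainer, since the headers (hostname, id, log_type, timestamp) already arrive as record fields according to the TRACE line above.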

 
