starting flume solr sink: morphlines.conf not found

Guru

Hi,

I configured a Flume Solr sink and the related morphline config in CM based on these instructions.

The chapter "Configuring Flume Morphline Solr Sink for use with the Solr Service" says that within flume.conf I can simply use the plain filename morphline.conf to reference the morphline configuration (as long as it is in the same directory, of course). So I configured the following in the CM Flume instance config (navigation: flume-instance=>Configuration=>agent=>Configuration File):

...

agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.channel = memoryChannel
#agent.sinks.solrSink.batchSize = 1000
#agent.sinks.solrSink.batchDurationMillis = 1000
agent.sinks.solrSink.morphlineFile = morphline.conf
#agent.sinks.solrSink.morphlineId = morphline1

....

and entered the required settings that go into morphline.conf under flume-instance=>Configuration=>agent=>Flume-NG Solr Sink.

But after deploying the config and restarting the Flume service on that node, I get an error saying that the file "morphline.conf" cannot be found:

...

2013-08-14 11:20:59,827 INFO org.apache.flume.sink.solr.morphline.TwitterSource: Processed 400 docs
2013-08-14 11:21:00,939 INFO org.apache.flume.sink.solr.morphline.MorphlineSink: Starting Morphline Sink solrSink (MorphlineSolrSink) ...
2013-08-14 11:21:00,939 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: SINK, name: solrSink started
2013-08-14 11:21:00,940 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@556917ee counterGroup:{ name:null counters:{} } } - Exception follows.
com.cloudera.cdk.morphline.api.MorphlineCompilationException: Cannot parse morphline file: morphline.conf
at com.cloudera.cdk.morphline.base.Compiler.compile(Compiler.java:51)
at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.configure(MorphlineHandlerImpl.java:90)
at org.apache.flume.sink.solr.morphline.MorphlineSink.start(MorphlineSink.java:97)
at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:46)
at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: File not found: morphline.conf
at com.cloudera.cdk.morphline.base.Compiler.parse(Compiler.java:64)
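
The root cause reports only the bare name morphline.conf, which suggests the sink did not look in /etc/flume-ng/conf at all. Here is a minimal Python sketch of the likely resolution rule. It assumes the sink treats a relative morphlineFile the way java.io.File does, i.e. relative to the agent process's current working directory. The function name and the process directory below are hypothetical:

```python
import posixpath


def resolve_morphline_file(morphline_file: str, working_dir: str) -> str:
    """Return the absolute path a morphlineFile value would point to.

    Assumption: relative paths are resolved against the agent's current
    working directory (as java.io.File does), not against /etc/flume-ng/conf.
    """
    if posixpath.isabs(morphline_file):
        # Absolute paths are used as-is (only normalized here).
        return posixpath.normpath(morphline_file)
    return posixpath.normpath(posixpath.join(working_dir, morphline_file))


if __name__ == "__main__":
    # Hypothetical working directory of a CM-managed Flume agent process.
    cwd = "/var/run/cloudera-scm-agent/process/42-flume-AGENT"
    for value in ("morphline.conf", "/etc/flume-ng/conf/morphline.conf"):
        print(f"{value!r} -> {resolve_morphline_file(value, cwd)}")
```

Under that assumption, a plain filename only works if the file sits in whatever directory the agent was started from. An absolute path does not depend on that.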

 

 

Any hints on how to reference morphline.conf from flume.conf?

Many thanks in advance...Gerd...

Expert Contributor

Caused by: java.io.FileNotFoundException: File not found: morphline.conf

Does /etc/flume-ng/conf/morphline.conf exist?

For some sample morphline.conf files, see http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-ap...

Super Collaborator

Hello,

You have two options here:

- As dvohra mentioned, you could copy the morphline.conf file to /etc/flume-ng/conf, give its full path in your agent config in CM, and then restart.

- If you want to use the "Flume-NG Solr Sink" config section in CM to configure your morphlines, then you need to change "morphline.conf" to "morphlines.conf" (notice the "s" after "morphline"). Here is my agent config as an example:

avro.sources=src
avro.sinks=solrSink
avro.channels=memoryChannel
avro.sources.src.type=avro
avro.sources.src.bind=cdh43-1.test.com
avro.sources.src.port=8889
avro.sinks.solrSink.type=org.apache.flume.sink.solr.morphline.MorphlineSolrSink
avro.sinks.solrSink.channel=memoryChannel
avro.sinks.solrSink.morphlineFile=morphlines.conf
avro.channels.memoryChannel.type=memory
avro.channels.memoryChannel.capacity=4096
avro.channels.memoryChannel.transactionCapacity=100
avro.channels.memoryChannel.byteCapacity=0
avro.sources.src.channels=memoryChannel
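
Before restarting, you can sanity-check an agent config like the one above. Here is a minimal Python sketch that verifies every channel referenced by a source or sink is declared. It also flags a relative morphlineFile other than morphlines.conf, the name CM writes for its "Flume-NG Solr Sink" section. The helper names are hypothetical. The parser only handles simple key=value lines, not the full java.util.Properties syntax:

```python
def parse_properties(text: str) -> dict:
    """Parse simple 'key = value' lines; '#' and '!' start comment lines.

    Simplification: continuation lines, escapes and ':' separators are not
    supported, unlike java.util.Properties.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_agent(props: dict, agent: str) -> list:
    """Return human-readable problems found in one agent's wiring."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())

    # Sources list their channels under '.channels' (plural).
    for source in props.get(f"{agent}.sources", "").split():
        for ch in props.get(f"{agent}.sources.{source}.channels", "").split():
            if ch not in channels:
                problems.append(f"source {source}: channel {ch!r} not declared")

    # Sinks name a single channel under '.channel' (singular).
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch not in channels:
            problems.append(f"sink {sink}: channel {ch!r} not declared")
        morphline = props.get(f"{agent}.sinks.{sink}.morphlineFile")
        # Assumption: with CM's Solr sink section the generated file is
        # morphlines.conf; any other relative name must exist in the
        # agent's working directory.
        if morphline and not morphline.startswith("/") and morphline != "morphlines.conf":
            problems.append(f"sink {sink}: relative morphlineFile {morphline!r}")
    return problems


SAMPLE = """
avro.sources=src
avro.sinks=solrSink
avro.channels=memoryChannel
avro.sinks.solrSink.channel=memoryChannel
avro.sinks.solrSink.morphlineFile=morphlines.conf
avro.sources.src.channels=memoryChannel
"""

if __name__ == "__main__":
    print(check_agent(parse_properties(SAMPLE), "avro") or "no problems found")
```

With morphlineFile=morphline.conf instead, the check reports the relative name, which is the mismatch behind the original error.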

 

You can also see the morphlines file that CM is trying to use by looking in "/var/run/cloudera-scm-agent/process/<id>-flume-AGENT", where <id> is that of the most recently created directory. You'll see a "morphlines.conf" file there.
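
Finding that directory by hand can take some searching. Here is a minimal Python sketch that picks the <id>-flume-AGENT directory with the highest numeric <id> and reports whether a morphlines.conf file is there. It assumes CM hands out increasing ids, so the highest one is the most recently created. The function name is hypothetical, and reading these directories may require root:

```python
from pathlib import Path
from typing import Optional

PROCESS_ROOT = Path("/var/run/cloudera-scm-agent/process")


def latest_flume_process_dir(root: Path = PROCESS_ROOT) -> Optional[Path]:
    """Return the '<id>-flume-AGENT' directory with the highest numeric id."""
    if not root.is_dir():
        return None
    candidates = []
    for entry in root.glob("*-flume-AGENT"):
        prefix = entry.name.split("-", 1)[0]
        if entry.is_dir() and prefix.isdigit():
            candidates.append((int(prefix), entry))
    if not candidates:
        return None
    # Assumption: a higher id means a more recently created process dir.
    return max(candidates, key=lambda c: c[0])[1]


if __name__ == "__main__":
    proc = latest_flume_process_dir()
    if proc is None:
        print(f"no *-flume-AGENT directory under {PROCESS_ROOT}")
    else:
        conf = proc / "morphlines.conf"
        print(conf, "exists" if conf.is_file() else "is missing")
```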

 

Hope this helps...

 

Thanks

Chris

 

Guru

Many thanks, Chris and dvohra,

I saw that the file /etc/flume-ng/conf/morphline.conf doesn't exist, but (after heavy searching 😉) I found the config under the /var/run/cloudera-scm-agent dir. Initially I thought that CM would provide the node with the config I defined in the CM Solr sink section, but I missed the different file name (that 's'!). After creating /etc/flume-ng/conf/morphline.conf manually, everything worked fine. But since I want to use CM to define and deploy the configuration, I'll try the morphlineS.conf way.

br...Gerd...

Contributor

How do you get the data into the table?

When I use org.apache.flume.sink.solr.morphline.TwitterSource as the source type, I get some strange data starting with Objj Avro....

So I guess the data is in Avro format, not JSON... So how can I create an external table in Hive that parses the Avro format?
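
The "Obj" at the start of the data matches the Avro object container file format. Per the Avro specification, such a file begins with the four bytes 'O', 'b', 'j' followed by a byte with value 1. Here is a minimal Python sketch to check a file taken from the output for that header. The function names are hypothetical:

```python
AVRO_MAGIC = b"Obj\x01"  # Avro object container file header, per the Avro spec


def is_avro_container(header: bytes) -> bool:
    """True if the given leading bytes start with the Avro container magic."""
    return header[:4] == AVRO_MAGIC


def file_is_avro_container(path: str) -> bool:
    """Read only the first four bytes of a file and check the magic."""
    with open(path, "rb") as f:
        return is_avro_container(f.read(4))
```

If the check returns True, the data needs an Avro-aware reader such as the Hive Avro SerDe, not a JSON SerDe.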

--
Lefevre Kevin

Guru

Hi,

To read Avro data from Hive, you have to use an Avro SerDe. A good starting point might be http://www.michael-noll.com/blog/2013/07/04/using-avro-in-mapreduce-jobs-with-hadoop-pig-hive/

But this is not really related to this topic, since the Solr sink puts data into Solr. I'd suggest using just an HDFS sink to put your data on HDFS and then creating a Hive table (external or not) on top of it. You don't need Solr or Morphlines for this.
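
On the Hive side, a table over Avro container files typically combines the AvroSerDe with the Avro container input/output formats and points at an Avro schema. Here is a minimal Python sketch that builds such a statement. It assumes the files under the location are Avro container files. The helper, table name, HDFS location and schema URL are all hypothetical, so check the class names against your Hive version's AvroSerDe documentation:

```python
def avro_external_table_ddl(table: str, location: str, schema_url: str) -> str:
    """Build a Hive CREATE EXTERNAL TABLE statement backed by the Avro SerDe.

    Columns are not listed: the AvroSerDe derives them from the schema that
    'avro.schema.url' points to. Inputs are assumed trusted (no quoting).
    """
    return (
        f"CREATE EXTERNAL TABLE {table}\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'\n"
        "STORED AS INPUTFORMAT "
        "'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'\n"
        "OUTPUTFORMAT "
        "'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'\n"
        f"LOCATION '{location}'\n"
        f"TBLPROPERTIES ('avro.schema.url'='{schema_url}');"
    )


if __name__ == "__main__":
    # Hypothetical HDFS paths for data written by a Flume HDFS sink.
    print(avro_external_table_ddl(
        "tweets", "/user/flume/tweets", "hdfs:///user/flume/tweets.avsc"))
```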

 

best, Gerd