Created on 08-14-2013 02:39 AM - edited 09-16-2022 01:46 AM
Hi,
I configured a Flume Solr sink and the related morphline config in CM based on these instructions.
The chapter "Configuring Flume Morphline Solr Sink for use with the Solr Service" says that within flume.conf I can simply use the plain file name morphline.conf to reference the morphline configuration (provided it is in the same directory, of course). So I configured the following in the CM Flume instance config (flume-instance=>Configuration=>agent=>Configuration File):
...
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.channel = memoryChannel
#agent.sinks.solrSink.batchSize = 1000
#agent.sinks.solrSink.batchDurationMillis = 1000
agent.sinks.solrSink.morphlineFile = morphline.conf
#agent.sinks.solrSink.morphlineId = morphline1
....
and entered the required settings that go into morphline.conf under flume-instance=>Configuration=>agent=>Flume-NG Solr Sink.
But after deploying the config and restarting the Flume service on that node, I get the error that the file "morphline.conf" cannot be found:
...
2013-08-14 11:20:59,827 INFO org.apache.flume.sink.solr.morphline.TwitterSource: Processed 400 docs
2013-08-14 11:21:00,939 INFO org.apache.flume.sink.solr.morphline.MorphlineSink: Starting Morphline Sink solrSink (MorphlineSolrSink) ...
2013-08-14 11:21:00,939 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: SINK, name: solrSink started
2013-08-14 11:21:00,940 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@556917ee counterGroup:{ name:null counters:{} } } - Exception follows.
com.cloudera.cdk.morphline.api.MorphlineCompilationException: Cannot parse morphline file: morphline.conf
at com.cloudera.cdk.morphline.base.Compiler.compile(Compiler.java:51)
at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.configure(MorphlineHandlerImpl.java:90)
at org.apache.flume.sink.solr.morphline.MorphlineSink.start(MorphlineSink.java:97)
at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:46)
at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: File not found: morphline.conf
at com.cloudera.cdk.morphline.base.Compiler.parse(Compiler.java:64)
Any hints on how to reference morphline.conf from flume.conf?
Many thanks in advance...Gerd...
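For anyone hitting the same FileNotFoundException: the trace suggests the sink opens the morphlineFile value as a plain file path, so a bare name such as morphline.conf would be looked up relative to the agent's working directory, not next to flume.conf. The Java sketch below illustrates that assumed resolution with java.nio. MorphlinePathCheck and its methods are hypothetical names, and this is not the actual CDK Compiler code.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MorphlinePathCheck {

    // Assumption: the morphlineFile value is treated like a java.io.File path,
    // i.e. absolute paths are used as-is and relative ones resolve against workingDir.
    static Path resolve(String morphlineFile, Path workingDir) {
        Path p = Paths.get(morphlineFile);
        return p.isAbsolute() ? p : workingDir.resolve(p).normalize();
    }

    // Throws with a message shaped like the one in the log if the file is missing.
    static Path requireExisting(String morphlineFile, Path workingDir) throws FileNotFoundException {
        Path resolved = resolve(morphlineFile, workingDir);
        if (!Files.isRegularFile(resolved)) {
            throw new FileNotFoundException(
                "File not found: " + morphlineFile + " (resolved to " + resolved + ")");
        }
        return resolved;
    }

    public static void main(String[] args) throws Exception {
        // user.dir is the JVM's current working directory.
        Path cwd = Paths.get(System.getProperty("user.dir"));
        System.out.println(resolve("morphline.conf", cwd));
        System.out.println(resolve("/etc/flume-ng/conf/morphline.conf", cwd));
    }
}
```

Under this assumption, either give an absolute path or make sure the file sits in whatever directory the agent process is started in.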
Created 08-14-2013 07:13 AM
Caused by: java.io.FileNotFoundException: File not found: morphline.conf
Does /etc/flume-ng/conf/morphline.conf exist?
Some sample morphline.conf files are in http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-ap...
Created 08-14-2013 08:03 AM
Hello,
You have two options here:
- As dvohra mentioned, you could copy the morphline.conf file to /etc/flume-ng/conf and then give its full path in your agent config in CM. Then restart.
- If you want to use the "Flume-NG Solr Sink" config section in CM to configure your morphlines, then you need to change "morphline.conf" to "morphlines.conf" (notice the "s" after "morphline"). Here is my agent config as an example:
avro.sources=src
avro.sinks=solrSink
avro.channels=memoryChannel
avro.sources.src.type=avro
avro.sources.src.bind=cdh43-1.test.com
avro.sources.src.port=8889
avro.sinks.solrSink.type=org.apache.flume.sink.solr.morphline.MorphlineSolrSink
avro.sinks.solrSink.channel=memoryChannel
avro.sinks.solrSink.morphlineFile=morphlines.conf
avro.channels.memoryChannel.type=memory
avro.channels.memoryChannel.capacity=4096
avro.channels.memoryChannel.transactionCapacity=100
avro.channels.memoryChannel.byteCapacity=0
avro.sources.src.channels=memoryChannel
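Flume reads the agent configuration as a Java properties file, so java.util.Properties is enough to double-check which morphline file each sink points at. A minimal sketch, assuming the sink keys follow the <agent>.sinks.<sink>.morphlineFile pattern used above; AgentConfigInspector is a hypothetical helper, and the embedded config is a trimmed copy of the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class AgentConfigInspector {

    // Returns sinkName -> morphlineFile for every sink of the given agent that sets one.
    static Map<String, String> morphlineFiles(Properties props, String agent) {
        Map<String, String> result = new TreeMap<>();
        String prefix = agent + ".sinks.";
        String suffix = ".morphlineFile";
        for (String key : props.stringPropertyNames()) {
            // The length check skips keys with no sink name between prefix and suffix.
            if (key.startsWith(prefix) && key.endsWith(suffix)
                    && key.length() > prefix.length() + suffix.length()) {
                String sink = key.substring(prefix.length(), key.length() - suffix.length());
                result.put(sink, props.getProperty(key).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String config = String.join("\n",
            "avro.sinks=solrSink",
            "avro.sinks.solrSink.type=org.apache.flume.sink.solr.morphline.MorphlineSolrSink",
            "avro.sinks.solrSink.channel=memoryChannel",
            "avro.sinks.solrSink.morphlineFile=morphlines.conf");
        Properties props = new Properties();
        props.load(new StringReader(config));
        System.out.println(morphlineFiles(props, "avro")); // {solrSink=morphlines.conf}
    }
}
```

A relative value like morphlines.conf only works if that file ends up in the directory the agent runs from.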
You can also see the morphlines file that CM is trying to use by looking in "/var/run/cloudera-scm-agent/process/<id>-flume-AGENT", where <id> belongs to the most recently created directory. You'll find a "morphlines.conf" there.
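To locate that directory programmatically, here is a small sketch that picks the *-flume-AGENT directory with the largest numeric <id> and checks whether it contains morphlines.conf. It assumes CM numbers process directories in increasing order; LatestFlumeProcessDir is a hypothetical name. The ids are compared as numbers because, compared as strings, "7" would sort after "105".

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class LatestFlumeProcessDir {

    static final String SUFFIX = "-flume-AGENT";

    // Parses <id> from "<id>-flume-AGENT"; returns -1 if the prefix is not a number.
    static long processId(Path dir) {
        String name = dir.getFileName().toString();
        try {
            return Long.parseLong(name.substring(0, name.length() - SUFFIX.length()));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Assumption: CM assigns increasing ids, so the largest id is the newest directory.
    static Optional<Path> latest(Path processRoot) throws IOException {
        Path best = null;
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(processRoot, "*" + SUFFIX)) {
            for (Path dir : dirs) {
                long id = processId(dir);
                if (Files.isDirectory(dir) && id >= 0 && (best == null || id > processId(best))) {
                    best = dir;
                }
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "/var/run/cloudera-scm-agent/process");
        if (!Files.isDirectory(root)) {
            System.out.println("Not a directory: " + root);
            return;
        }
        Optional<Path> dir = latest(root);
        if (!dir.isPresent()) {
            System.out.println("No *" + SUFFIX + " directory under " + root);
            return;
        }
        Path morphlines = dir.get().resolve("morphlines.conf");
        System.out.println(morphlines + (Files.exists(morphlines) ? " exists" : " is missing"));
    }
}
```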
Hope this helps...
Thanks
Chris
Created 08-14-2013 11:33 PM
Many thanks Chris and dvohra,
I saw that the file /etc/flume-ng/conf/morphline.conf doesn't exist, but found the config (after heavy searching 😉) under the /var/run/cloudera-scm-agent dir. Initially I thought that CM would provide the node with the config I defined in the CM Solr sink section, but I missed the difference in naming (that **bleep** 's'). After creating /etc/flume-ng/conf/morphline.conf manually, everything worked fine. But since I want to use CM to define and deploy the configuration, I'll try the morphlineS.conf way.
br...Gerd...
Created 07-02-2014 08:18 AM
How do you put the data into the table?
When I use org.apache.flume.sink.solr.morphline.TwitterSource as the source type, I get some strange data starting with Objj Avro....
So the data is not in JSON format but in Avro, I guess... So how do I create an external table in Hive that can parse the Avro format?
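The "Obj" prefix matches the header of an Avro object container file, which starts with the bytes 'O', 'b', 'j' followed by the byte value 1. That would explain why the TwitterSource output does not look like JSON. A minimal check of that header, as a sketch (AvroMagicCheck is a hypothetical name):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class AvroMagicCheck {

    // Avro object container files begin with the four magic bytes 'O', 'b', 'j', 1.
    static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    static boolean looksLikeAvroContainer(byte[] head) {
        return head.length >= AVRO_MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, AVRO_MAGIC.length), AVRO_MAGIC);
    }

    // Reads only the first four bytes of the file.
    static boolean looksLikeAvroContainer(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[AVRO_MAGIC.length];
            int n = in.readNBytes(head, 0, head.length);
            return n == head.length && looksLikeAvroContainer(head);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: AvroMagicCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + (looksLikeAvroContainer(file)
            ? ": Avro object container file" : ": not an Avro container file"));
    }
}
```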
Created 07-03-2014 12:16 AM
Hi,
To read data in Avro format from Hive, you have to use an Avro SerDe. A good starting point may be http://www.michael-noll.com/blog/2013/07/04/using-avro-in-mapreduce-jobs-with-hadoop-pig-hive/
But this is not really related to this topic, since the Solr sink puts data into Solr. I'd suggest using just an HDFS sink to put your data on HDFS and creating a Hive table (external or not) afterwards. You do not need Solr and/or morphlines for this.
best, Gerd