Flume custom source
Labels: Apache Flume, HDFS
Created on ‎12-11-2017 06:15 AM - edited ‎09-16-2022 05:37 AM
Hi everyone, I am trying to use a custom source in Flume, but when I start the agent I get the following exception:
org.apache.flume.FlumeException: Unable to load source type: com.cloudera.flume.source.MySource, class: com.cloudera.flume.source.MySource
The custom source is defined in the MySource.java file, and its package is com.cloudera.flume.source.
This is what I did:
- Compile the Java file, passing the Flume and Hadoop libraries on the classpath (this generates the MySource.class file):
javac -cp /opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/flume-ng/lib/* MySource.java -Xlint
- Create a manifest.mf file like the following:
Manifest-Version: 1.0
Main-Class: com.cloudera.flume.source.MySource
- Generate the MySource.jar file:
jar cvfm MySource.jar manifest.mf MySource.class
- Move the MySource.jar file into the Flume library folder:
sudo mv MySource.jar /opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/flume-ng/lib
- The custom Flume configuration file is the following:
# custom.conf
# Naming the components on the current agent.
MyAgent.sources = MySource
MyAgent.channels = MemChannel
MyAgent.sinks = HDFS

# Describing/Configuring the source
MyAgent.sources.MySource.type = com.cloudera.flume.source.MySource

# Describing/Configuring the sink
MyAgent.sinks.HDFS.type = hdfs
MyAgent.sinks.HDFS.hdfs.path = /test/flume/mysource-logs
MyAgent.sinks.HDFS.hdfs.fileType = DataStream
MyAgent.sinks.HDFS.hdfs.writeFormat = Text
MyAgent.sinks.HDFS.hdfs.batchSize = 1000
MyAgent.sinks.HDFS.hdfs.rollSize = 0
MyAgent.sinks.HDFS.hdfs.rollCount = 10000

# Describing/Configuring the channel
MyAgent.channels.MemChannel.type = memory
MyAgent.channels.MemChannel.capacity = 10000
MyAgent.channels.MemChannel.transactionCapacity = 100

# Binding the source and sink to the channel
MyAgent.sources.MySource.channels = MemChannel
MyAgent.sinks.HDFS.channel = MemChannel
- Then start the agent with the following command:
flume-ng agent \
  --conf /etc/flume-ng/conf \
  --conf-file custom.conf \
  --name MyAgent \
  -Dflume.root.logger=INFO,console
At this point I get an org.apache.flume.FlumeException; it seems Flume cannot find
com.cloudera.flume.source.MySource
Among the library paths listed when the agent starts, I can see /opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/flume-ng/lib, which is where I copied the MySource.jar file, so I don't understand why it cannot find the class.
What am I doing wrong?
PS: I am using CDH 5.13 installed by Cloudera Manager.
Created ‎05-22-2018 12:20 AM
> jar cvfm MySource.jar manifest.mf MySource.class
Your class is declared in a package (com.cloudera.flume.source), but this jar command places MySource.class at the top level of the archive, so the class cannot be found under its declared package name. The ideal way would be to do this:
~> mkdir -p com/cloudera/flume/source/
~> mv MySource.class com/cloudera/flume/source/
~> jar cvf MySource.jar com/cloudera/flume/source/MySource.class
Doing the above steps within your sequence ensures the class is stored in the jar under its declared package path instead of at the top level.
More generally, you can avoid this kind of packaging mistake by using a build tool such as Maven, or an IDE such as IntelliJ or Eclipse that can build archives from source projects. These tools package jars in the required form and preserve the package structure for you, among several other benefits.
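The layout requirement described above can be checked without extracting the archive. The following is a minimal sketch, not part of Flume: the class name JarLayoutCheck and the helpers entryPathFor and containsClass are hypothetical names chosen for illustration. It relies only on the standard java.util.jar API and on the fact that a class loader looks up a class inside a jar under the path formed from its fully qualified name.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Illustrative helper (hypothetical name): checks whether a jar stores a class
// under the path that matches its declared package.
class JarLayoutCheck {

    // com.cloudera.flume.source.MySource -> com/cloudera/flume/source/MySource.class
    static String entryPathFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns true only if the jar has an entry at the package-qualified path.
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryPathFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarLayoutCheck <jar> <fully.qualified.ClassName>");
            return;
        }
        String expected = entryPathFor(args[1]);
        if (containsClass(args[0], args[1])) {
            System.out.println("found " + expected);
        } else {
            System.out.println("missing " + expected + " (class file may sit at the wrong level)");
        }
    }
}
```

Assuming Java 11 or later, this can be run as a single source file, for example: java JarLayoutCheck.java MySource.jar com.cloudera.flume.source.MySource. A jar built with the original command would be expected to report the entry as missing.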
Created ‎05-22-2018 12:00 AM
I think this has something to do with the way you created the jar. When you extract the jar, do you see the class under the same package path as shown in the error message?
Also, try adding the full path for the class in the flume config.
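Since the source type in the config is a fully qualified class name, one way to test a jar outside Flume is to try loading that name from it. The sketch below is an illustration under assumptions, not Flume's own loading code: LoadByNameCheck and isLoadable are hypothetical names, and it uses only the standard URLClassLoader and Class.forName APIs.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper (hypothetical name): tries to load a class by its fully
// qualified name from a single jar plus the current classpath.
class LoadByNameCheck {

    static boolean isLoadable(Path jar, String className) throws IOException {
        URL[] urls = { jar.toUri().toURL() };
        try (URLClassLoader loader =
                 new URLClassLoader(urls, LoadByNameCheck.class.getClassLoader())) {
            // initialize=false: load the class without running its static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: no class file at the package-qualified path.
            // LinkageError (e.g. NoClassDefFoundError): the class file was found,
            // but a class it depends on is not on the classpath.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: LoadByNameCheck <jar> <fully.qualified.ClassName>");
            return;
        }
        boolean ok = isLoadable(Paths.get(args[0]), args[1]);
        System.out.println(args[1] + (ok ? " is loadable" : " is NOT loadable") + " from " + args[0]);
    }
}
```

Note that a real custom source extends Flume classes, so the Flume jars would also need to be on the classpath for the load to succeed; without them this sketch reports the class as not loadable even when the jar layout is correct.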
Created ‎05-23-2018 01:26 AM
Thanks, I did end up using Maven and the plugins.d folder in Flume. I forgot to update the topic. Thank you all for the help!
