Flume custom source

Expert Contributor

Hi everyone, I am trying to use a custom source in Flume, but when I start the agent I get the following exception:

org.apache.flume.FlumeException: Unable to load source type: com.cloudera.flume.source.MySource, class: com.cloudera.flume.source.MySource

 

The custom source is specified in the MySource.java file, the package is com.cloudera.flume.source.

This is what I did:

 

  1. Compile the Java file, passing the class path to the Flume and Hadoop libraries (this generates the MySource.class file):
    javac -cp /opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/flume-ng/lib/* MySource.java -Xlint
  2. Create manifest.mf file like the following:
    Manifest-Version: 1.0
    Main-Class: com.cloudera.flume.source.MySource
  3. Generate the MySource.jar file:
    jar cvfm MySource.jar manifest.mf MySource.class
  4. Move the MySource.jar file into the Flume library folder:
    sudo mv MySource.jar /opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/flume-ng/lib
  5. The custom flume configuration file is the following:
    # custom.conf
    
    # Naming the components on the current agent. 
    MyAgent.sources = MySource 
    MyAgent.channels = MemChannel 
    MyAgent.sinks = HDFS
      
    # Describing/Configuring the source 
    MyAgent.sources.MySource.type = com.cloudera.flume.source.MySource
      
    # Describing/Configuring the sink 
    MyAgent.sinks.HDFS.type = hdfs 
    MyAgent.sinks.HDFS.hdfs.path = /test/flume/mysource-logs
    MyAgent.sinks.HDFS.hdfs.fileType = DataStream 
    MyAgent.sinks.HDFS.hdfs.writeFormat = Text 
    MyAgent.sinks.HDFS.hdfs.batchSize = 1000
    MyAgent.sinks.HDFS.hdfs.rollSize = 0 
    MyAgent.sinks.HDFS.hdfs.rollCount = 10000 
     
    # Describing/Configuring the channel 
    MyAgent.channels.MemChannel.type = memory 
    MyAgent.channels.MemChannel.capacity = 10000 
    MyAgent.channels.MemChannel.transactionCapacity = 100
      
    # Binding the source and sink to the channel 
    MyAgent.sources.MySource.channels = MemChannel
    MyAgent.sinks.HDFS.channel = MemChannel 
  6. Then start the agent with the following command:
    flume-ng agent \
    --conf /etc/flume-ng/conf \
    --conf-file custom.conf \
    --name MyAgent \
    -Dflume.root.logger=INFO,console

At this point I get an org.apache.flume.FlumeException; it seems it cannot find

com.cloudera.flume.source.MySource

 

In the library paths listed when the agent starts, I can see the path /opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/flume-ng/lib, where I copied the MySource.jar file, so I don't understand why it cannot find the class.

What am I doing wrong?

 

PS: I am using CDH 5.13 installed by Cloudera Manager.

 

1 ACCEPTED SOLUTION

Mentor
@Smitha is right here. The below step specifically is incorrect.

> jar cvfm MySource.jar manifest.mf MySource.class

Your class is declared in a package (com.cloudera.flume.source), but that jar command places MySource.class at the top level of the archive, so the class loader cannot find it under its package path. The ideal way would be to do this:

~> mkdir -p com/cloudera/flume/source/
~> mv MySource.class com/cloudera/flume/source/
~> jar cvf MySource.jar com/cloudera/flume/source/MySource.class

Doing the above steps within your sequence would ensure the class gets placed in the declared package instead of at the top level.
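
To make the point about package paths concrete, here is a minimal Java sketch of the check you could run against a jar. It relies on the standard rule that a class loader resolves a fully qualified name such as a.b.C by looking for the entry a/b/C.class. The class and method names (JarEntryCheck, entryNameFor, containsClass, writeJar) are hypothetical and only for illustration, and the two demo jars hold empty placeholder entries rather than real compiled classes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

final class JarEntryCheck {

    // A class named a.b.C is expected at the jar entry a/b/C.class.
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // True if the jar holds an entry at the path implied by the class name.
    static boolean containsClass(Path jar, String className) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getJarEntry(entryNameFor(className)) != null;
        }
    }

    // Demo helper only: writes a jar with a single empty placeholder entry.
    static Path writeJar(Path jar, String entryName) throws IOException {
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jarOut = new JarOutputStream(out)) {
            jarOut.putNextEntry(new JarEntry(entryName));
            jarOut.closeEntry();
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        String className = "com.cloudera.flume.source.MySource";
        Path dir = Files.createTempDirectory("jarcheck");

        // Mirrors "jar cvfm MySource.jar manifest.mf MySource.class": entry at the top level.
        Path flat = writeJar(dir.resolve("flat.jar"), "MySource.class");
        // Mirrors the corrected layout: entry under com/cloudera/flume/source/.
        Path nested = writeJar(dir.resolve("nested.jar"), entryNameFor(className));

        System.out.println("flat.jar:   " + containsClass(flat, className));
        System.out.println("nested.jar: " + containsClass(nested, className));
    }
}
```

Run as written, the flat jar reports false and the nested jar reports true. Pointing containsClass at the real MySource.jar would show which of the two layouts it has.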

More generally, you can avoid this kind of packaging mistake by using a formal build tool such as Maven, or an IDE such as IntelliJ or Eclipse that can build archives from source projects. These tools package jars in the required form and preserve the package structure, among other benefits.


3 REPLIES

New Contributor

I think this has something to do with the way you created the jar. When you extract the jar, do you see the same package directory structure as in the error message?

Also, try specifying the fully qualified class name in the Flume config.

 

 

Expert Contributor

Thanks, I did indeed end up using Maven and the plugins.d folder in Flume. I forgot to update the topic; thank you all for the help!