
NiFi - custom ORC processor gives NoClassDefFoundError on org.apache.hadoop.hdfs.DistributedFileSystem

Contributor

Hi all! I am trying to develop a custom NiFi processor that writes ORC files directly to a remote Hadoop cluster. To write them, I am using the ORC Core API. I have tried writing ORC files to the local FS and everything works so far (Hive, which is their "final destination", has no problem reading them).

The issue is that, while trying to create a Writer object, I get a NoClassDefFoundError on org.apache.hadoop.hdfs.DistributedFileSystem.

Here is the code I'm using:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;

// Build a client configuration from the cluster's config files.
Configuration conf = new Configuration();
conf.addResource(new Path(hadoopConfigurationPath + "/core-site.xml"));
conf.addResource(new Path(hadoopConfigurationPath + "/hdfs-site.xml"));
// Pin the HDFS FileSystem implementation; referencing this class is
// what triggers the NoClassDefFoundError.
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
// fs.default.name is the legacy key for fs.defaultFS.
String hdfsUri = conf.get("fs.default.name");

...

try {
    // Open an ORC writer directly on the remote HDFS path.
    writer = OrcFile.createWriter(new Path(hdfsUri + "/" + filename + ".orc"),
            OrcFile.writerOptions(conf).setSchema(orcSchema));
}
catch (IOException e) {
    log.error("Cannot open hdfs file. Reason: " + e.getMessage());
    session.transfer(flowfile, hdfsFailure);
    return;
}
...
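For completeness, the elided row-writing part follows the usual ORC Core pattern of filling a VectorizedRowBatch. Here is a minimal sketch, assuming a hypothetical struct<name:string,age:int> schema (the column names and values are illustrative, not my actual code):

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.TypeDescription;

// Hypothetical schema for illustration only.
TypeDescription orcSchema = TypeDescription.fromString("struct<name:string,age:int>");
VectorizedRowBatch batch = orcSchema.createRowBatch();
BytesColumnVector nameCol = (BytesColumnVector) batch.cols[0];
LongColumnVector ageCol = (LongColumnVector) batch.cols[1];

// Append one row to the batch.
int row = batch.size++;
nameCol.setVal(row, "alice".getBytes(StandardCharsets.UTF_8));
ageCol.vector[row] = 30;

// Flush the batch when full, and once more at the end.
if (batch.size == batch.getMaxSize()) {
    writer.addRowBatch(batch);
    batch.reset();
}
if (batch.size > 0) {
    writer.addRowBatch(batch);
}
writer.close();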

I've copied the hadoop-hdfs JAR into the lib directory, and I tried inspecting the JARs loaded on the classpath at runtime via the ClassLoader: the JAR containing the required class is there. Including the JAR in the processor's dependencies does not solve the issue either.
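For reference, this is roughly how I inspected the classpath at runtime; a minimal sketch using only standard JDK APIs (on Java 8, URL-based classloaders such as NiFi's NAR classloader expose their JAR URLs this way):

import java.net.URL;
import java.net.URLClassLoader;

// Walk the processor's classloader chain and print the JARs
// each URL-based loader can see.
ClassLoader cl = getClass().getClassLoader();
while (cl != null) {
    getLogger().info("ClassLoader: " + cl);
    if (cl instanceof URLClassLoader) {
        for (URL url : ((URLClassLoader) cl).getURLs()) {
            getLogger().info("  " + url);
        }
    }
    cl = cl.getParent();
}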

Any suggestion on how to get rid of this error is really appreciated.

Thank you all!

1 ACCEPTED SOLUTION

Master Guru

Is your processor in its own NAR, or have you added it to a NiFi NAR (such as the nifi-hive-bundle or nifi-hdfs-bundle)? If the former, have you added the nifi-hadoop-libraries NAR as a parent to your NAR? This will give you access to the Hadoop JARs/classes via a parent classloader. To add this NAR as a parent, add the following to the <dependencies> section in your custom processor's NAR module (not the processor module itself):

<dependency>
   <groupId>org.apache.nifi</groupId>
   <artifactId>nifi-hadoop-libraries-nar</artifactId>
   <type>nar</type>
</dependency>

Can you describe your use case a little more? If your files are already in ORC format, you should be able to use PutHDFS to place them onto the Hadoop cluster. If they are in some other format, you might be able to use some conversion processors (including ConvertAvroToORC) and then PutHDFS to land the resultant ORC files into the cluster.

2 REPLIES

Contributor

Hi @Matt Burgess, thank you for the reply. Sorry for answering this late, but I clearly missed your post. Just in case someone runs into this, I would like to link a mirror question on Stack Overflow that may be useful.