Created 03-08-2017 04:20 PM
Hi all! I am trying to develop a custom NiFi processor that writes ORC files directly into a remote Hadoop cluster. To write them, I am using the ORC Core API. I have tried writing ORC files to the local FS and everything works fine so far (Hive, which is their "final destination", has no problem reading them).
The issue is that, while trying to create a Writer object, I get a NoClassDefFoundError on org.apache.hadoop.hdfs.DistributedFileSystem.
Here is the code I am using:
Configuration conf = new Configuration();
conf.addResource(new Path(hadoopConfigurationPath + "/core-site.xml"));
conf.addResource(new Path(hadoopConfigurationPath + "/hdfs-site.xml"));
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
String hdfsUri = conf.get("fs.default.name");

...

try {
    writer = OrcFile.createWriter(new Path(hdfsUri + "/" + filename + ".orc"),
            OrcFile.writerOptions(conf).setSchema(orcSchema));
} catch (IOException e) {
    log.error("Cannot open hdfs file. Reason: " + e.getMessage());
    session.transfer(flowfile, hdfsFailure);
    return;
}

...
I've copied the hadoop-hdfs JAR into the lib directory, and I tried inspecting the JARs loaded on the classpath at runtime via the ClassLoader: the JAR containing the required class is there. Including the JAR in the processor's dependencies does not solve the issue either.
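For anyone debugging something similar: one common workaround for classloader-related NoClassDefFoundError in NiFi processors is to swap in the processor's own classloader around the Hadoop/ORC calls. A minimal sketch, reusing the variables from the snippet above (whether this applies to this exact setup is an assumption):

// Sketch of the context-classloader workaround; assumes the same
// conf, hdfsUri, filename and orcSchema variables as in the snippet above.
ClassLoader original = Thread.currentThread().getContextClassLoader();
try {
    // Use the class's own loader so Hadoop's FileSystem lookup
    // can see org.apache.hadoop.hdfs.DistributedFileSystem.
    Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
    writer = OrcFile.createWriter(new Path(hdfsUri + "/" + filename + ".orc"),
            OrcFile.writerOptions(conf).setSchema(orcSchema));
} finally {
    // Always restore the original classloader afterwards.
    Thread.currentThread().setContextClassLoader(original);
}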
Any suggestion on how to get rid of this error is really appreciated.
Thank you all!
Created 03-15-2017 12:46 PM
Is your processor in its own NAR, or have you added it to a NiFi NAR (such as the nifi-hive-bundle or nifi-hdfs-bundle)? If the former, have you added the nifi-hadoop-libraries NAR as a parent to your NAR? This will give you access to the Hadoop JARs/classes via a parent classloader. To add this NAR as a parent, add the following to the <dependencies> section in your custom processor's NAR module (not the processor module itself):
<dependency>
    <groupId>org.apache.nifi</groupId>
    <artifactId>nifi-hadoop-libraries-nar</artifactId>
    <type>nar</type>
</dependency>
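For context, a sketch of how that dependency sits in the NAR module's pom.xml (the com.example coordinates are placeholders for your own bundle):

<!-- Placeholder NAR module pom; a <parent> that configures the
     nifi-nar-maven-plugin is omitted for brevity. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>my-processors-nar</artifactId>
    <version>1.0</version>
    <packaging>nar</packaging>
    <dependencies>
        <!-- the module containing the custom processor classes -->
        <dependency>
            <groupId>com.example</groupId>
            <artifactId>my-processors</artifactId>
            <version>1.0</version>
        </dependency>
        <!-- parent NAR providing the Hadoop JARs via the parent classloader -->
        <dependency>
            <groupId>org.apache.nifi</groupId>
            <artifactId>nifi-hadoop-libraries-nar</artifactId>
            <type>nar</type>
        </dependency>
    </dependencies>
</project>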
Can you describe your use case a little more? If your files are already in ORC format, you should be able to use PutHDFS to place them onto the Hadoop cluster. If they are in some other format, you might be able to use some conversion processors (including ConvertAvroToORC) and then PutHDFS to land the resultant ORC files into the cluster.
Created 03-28-2017 10:51 AM
Hi @Matt Burgess, thank you for the reply. Sorry to answer this late, but I clearly missed your post. Just in case someone runs into this, I'd like to link a mirror question on Stack Overflow that may be useful.