Support Questions

Find answers, ask questions, and share your expertise

Maven config. for custom processor to Read/Write HDFS files

avatar
Super Collaborator

NiFi 1.2.0

I have a custom processor who pom files look as the following.

processor/pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor 
license agreements. See the NOTICE file distributed with this work for additional 
information regarding copyright ownership. The ASF licenses this file to 
You under the Apache License, Version 2.0 (the "License"); you may not use 
this file except in compliance with the License. You may obtain a copy of 
the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required 
by applicable law or agreed to in writing, software distributed under the 
License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS 
OF ANY KIND, either express or implied. See the License for the specific 
language governing permissions and limitations under the License. -->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.datalake</groupId>
<artifactId>CDCNiFi</artifactId>
<version>1.0-SNAPSHOT</version>
</parent>
<artifactId>nifi-NiFiCDCPoC-processors</artifactId>
<packaging>jar</packaging>
<dependencies>
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-api</artifactId>
</dependency>
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-utils</artifactId>
</dependency>
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-dbcp-service-api</artifactId>
</dependency>
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-processor-utils</artifactId>
</dependency>
<!-- Third-party -->
<!-- <dependency> <groupId>com.microsoft.sqlserver</groupId> <artifactId>mssql-jdbc</artifactId> 
<version>6.1.0.jre8</version> </dependency> -->
<dependency>
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-mapper-asl</artifactId>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.8.7</version>
</dependency>
<!-- Testing & Cross-cutting concerns -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-mock</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<scope>test</scope>
</dependency> 

</dependencies>
</project>

nar/pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor 
license agreements. See the NOTICE file distributed with this work for additional 
information regarding copyright ownership. The ASF licenses this file to 
You under the Apache License, Version 2.0 (the "License"); you may not use 
this file except in compliance with the License. You may obtain a copy of 
the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required 
by applicable law or agreed to in writing, software distributed under the 
License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS 
OF ANY KIND, either express or implied. See the License for the specific 
language governing permissions and limitations under the License. -->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.datalake</groupId>
<artifactId>CDCNiFi</artifactId>
<version>1.0-SNAPSHOT</version>
</parent>
<artifactId>nifi-NiFiCDCPoC-nar</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>nar</packaging>
<properties>
<maven.javadoc.skip>true</maven.javadoc.skip>
<source.skip>true</source.skip>
</properties>
<dependencies>
<dependency>
<groupId>com..datalake</groupId>
<artifactId>nifi-NiFiCDCPoC-processors</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-standard-services-api-nar</artifactId>
<type>nar</type>
</dependency>
<!-- <dependency>
<groupId>com.microsoft.sqlserver</groupId>
<artifactId>mssql-jdbc</artifactId>
<version>6.1.0.jre8</version>
<scope>runtime</scope>
</dependency>-->
</dependencies>
</project>

Now, I wish to do some HDFS file operations from the processor like read/write a SequenceFile, retrieve files from a HDFS directory and so on. I had a look at the existing processors like PutHDFS which has some pom entries as follows :

<dependency>
            <groupId>org.apache.nifi</groupId>
            <artifactId>nifi-hadoop-utils</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.nifi</groupId>
            <artifactId>nifi-flowfile-packager</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <scope>provided</scope>
        </dependency>

In the lib dir. of NiFi, I could see files like nifi-hadoop-libraries-nar-1.2.0.nar, nifi-hadoop-nar-1.2.0.nar

The inclusion of above entries in my pom.xml didn't work, what shall I do to use the HDFS file system api within my custom processor ?

1 ACCEPTED SOLUTION

avatar
Master Guru

Usually (as is the case for the nifi-hadoop-bundle), the NAR depends on the nifi-hadoop-libraries-nar, as it provides the Hadoop libraries (such as the provided dependencies you have in your processor POM like hadoop-common), and its parent NAR is nifi-standard-services-api-nar (which you have in your NAR POM). Currently, NARs can only have one parent, so you wouldn't be able to depend on both the hadoop-libraries and standard-services-api NARs at the same time. Since the former depends on the latter, this works for your processor.

Try replacing the NAR POM dependency on nifi-standard-services-api-nar to nifi-hadoop-libraries-nar, this should provide all the classes/JARs/dependencies you need.

View solution in original post

1 REPLY 1

avatar
Master Guru

Usually (as is the case for the nifi-hadoop-bundle), the NAR depends on the nifi-hadoop-libraries-nar, as it provides the Hadoop libraries (such as the provided dependencies you have in your processor POM like hadoop-common), and its parent NAR is nifi-standard-services-api-nar (which you have in your NAR POM). Currently, NARs can only have one parent, so you wouldn't be able to depend on both the hadoop-libraries and standard-services-api NARs at the same time. Since the former depends on the latter, this works for your processor.

Try replacing the NAR POM dependency on nifi-standard-services-api-nar to nifi-hadoop-libraries-nar, this should provide all the classes/JARs/dependencies you need.