
Getting error for GetHDFS NiFi processor


I am trying to get files from an HDFS directory with the Apache NiFi GetHDFS processor; however, I get the following error in nifi-app.log whenever I run the job:

Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://XXX:8020/user/sample_b.csv, expected: file:///

at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649)

Does anyone have any idea what is causing this error?

1 ACCEPTED SOLUTION

Master Guru

@Pallavi Ab,

I think the issue is with your hdfs-site.xml and core-site.xml.

Use the XMLs from /usr/hdp/2.4.2.0-258/hadoop/conf instead of the /usr/hdp/2.4.2.0-258/etc/hadoop/conf.empty directory:

/usr/hdp/2.4.2.0-258/hadoop/conf/hdfs-site.xml
/usr/hdp/2.4.2.0-258/hadoop/conf/core-site.xml

Copy them to another directory and reference them in the Hadoop Configuration Resources property of the GetHDFS processor.
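For context on why this fixes the "Wrong FS ... expected: file:///" error: the conf.empty directory ships with blank template files, so without a populated core-site.xml the Hadoop client falls back to the default local filesystem (file:///), which is exactly what the exception reports. A minimal sketch of the entry the working core-site.xml must carry (the NameNode host below is the XXX placeholder from the error message, not a real hostname):

```xml
<!-- Sketch only: fs.defaultFS must name the HDFS NameNode, otherwise the
     client defaults to the local filesystem. XXX is the placeholder host
     from the error message in this thread. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://XXX:8020</value>
  </property>
</configuration>
```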


6 REPLIES

Master Guru
@Pallavi Ab

First, make sure your file is in the directory and that NiFi has permissions on the directory.

I am not sure about your GetHDFS configuration; take a look at the configs below and configure your processor to match the screenshot shown below.

Configs:-

42835-gethdfs.png

The important property is Keep Source File; configure it as per your needs.

Keep Source File: false (allowed values: true, false)
Determines whether to delete the file from HDFS after it has been successfully transferred. If true, the file will be fetched repeatedly; this is intended for testing only.


@Shu

Thank you for the quick response. I made the changes you suggested, but now the processor is failing with the error below:

Caused by: java.io.IOException: PropertyDescriptor PropertyDescriptor[Directory] has invalid value /user/cmor/kinetica/files/sample_b.csv. The directory does not exist.

Here is how the processor configuration looks:

gethdfs.png

Master Guru

@Pallavi Ab

As per your logs:

Caused by: java.io.IOException: PropertyDescriptor PropertyDescriptor[Directory] has invalid value /user/cmor/kinetica/files/sample_b.csv. The directory does not exist.

Can you check whether the above directory exists in HDFS using the commands below:

bash# hdfs dfs -test -d /user/cmor/kinetica/files
bash# echo $?
bash# hdfs dfs -test -e /user/cmor/kinetica/files/sample_b.csv
bash# echo $?
//if echo returns 0, the file or directory exists
//if echo returns 1, the file or directory does not exist
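Because `hdfs dfs -test` follows the standard Unix exit-code convention (0 = true, non-zero = false), you can use it directly in a shell conditional instead of inspecting `$?` by hand. The sketch below demonstrates the pattern with the local `test` builtin so it runs anywhere; on a cluster you would replace `test -d /tmp` with `hdfs dfs -test -d /user/cmor/kinetica/files`:

```shell
# Sketch: drive a conditional off the exit code of a test command.
# Uses the local `test` builtin for illustration; substitute
# `hdfs dfs -test -d <hdfs-path>` on a cluster.
if test -d /tmp; then
  echo "directory exists"
else
  echo "directory missing"
fi
```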

Make sure the path in the Directory property is correct, then run the processor again.
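One detail worth noting from the error message: the Directory property was given a full file path. GetHDFS expects Directory to point at a directory; to pick up a single file, point Directory at the parent directory and match the file name with the processor's file-filter regex property. A sketch of the split (the property names are from memory of the GetHDFS documentation and should be verified against your NiFi version):

```
Directory: /user/cmor/kinetica/files
File Filter Regex: sample_b\.csv
```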

Usage of the hdfs test command:

bash# hdfs dfs -test -[defsz] <hdfs-path>

Options:
-d: if the path is a directory, return 0.
-e: if the path exists, return 0.
-f: if the path is a file, return 0.
-s: if the path is not empty, return 0.
-z: if the file is zero length, return 0.


@Shu

This is what I see after running the commands

42838-hdfsoutput.png

Master Guru

@Pallavi Ab,

I think the issue is with your hdfs-site.xml and core-site.xml.

Use the XMLs from /usr/hdp/2.4.2.0-258/hadoop/conf instead of the /usr/hdp/2.4.2.0-258/etc/hadoop/conf.empty directory:

/usr/hdp/2.4.2.0-258/hadoop/conf/hdfs-site.xml
/usr/hdp/2.4.2.0-258/hadoop/conf/core-site.xml

Copy them to another directory and reference them in the Hadoop Configuration Resources property of the GetHDFS processor.


Thank you @Shu; it is running now.