Created 07-06-2020 11:32 PM
I am trying to write files to ADLS Gen2 from a Kerberized NiFi (version 1.5). I have configured PutHDFS following the instructions in https://community.cloudera.com/t5/Community-Articles/Connecting-to-Azure-Data-Lake-from-a-NiFi-dataf... and https://techcommunity.microsoft.com/t5/azure/how-can-i-use-nifi-to-ingest-data-from-to-adls/m-p/2635...
So I have configured the PutHDFS processor with the core-site.xml from the second link and the jars recommended in both links. I cannot finish configuring the processor without giving the name and keytab of a principal, which I do not understand, since I am trying to access the Azure Data Lake with the access key. When I add one of the principals we have, I get the error in the attachment. Please see my processor properties:
Please advise.
Created 07-07-2020 12:11 AM
It seems you are facing a permission issue on the HDFS side:
Permission denied: user=svcqhdfuser, access=EXECUTE, inode="/databank/test/from_nifi":hdfs:hdfs:
Could you please check whether user=svcqhdfuser has permission to access /databank/test/from_nifi in HDFS?
-- Try to log in as the "svcqhdfuser" user:
# su - svcqhdfuser
-- Run the following command to confirm:
# hdfs dfs -ls /databank/test/from_nifi
-- Or try to read or write something under /databank/test/from_nifi from the command line to confirm whether user svcqhdfuser has the required permission. If not, assign the permission and try again.
Created 07-09-2020 04:25 AM
Thank you. Yes, this explained the error I got! But I still have the problem that the PutHDFS processor seems to ignore the core-site.xml file I specify in the processor properties and instead uses the default core-site.xml file that points at our local Kerberized Hadoop cluster. Could some global configuration explain this behaviour?
I see the following in the log file:
2020-07-09 11:54:36,059 WARN [Timer-Driven Process Thread-1] org.apache.hadoop.conf.Configuration /etc/nifi/confhdi/core-site.xml:an attempt to override final parameter: fs.defaultFS; Ignoring.
Brgds,
Paz
Created 07-07-2020 05:20 AM
@pazufst
There are a couple of different things going on here.
1. It is the contents of the core-site.xml file you are pointing your PutHDFS processor at that drive the specific security requirements of the PutHDFS processor. My guess is you will find Kerberos set up within that core-site.xml file.
2. There are several Apache Jiras for improvements to the various AzureDataLakeStorage processors available within NiFi, adding support for ADLS Gen2. These Jiras are marked as resolved for Apache NiFi 1.12, which has not yet been released.
https://issues.apache.org/jira/browse/NIFI-7259
https://issues.apache.org/jira/browse/NIFI-7334
https://issues.apache.org/jira/browse/NIFI-7336
https://issues.apache.org/jira/browse/NIFI-7340
Hope this helps,
Matt
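As a way to check point 1, here is a minimal Python sketch (not part of the original reply) that reads a core-site.xml and reports the value of hadoop.security.authentication. The helper name auth_mode and the sample XML are made up for illustration; the fallback to "simple" reflects Hadoop's default when the property is not set. It only inspects the one file it is given, so a value set by another configuration resource on the classpath would not show up here.

```python
import xml.etree.ElementTree as ET

AUTH_KEY = "hadoop.security.authentication"

def auth_mode(xml_text):
    """Return the authentication mode declared in a Hadoop config file."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == AUTH_KEY:
            return (prop.findtext("value") or "").strip().lower()
    # Hadoop falls back to "simple" when the property is absent.
    return "simple"

# Invented sample content, only for demonstration.
SAMPLE = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # For a real file, pass open(path).read() instead of SAMPLE.
    print(auth_mode(SAMPLE))
```

If this reports "kerberos" for the file the processor points at, that would be consistent with the processor asking for a principal and keytab.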
Created on 07-09-2020 04:18 AM - edited 07-09-2020 04:26 AM
Hello,
Thank you! I attach a copy of my core-site.xml, which describes ADLS Gen2 access. I have double-checked that I am pointing at this core-site.xml file in the processor properties, but I can see that the processor is now writing to our local Kerberized Hadoop cluster instead of the remote ADLS Gen2. So I am quite puzzled!
But I can see that I get the following message in the log:
2020-07-09 11:54:36,059 WARN [Timer-Driven Process Thread-1] org.apache.hadoop.conf.Configuration /etc/nifi/confhdi/core-site.xml:an attempt to override final parameter: fs.defaultFS; Ignoring.
Is there an explanation for this?
I will have a look at your references and see if they shed some light.
Brgds,
Paz
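A note on the warning quoted above: Hadoop's Configuration class logs "an attempt to override final parameter" when a resource sets a property that an earlier-loaded resource has already marked with <final>true</final>, and the later value is dropped. That suggests fs.defaultFS was declared final somewhere before /etc/nifi/confhdi/core-site.xml was read; the thread does not show where. The sketch below is one way to list the final properties in a given config file so candidate files can be compared. The function name and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

def final_properties(xml_text):
    """Return {name: value} for every property marked <final>true</final>."""
    root = ET.fromstring(xml_text)
    finals = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        flag = (prop.findtext("final") or "").strip().lower()
        if name and flag == "true":
            finals[name] = (prop.findtext("value") or "").strip()
    return finals

# Invented sample content, only for demonstration.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
    <final>true</final>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # For a real file, pass open(path).read() instead of SAMPLE.
    for name, value in final_properties(SAMPLE).items():
        print(name, "=", value)
```

Running this against each core-site.xml that NiFi could be loading may show which one pins fs.defaultFS.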
Created 07-09-2020 05:38 AM
@pazufst
You should make sure that you have both the core-site.xml and hdfs-site.xml files copied over to all your NiFi nodes. Then make sure that these files are owned and accessible by the NiFi service user so that NiFi can read them.
Here is another post that may help:
https://community.cloudera.com/t5/Support-Questions/NiFi-Hadoop-Configuration-Error/td-p/225448
Hope this helps,
Matt
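The file check described above can be sketched in Python as follows. This is only an illustration: the helper name unreadable is made up, the paths are the ones that appear in the log lines of this thread, and the script has to be run as the NiFi service user on each node for the result to mean anything.

```python
import os

def unreadable(paths):
    """Return the paths that are missing or not readable by the current user."""
    return [p for p in paths
            if not (os.path.isfile(p) and os.access(p, os.R_OK))]

if __name__ == "__main__":
    # Paths taken from the log lines in this thread; adjust as needed.
    paths = [
        "/etc/nifi/confhdi/core-site.xml",
        "/etc/nifi/confhdi/hdfs-site.xml",
    ]
    for p in unreadable(paths):
        print("missing or not readable:", p)
```

No output means both files exist and are readable by whoever ran the script.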
Created 07-09-2020 06:27 AM
Hello again,
Yes, I realized that I had no hdfs-site.xml, and I have now added the hdfs-site.xml file from the HDInsight cluster that is connected to the ADLS Gen2. But now I get the next problem:
2020-07-09 13:57:14,814 WARN [Timer-Driven Process Thread-10] org.apache.hadoop.hdfs.DFSUtil Namenode for mycluster remains unresolved for ID nn2. Check your hdfs-site.xml file to ensure namenodes are configured properly.
2020-07-09 13:57:14,814 ERROR [Timer-Driven Process Thread-10] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=436939a6-0fd4-1741-bb61-7913fdc19600] HDFS Configuration error - java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider: {}
So now I am wondering whether the problem is that NiFi is trying to resolve the HDInsight datanode names in the hdfs-site.xml and not getting any result, because they are internal Azure names. It would be nice to know which properties are required in hdfs-site.xml.
Brgds,
Paz
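One way to test the name-resolution theory is to pull the namenode addresses out of hdfs-site.xml and try to resolve each host from the NiFi node. The sketch below assumes the file uses the standard HA property names of the form dfs.namenode.rpc-address.<nameservice>.<namenode-id>; the function names and the hostnames in the sample are invented. Whether ADLS Gen2 access needs these HA entries at all is not settled in this thread.

```python
import socket
import xml.etree.ElementTree as ET

PREFIX = "dfs.namenode.rpc-address."

def namenode_hosts(xml_text):
    """Map '<nameservice>.<namenode-id>' to the host part of its RPC address."""
    root = ET.fromstring(xml_text)
    hosts = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name.startswith(PREFIX) and value:
            # Values look like host:port; keep only the host.
            hosts[name[len(PREFIX):]] = value.rsplit(":", 1)[0]
    return hosts

def unresolved(hosts, resolve=socket.gethostbyname):
    """Return the keys whose host cannot be resolved from this machine."""
    bad = []
    for key, host in sorted(hosts.items()):
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            bad.append(key)
    return bad

# Invented sample content, only for demonstration.
SAMPLE = """<configuration>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>localhost:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2.example.invalid:8020</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # For a real file, pass open(path).read() instead of SAMPLE.
    for key in unresolved(namenode_hosts(SAMPLE)):
        print("cannot resolve namenode host for", key)
```

Any ID reported here would line up with the "remains unresolved for ID" warnings in the log.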
Created 07-11-2020 07:17 AM
NiFi is definitely ignoring the core-site.xml and hdfs-site.xml I specified in the configuration. I have filtered the log for my flow and I get:
2020-07-11 16:13:49,123 INFO [NiFi Web Server-298078] o.a.n.c.s.StandardProcessScheduler Starting PutHDFS[id=436939a6-0fd4-1741-bb61-7913fdc19600]
2020-07-11 16:13:49,135 WARN [Timer-Driven Process Thread-5] org.apache.hadoop.conf.Configuration /etc/nifi/confhdi/core-site.xml:an attempt to override final parameter: fs.defaultFS; Ignoring.
2020-07-11 16:13:49,136 WARN [Timer-Driven Process Thread-5] org.apache.hadoop.conf.Configuration /etc/nifi/confhdi/hdfs-site.xml:an attempt to override final parameter: dfs.datanode.data.dir; Ignoring.
2020-07-11 16:13:49,136 WARN [Timer-Driven Process Thread-5] org.apache.hadoop.conf.Configuration /etc/nifi/confhdi/hdfs-site.xml:an attempt to override final parameter: dfs.datanode.failed.volumes.tolerated; Ignoring.
2020-07-11 16:13:49,136 WARN [Timer-Driven Process Thread-5] org.apache.hadoop.conf.Configuration /etc/nifi/confhdi/hdfs-site.xml:an attempt to override final parameter: dfs.namenode.name.dir; Ignoring.
2020-07-11 16:13:49,136 WARN [Timer-Driven Process Thread-5] org.apache.hadoop.conf.Configuration /etc/nifi/confhdi/hdfs-site.xml:an attempt to override final parameter: dfs.webhdfs.enabled; Ignoring.
2020-07-11 16:13:49,145 INFO [Timer-Driven Process Thread-5] o.a.hadoop.security.UserGroupInformation Login successful for user svcqhdfuser using keytab file /etc/nifi-resources/keytabs/svcqhdfuser.keytab
2020-07-11 16:14:04,750 WARN [Timer-Driven Process Thread-5] org.apache.hadoop.hdfs.DFSUtil Namenode for mycluster remains unresolved for ID nn1. Check your hdfs-site.xml file to ensure namenodes are configured properly.
2020-07-11 16:14:21,063 WARN [Timer-Driven Process Thread-5] org.apache.hadoop.hdfs.DFSUtil Namenode for mycluster remains unresolved for ID nn2. Check your hdfs-site.xml file to ensure namenodes are configured properly.
2020-07-11 16:14:21,063 ERROR [Timer-Driven Process Thread-5] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=436939a6-0fd4-1741-bb61-7913fdc19600] HDFS Configuration error - java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider: {}
2020-07-11 16:14:21,064 ERROR [Timer-Driven Process Thread-5] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=436939a6-0fd4-1741-bb61-7913fdc19600] Failed to properly initialize Processor. If still scheduled to run, NiFi will attempt to initialize and run the Processor again after the 'Administrative Yield Duration' has elapsed. Failure is due to java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException
2020-07-11 16:14:50,810 INFO [NiFi Web Server-298342] o.a.n.c.s.StandardProcessScheduler Stopping PutHDFS[id=436939a6-0fd4-1741-bb61-7913fdc19600]
2020-07-11 16:14:51,076 WARN [Timer-Driven Process Thread-2] org.apache.hadoop.conf.Configuration /etc/nifi/confhdi/core-site.xml:an attempt to override final parameter: fs.defaultFS; Ignoring.
2020-07-11 16:14:51,077 WARN [Timer-Driven Process Thread-2] org.apache.hadoop.conf.Configuration /etc/nifi/confhdi/hdfs-site.xml:an attempt to override final parameter: dfs.datanode.data.dir; Ignoring.
2020-07-11 16:14:51,077 WARN [Timer-Driven Process Thread-2] org.apache.hadoop.conf.Configuration /etc/nifi/confhdi/hdfs-site.xml:an attempt to override final parameter: dfs.datanode.failed.volumes.tolerated; Ignoring.
2020-07-11 16:14:51,077 WARN [Timer-Driven Process Thread-2] org.apache.hadoop.conf.Configuration /etc/nifi/confhdi/hdfs-site.xml:an attempt to override final parameter: dfs.namenode.name.dir; Ignoring.
2020-07-11 16:14:51,077 WARN [Timer-Driven Process Thread-2] org.apache.hadoop.conf.Configuration /etc/nifi/confhdi/hdfs-site.xml:an attempt to override final parameter: dfs.webhdfs.enabled; Ignoring.
Brgds,
Paz