PutHDFS processor fails to write to kerberised and TLS/SSL enabled HDFS

Explorer

Hello,

I get the error below in my NiFi logs when I try to write a file to HDFS. Writing a file from within the Hadoop cluster works fine, but writing from NiFi fails with the message below.

My NiFi service is started by root. With a local NiFi instance I am able to write the file to HDFS, whereas from the NiFi cluster I am unable to do so. Any help would be highly appreciated. I am trying another solution; if that works, I will post it here.

2020-08-10 13:41:59,519 INFO [NiFi Web Server-32056] o.a.n.c.s.StandardProcessScheduler Starting LogMessage[id=b4bc6d2c-0173-1000-0000-00002905a41b]
2020-08-10 13:41:59,519 INFO [NiFi Web Server-32056] o.a.n.controller.StandardProcessorNode Starting LogMessage[id=b4bc6d2c-0173-1000-0000-00002905a41b]
2020-08-10 13:41:59,519 INFO [NiFi Web Server-32056] o.a.n.c.s.StandardProcessScheduler Starting LogMessage[id=b4bd264b-0173-1000-0000-000018f91304]
2020-08-10 13:41:59,519 INFO [NiFi Web Server-32056] o.a.n.controller.StandardProcessorNode Starting LogMessage[id=b4bd264b-0173-1000-0000-000018f91304]
2020-08-10 13:41:59,519 INFO [NiFi Web Server-32056] o.a.n.c.s.StandardProcessScheduler Starting GetFile[id=b4d14ae8-0173-1000-ffff-ffffe680a6a0]
2020-08-10 13:41:59,519 INFO [NiFi Web Server-32056] o.a.n.controller.StandardProcessorNode Starting GetFile[id=b4d14ae8-0173-1000-ffff-ffffe680a6a0]
2020-08-10 13:41:59,519 INFO [NiFi Web Server-32056] o.a.n.c.s.StandardProcessScheduler Starting PutHDFS[id=4d34342b-2901-125d-917f-567e466964c8]
2020-08-10 13:41:59,519 INFO [NiFi Web Server-32056] o.a.n.controller.StandardProcessorNode Starting PutHDFS[id=4d34342b-2901-125d-917f-567e466964c8]
2020-08-10 13:41:59,519 INFO [Timer-Driven Process Thread-6] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GetFile[id=b4d14ae8-0173-1000-ffff-ffffe680a6a0] to run with 1 threads
2020-08-10 13:41:59,519 INFO [Timer-Driven Process Thread-2] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled LogMessage[id=b4bc6d2c-0173-1000-0000-00002905a41b] to run with 1 threads
2020-08-10 13:41:59,519 INFO [Timer-Driven Process Thread-5] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled LogMessage[id=b4bd264b-0173-1000-0000-000018f91304] to run with 1 threads
2020-08-10 13:41:59,543 INFO [Timer-Driven Process Thread-10] o.a.hadoop.security.UserGroupInformation Login successful for user abc@UX.xyzCORP.NET using keytab file /home/abc/confFiles/abc.keytab
2020-08-10 13:41:59,544 INFO [Timer-Driven Process Thread-10] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled PutHDFS[id=4d34342b-2901-125d-917f-567e466964c8] to run with 1 threads
2020-08-10 13:41:59,595 INFO [Thread-9481] o.a.h.h.p.d.sasl.SaslDataTransferClient SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-08-10 13:41:59,599 INFO [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Exception in createBlockOutputStream blk_1075334640_1594409
java.io.EOFException: null
        at java.io.DataInputStream.readByte(DataInputStream.java:267)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
        at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240)
        at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221)
        at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:203)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:193)
        at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1731)
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2020-08-10 13:41:59,599 WARN [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Abandoning BP-1824237254-0.00.64.55-1545405130172:blk_1075334640_1594409
2020-08-10 13:41:59,601 WARN [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Excluding datanode DatanodeInfoWithStorage[0.00.64.57:50010,DS-d6f56418-6e18-4317-a8ec-4a5b15757728,DISK]
2020-08-10 13:41:59,605 INFO [Thread-9481] o.a.h.h.p.d.sasl.SaslDataTransferClient SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-08-10 13:41:59,606 INFO [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Exception in createBlockOutputStream blk_1075334641_1594410
java.io.EOFException: null
        at java.io.DataInputStream.readByte(DataInputStream.java:267)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
        at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240)
        at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221)
        at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:203)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:193)
        at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1731)
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2020-08-10 13:41:59,606 WARN [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Abandoning BP-1824237254-0.00.64.55-1545405130172:blk_1075334641_1594410
2020-08-10 13:41:59,608 WARN [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Excluding datanode DatanodeInfoWithStorage[0.00.64.56:50010,DS-286b28e8-d035-4b8c-a2dd-aabb08666234,DISK]
2020-08-10 13:41:59,612 INFO [Thread-9481] o.a.h.h.p.d.sasl.SaslDataTransferClient SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-08-10 13:41:59,612 INFO [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Exception in createBlockOutputStream blk_1075334642_1594411
java.io.EOFException: null
        at java.io.DataInputStream.readByte(DataInputStream.java:267)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
        at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240)
        at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221)
        at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:203)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:193)
        at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1731)
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2020-08-10 13:41:59,612 WARN [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Abandoning BP-1824237254-0.00.64.55-1545405130172:blk_1075334642_1594411
2020-08-10 13:41:59,614 WARN [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Excluding datanode DatanodeInfoWithStorage[0.00.64.58:50010,DS-53536364-33f4-40d6-85c2-508abf7ff023,DISK]
2020-08-10 13:41:59,618 INFO [Thread-9481] o.a.h.h.p.d.sasl.SaslDataTransferClient SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-08-10 13:41:59,619 INFO [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Exception in createBlockOutputStream blk_1075334643_1594412
java.io.EOFException: null
        at java.io.DataInputStream.readByte(DataInputStream.java:267)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
        at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240)
        at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221)
        at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:203)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:193)
        at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1731)
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2020-08-10 13:41:59,619 WARN [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Abandoning BP-1824237254-0.00.64.55-1545405130172:blk_1075334643_1594412
2020-08-10 13:41:59,621 WARN [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Excluding datanode DatanodeInfoWithStorage[0.00.64.84:50010,DS-abba7d97-925a-4299-af86-b58fef9aaa12,DISK]
2020-08-10 13:41:59,621 WARN [Thread-9481] org.apache.hadoop.hdfs.DataStreamer DataStreamer Exception
java.io.IOException: Unable to create new block.
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2020-08-10 13:41:59,621 WARN [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Could not get block locations. Source file "/user/abc/puthdfs_test/.test.txt" - Aborting...block==null
2020-08-10 13:41:59,626 ERROR [Timer-Driven Process Thread-2] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=4d34342b-2901-125d-917f-567e466964c8] Failed to write to HDFS due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutHDFS[id=4d34342b-2901-125d-917f-567e466964c8]: java.io.IOException: Could not get block locations. Source file "/user/abc/puthdfs_test/.test.txt" - Aborting...block==null: org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutHDFS[id=4d34342b-2901-125d-917f-567e466964c8]: java.io.IOException: Could not get block locations. Source file "/user/abc/puthdfs_test/.test.txt" - Aborting...block==null
org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutHDFS[id=4d34342b-2901-125d-917f-567e466964c8]: java.io.IOException: Could not get block locations. Source file "/user/abc/puthdfs_test/.test.txt" - Aborting...block==null
        at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2347)
        at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2292)
        at org.apache.nifi.processors.hadoop.PutHDFS$1.run(PutHDFS.java:320)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
        at org.apache.nifi.processors.hadoop.PutHDFS.onTrigger(PutHDFS.java:250)
        at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
        at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
        at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
        at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
        at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Could not get block locations. Source file "/user/abc/puthdfs_test/.test.txt" - Aborting...block==null
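One way to read a log like the one above is to list which DataNodes the client gave up on. The sketch below is only an illustration: the function name `excluded_datanodes` and the regular expression are assumptions written against the "Excluding datanode" lines shown in this post, not part of NiFi or Hadoop.

```python
import re

# Matches the address:port inside "Excluding datanode DatanodeInfoWithStorage[...]"
# as it appears in the log lines above (pattern is an assumption based on those lines).
EXCLUDED = re.compile(r"Excluding datanode DatanodeInfoWithStorage\[([^,\]]+)")


def excluded_datanodes(log_text: str) -> list:
    """Return the address:port of every DataNode the client excluded, in log order."""
    return EXCLUDED.findall(log_text)


# Two lines copied from the log above, used as sample input.
sample = """\
2020-08-10 13:41:59,601 WARN [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Excluding datanode DatanodeInfoWithStorage[0.00.64.57:50010,DS-d6f56418-6e18-4317-a8ec-4a5b15757728,DISK]
2020-08-10 13:41:59,608 WARN [Thread-9481] org.apache.hadoop.hdfs.DataStreamer Excluding datanode DatanodeInfoWithStorage[0.00.64.56:50010,DS-286b28e8-d035-4b8c-a2dd-aabb08666234,DISK]
"""

print(excluded_datanodes(sample))  # ['0.00.64.57:50010', '0.00.64.56:50010']
```

In the full log, four different DataNodes are excluded with the same EOFException within a few milliseconds, which may suggest a client-side problem rather than one bad node.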

 

  

1 ACCEPTED SOLUTION

Explorer

Hi,

NiFi v1.11.4 uses Hadoop client v3.2.1. There is a known issue where this client throws an EOFException when connecting to a Cloudera cluster based on Hadoop v2.x: https://issues.apache.org/jira/browse/HDFS-15191

Reverting NiFi to v1.11.3, which is based on Hadoop client v3.2.0, avoids this issue.

The latest NiFi release, v1.12.0, also uses Hadoop client v3.2.1, so it has the same issue as NiFi v1.11.4.
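To make the version mapping concrete, here is a minimal Python sketch. The table contains only the three NiFi releases mentioned in this answer, and the names `NIFI_HADOOP_CLIENT` and `likely_affected` are hypothetical. The rule "client 3.2.1 against a 2.x cluster" is a simplification of what is described here, not an official compatibility matrix.

```python
# Hadoop client version bundled with each NiFi release, as stated in this answer.
NIFI_HADOOP_CLIENT = {
    "1.11.3": "3.2.0",
    "1.11.4": "3.2.1",
    "1.12.0": "3.2.1",
}


def likely_affected(nifi_version: str, cluster_hadoop_version: str) -> bool:
    """Return True if the NiFi/cluster pair matches the combination described above.

    Assumption: only Hadoop client 3.2.1 talking to a Hadoop 2.x cluster is flagged.
    """
    client = NIFI_HADOOP_CLIENT.get(nifi_version)
    if client is None:
        raise ValueError(f"NiFi version {nifi_version!r} is not covered in this thread")
    return client == "3.2.1" and cluster_hadoop_version.startswith("2.")


# HDFS version string as reported by the original poster in this thread.
print(likely_affected("1.11.4", "2.6.0+cdh5.16.2+2863"))  # True
print(likely_affected("1.11.3", "2.6.0+cdh5.16.2+2863"))  # False
```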

 


22 REPLIES

Master Guru

Can you post configuration information, more logs, plus some details on:

 

NiFi version

HDFS version

Encrypted file system?

Cloud?

OS version

JDK version / JVM version

CDH/HDP/CDP version

PutHDFS Settings

What kind of data?  Example data

Explorer

Hi Timothy,

 

Here are the details. Please let me know what log information you are looking for.

 

NiFi version -  1.11.4

HDFS version - 2.6.0+cdh5.16.2+2863

Encrypted file system? - No KMS, HDFS - Kerberos & SSL Enabled

Cloud? - VM -  On Prem

OS version - RHEL 7

JDK version / JVM version - jdk1.8.0_162

CDH/HDP/CDP version - CDH 5.16.2

PutHDFS Settings - 

            1) Hadoop Configuration Resources - hdfs-site.xml, core-site.xml

            2) Kerberos Credentials Service - Keytab location - keytab placed on all nodes in the same path

            3) Directory - HDFS - /path/to/folder

            4) Conflict Resolution Strategy - Replace

What kind of data?  Example data, - A simple text file

Master Guru

There are a number of possible causes for this.
The NameNode may be overloaded. Check the logs for messages that say "discarding calls..."
There may not be enough (any) DataNode nodes running for the data to be written. Again, check the logs.
Every DataNode on which the blocks were stored might be down (or not connected to the NameNode; it is impossible to distinguish the two).

 

This looks like an HDFS issue. Can you post the results from Hue or the HDFS command line?

 

https://cwiki.apache.org/confluence/display/HADOOP2/TroubleShooting

 

Explorer

Hi Timothy,

The NameNode may be overloaded. Check the logs for messages that say "discarding calls..."

    - The NameNode is fine. With a NiFi instance running on my laptop, using the same workflow and the same configuration, I am able to write the file successfully to the same cluster.

There may not be enough (any) DataNode nodes running for the data to be written. Again, check the logs.

    - All DataNodes are up and running. I will check the logs and get back to you.

Every DataNode on which the blocks were stored might be down (or not connected to the NameNode; it is impossible to distinguish the two).

    - I am able to put a file from the command line.

 

Explorer

Hi Timothy,

 

Here is the dfs admin report

 

[hdfssuperuser@abc ~]$ hadoop dfsadmin -report

 

Configured Capacity: 8748844187648 (7.96 TB)
Present Capacity: 8649609633023 (7.87 TB)
DFS Remaining: 6585870316863 (5.99 TB)
DFS Used: 2063739316160 (1.88 TB)
DFS Used%: 23.86%

Live datanodes (4):
Name: 0.1.0.1:50010
Hostname: abc
Rack: /default
Decommission Status : Normal
Configured Capacity: 2187211046912 (1.99 TB)
DFS Used: 673284505773 (627.05 GB)
Non DFS Used: 24917778259 (23.21 GB)
DFS Remaining: 1488606111329 (1.35 TB)
DFS Used%: 30.78%
DFS Remaining%: 68.06%
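If you want to check a report like the one above from a script, a rough Python sketch follows. The function name `summarize_report` and the regular expressions are assumptions written against the fields shown in this post, and the sample string is copied from the cluster summary above.

```python
import re


def summarize_report(report: str) -> dict:
    """Pull the live DataNode count and the first (cluster-wide) DFS Used% from report text."""
    live = re.search(r"Live datanodes \((\d+)\)", report)
    used = re.search(r"DFS Used%:\s*([\d.]+)%", report)
    return {
        "live_datanodes": int(live.group(1)) if live else None,
        "dfs_used_percent": float(used.group(1)) if used else None,
    }


# Cluster summary fields copied from the report in this post.
sample = (
    "Configured Capacity: 8748844187648 (7.96 TB) "
    "Present Capacity: 8649609633023 (7.87 TB) "
    "DFS Remaining: 6585870316863 (5.99 TB) "
    "DFS Used: 2063739316160 (1.88 TB) "
    "DFS Used%: 23.86% Live datanodes (4):"
)

print(summarize_report(sample))  # {'live_datanodes': 4, 'dfs_used_percent': 23.86}
```

Here it reports four live DataNodes, which matches the four DataNodes excluded in the NiFi log.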

Master Guru

Which Kerberos ID are you logged in with on your PC versus on the NiFi cluster?

Perhaps the NiFi Kerberos ID does not have WRITE permission on that directory, or you are not logged in to Kerberos on the NiFi cluster.

People usually don't have the same Kerberos credentials on their PC as on their cluster. It should be a service user, not a personal user.

Master Guru

It could be local security settings, file permissions, or iptables / a firewall between the NiFi cluster and HDFS.
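A quick way to rule out a firewall is to test plain TCP reachability from each NiFi node to each DataNode. The sketch below is only an illustration: `can_connect` is a hypothetical helper, the hostname in the usage comment is a placeholder, and 50010 is the DataNode port that appears in the log earlier in this thread. A successful connection only shows that the port is reachable; it says nothing about Kerberos or SASL.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (placeholder hostname, replace with your own DataNodes):
#   print(can_connect("datanode1.example.com", 50010))
```

Run it on every NiFi node, since the problem here shows up only on the cluster and not on the local instance.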

 

https://www.edureka.co/community/30977/hadoop-hdfs-exception-in-createblockoutputstream

 

Could it be the crypto libraries that are installed?

 

https://community.cloudera.com/t5/Support-Questions/NiFi-PutHDFS-Writing-Zero-Bytes-appears-to-be-cr...

 

Is this NiFi installed via CDF/CFM Cloudera Manager? If so, please open a support ticket.

Explorer

It's actually the same. However, the NiFi service is run by root, and I use a service account principal and its keytab in the PutHDFS processor. Are you saying that "root" does not have sufficient privileges to write the file into HDFS?

Master Guru

Most people don't give root accounts HDFS accounts or Kerberos accounts. It depends on the environment.

 

Just trying to see what is different between environments.

 

If it's not the user, permissions, firewall, or version of NiFi, that's weird.

I have seen some odd Linux setups where security intercepts some things or prevents root users from doing certain things. I have seen DevOps tools change permissions automatically in the background.

You can set PutHDFS to run on the Primary node only, so that just one node writes. That helps narrow down which machine in the cluster may have an issue.