
GetHDFS Missing block exception

Super Collaborator

Hi,

I am trying to connect to our Hadoop cluster from a new HDF server that sits outside the cluster. I am running some simple tests before I move all my flows from the old server onto the new one.

I am experiencing issues when I try to access a file using GetHDFS. I copied the config files onto the HDF server. Both servers use the same Kerberos KDC, so I am using the same keytabs. Here is the error message from the app log.

I was told all the ports for HDFS, Hive, etc. are open for communication between the two servers. Do I need to change anything in the config files?

2017-04-03 13:14:36,648 ERROR [Timer-Driven Process Thread-4] o.apache.nifi.processors.hadoop.GetHDFS
org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from org.apache.hadoop.hdfs.client.HdfsDataInputStream@389fa2c0 for StandardFlowFileRecord[uuid=ee9aa74d-21eb-45cd-b6d5-6440c1a95093,claim=,offset=0,name=7412824162032320,size=0] due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-306710789-172.16.3.5-1445707884245:blk_1075144155_1404007 file=/user/putarapasa/OCA_Nestac_XRef_Old.xlsx
    at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:2690) ~[na:na]
    at org.apache.nifi.processors.hadoop.GetHDFS.processBatchOfFiles(GetHDFS.java:369) [nifi-hdfs-processors-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at org.apache.nifi.processors.hadoop.GetHDFS.onTrigger(GetHDFS.java:315) [nifi-hdfs-processors-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) [nifi-api-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099) [nifi-framework-core-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132) [nifi-framework-core-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_112]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_112]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_112]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_112]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_112]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_112]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]
Caused by: org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-306710789-172.16.3.5-1445707884245:blk_1075144155_1404007 file=/user/putarapasa/OCA_Nestac_XRef_Old.xlsx
    at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:2683) ~[na:na]
    ... 14 common frames omitted
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-306710789-172.16.3.5-1445707884245:blk_1075144155_1404007 file=/user/putarapasa/OCA_Nestac_XRef_Old.xlsx
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:984) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:642) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934) ~[hadoop-hdfs-2.7.3.jar:na]
    at java.io.DataInputStream.read(DataInputStream.java:100) ~[na:1.8.0_112]
    at org.apache.nifi.stream.io.StreamUtils.copy(StreamUtils.java:35) ~[nifi-utils-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at org.apache.nifi.controller.repository.FileSystemRepository.importFrom(FileSystemRepository.java:700) ~[na:na]
    at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:2680) ~[na:na]
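A BlockMissingException on a remote client often means the client could reach the NameNode (it got the block locations) but could not reach the DataNodes holding the replicas. One way to rule out a genuinely missing block is to run fsck from a host that already has working HDFS access; this is a sketch, run on a cluster-side node:

```shell
# Run on a host with working HDFS access (e.g. an edge node in the cluster).
# Lists the file's blocks and which DataNodes hold each replica; if fsck
# reports the file healthy here, the block exists and the problem is
# reachability from the HDF server, not actual block loss.
hdfs fsck /user/putarapasa/OCA_Nestac_XRef_Old.xlsx -files -blocks -locations
```

The `-locations` output also gives you the exact DataNode addresses the HDF server would need to connect to.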

4 REPLIES

Master Guru

Taking HDF out of the picture for a second, are you able to retrieve the file (/user/putarapasa/OCA_Nestac_XRef_Old.xlsx) from HDFS using the command line?

Super Collaborator

@Bryan Bende

I didn't try from the command line, but it is accessible from Hue. I will try from the command line.

Master Guru

OK, the best test would be to set up the HDFS client on one of the HDF servers and then retrieve the file using the command line there. If that doesn't work, it narrows the problem down to something outside of NiFi; if it does work, then we need to think harder about why NiFi can't retrieve it 🙂
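On the HDF server, that test might look like the following; the keytab path, principal, and config directory are placeholders for your environment:

```shell
# Point the Hadoop client at the config files copied from the cluster
# (the directory containing core-site.xml and hdfs-site.xml)
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Authenticate with the same keytab the GetHDFS processor uses
# (hypothetical keytab path and principal -- substitute your own)
kinit -kt /etc/security/keytabs/putarapasa.keytab putarapasa@EXAMPLE.COM
klist   # confirm a valid TGT was obtained

# Try to pull the exact file GetHDFS is failing on
hdfs dfs -get /user/putarapasa/OCA_Nestac_XRef_Old.xlsx /tmp/
```

If `hdfs dfs -get` fails here with the same BlockMissingException, the issue is network/config between the HDF host and the DataNodes rather than anything in NiFi.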

Super Collaborator

It looks like only the NameNodes are open for connectivity to/from the HDF server; the DataNodes are not. We are working on getting that fixed and will test after that.
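That would explain the symptom exactly: the NameNode hands out block locations, but the read fails because the DataNode ports are blocked. Once the firewall change is in, a plain TCP check from the HDF server can verify DataNode reachability. A minimal sketch, assuming a hypothetical DataNode hostname and the Hadoop 2.x default ports (confirm `dfs.datanode.address` and `dfs.datanode.ipc.address` in your hdfs-site.xml):

```shell
# Succeed (exit 0) if a TCP connection to $1:$2 opens within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so this requires bash.
check_port() {
  timeout 3 bash -c "cat < /dev/null > /dev/tcp/$1/$2" 2>/dev/null
}

# datanode1.example.com is a placeholder; 50010 = data transfer,
# 50020 = IPC (Hadoop 2.x defaults).
for port in 50010 50020; do
  if check_port datanode1.example.com "$port"; then
    echo "datanode1.example.com:$port reachable"
  else
    echo "datanode1.example.com:$port NOT reachable"
  fi
done
```

Repeat against every DataNode: GetHDFS can fail on just one file if only some DataNodes are unreachable and that file's replicas happen to live there.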