Created 04-03-2017 06:59 PM
Hi,
I am trying to connect to our Hadoop cluster from a new HDF server that sits outside the Hadoop cluster. I am running some simple tests before I move all my flows from the old server onto the new one.
I am seeing errors when I try to read a file with the GetHDFS processor. I copied the Hadoop config files onto the HDF server, and both servers use the same Kerberos KDC, so I am reusing the same keytabs. The error message from the app log is below.
I was told that all the ports for HDFS, Hive, etc. are open for communication between the two servers. Do I need to change anything in the config files?
2017-04-03 13:14:36,648 ERROR [Timer-Driven Process Thread-4] o.apache.nifi.processors.hadoop.GetHDFS
org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from org.apache.hadoop.hdfs.client.HdfsDataInputStream@389fa2c0 for StandardFlowFileRecord[uuid=ee9aa74d-21eb-45cd-b6d5-6440c1a95093,claim=,offset=0,name=7412824162032320,size=0] due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-306710789-172.16.3.5-1445707884245:blk_1075144155_1404007 file=/user/putarapasa/OCA_Nestac_XRef_Old.xlsx
    at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:2690) ~[na:na]
    at org.apache.nifi.processors.hadoop.GetHDFS.processBatchOfFiles(GetHDFS.java:369) [nifi-hdfs-processors-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at org.apache.nifi.processors.hadoop.GetHDFS.onTrigger(GetHDFS.java:315) [nifi-hdfs-processors-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) [nifi-api-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099) [nifi-framework-core-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132) [nifi-framework-core-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_112]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_112]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_112]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_112]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_112]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_112]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]
Caused by: org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-306710789-172.16.3.5-1445707884245:blk_1075144155_1404007 file=/user/putarapasa/OCA_Nestac_XRef_Old.xlsx
    at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:2683) ~[na:na]
    ... 14 common frames omitted
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-306710789-172.16.3.5-1445707884245:blk_1075144155_1404007 file=/user/putarapasa/OCA_Nestac_XRef_Old.xlsx
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:984) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:642) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934) ~[hadoop-hdfs-2.7.3.jar:na]
    at java.io.DataInputStream.read(DataInputStream.java:100) ~[na:1.8.0_112]
    at org.apache.nifi.stream.io.StreamUtils.copy(StreamUtils.java:35) ~[nifi-utils-1.1.0.2.1.0.0-165.jar:1.1.0.2.1.0.0-165]
    at org.apache.nifi.controller.repository.FileSystemRepository.importFrom(FileSystemRepository.java:700) ~[na:na]
    at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:2680) ~[na:na]
Created 04-04-2017 02:19 PM
Taking HDF out of the picture for a second, are you able to retrieve the file (/user/putarapasa/OCA_Nestac_XRef_Old.xlsx) from HDFS using the command line?
Created 04-04-2017 02:30 PM
I didn't try from the command line, but the file is accessible from Hue. I will try from the command line.
Created 04-04-2017 02:33 PM
Ok, the best test would be to set up the HDFS client on one of the HDF servers and then retrieve the file using the command line there. If that doesn't work, it narrows the problem down to something outside of NiFi; if it does work, then we need to think harder about why NiFi can't retrieve it 🙂
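For reference, a command-line check along these lines might look like the following. The keytab path and Kerberos principal below are placeholders, not values from this thread; substitute your own.

```shell
# Authenticate with the same keytab the NiFi GetHDFS processor uses
# (principal and keytab path here are examples only)
kinit -kt /etc/security/keytabs/putarapasa.keytab putarapasa@EXAMPLE.COM

# Listing only talks to the NameNode -- this should succeed even if
# DataNode ports are blocked
hdfs dfs -ls /user/putarapasa/OCA_Nestac_XRef_Old.xlsx

# Reading the file streams block data from a DataNode, which is the
# same code path where GetHDFS hit BlockMissingException
hdfs dfs -get /user/putarapasa/OCA_Nestac_XRef_Old.xlsx /tmp/
```

If the `-ls` succeeds but the `-get` fails, that points at DataNode connectivity rather than Kerberos or NameNode configuration.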
Created 04-04-2017 03:41 PM
Looks like only the NameNodes are open for connectivity to/from the HDF server; the DataNodes are not. We are working on fixing that and will test after it is done.
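That would explain the symptoms: the NameNode lookup succeeds, but the block read from a DataNode fails. Once the firewall is updated, a quick reachability check from the HDF server could look like this (the hostname is a placeholder; the port depends on your cluster's settings):

```shell
# Test the DataNode data-transfer port directly from the HDF server.
# 50010 is the Hadoop 2.x default for dfs.datanode.address; secure
# (Kerberized) clusters often use a privileged port such as 1004 or
# 1019 instead -- check dfs.datanode.address in hdfs-site.xml.
nc -vz datanode1.example.com 50010
```

Repeat for each DataNode, since HDFS may direct the client to any node holding a replica of the block.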