06-01-2017
03:10 PM
Thanks @Matt Clarke. Every zip could be around 200 MB and contains approximately 40K flowfiles. Is there any way the cluster can route the flowfiles to the Primary Node? That way, the primary node would be solely responsible for the HDFS writing. Alvaro
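For context, a sketch of the options here: NiFi at the time (1.2.x) had no processor-level way to steer flowfiles from all nodes to the primary node; the "Primary Node" scheduling setting only controls where a source processor runs. Later releases (1.8+) added per-connection load balancing with a "Single node" strategy, which consolidates a connection's flowfiles onto one node and does exactly what is being asked. The era-appropriate workaround was a Site-to-Site loop: a Remote Process Group pointing back at an Input Port on the same cluster, though note that Site-to-Site load-balances across all nodes rather than pinning data to the primary. Site-to-Site has to be enabled in nifi.properties on every node; the host and port values below are placeholders, not taken from this cluster:

    # nifi.properties (set on every node) -- host/port are placeholder values
    nifi.remote.input.host=nifi-node1.example.com
    nifi.remote.input.secure=false
    nifi.remote.input.socket.port=10443
    nifi.remote.input.http.enabled=true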
06-01-2017
01:44 PM
Hi there,

I have a NiFi cluster with 4 nodes. The defined dataflow (image attached) has a MergeContent processor, which gathers incoming flowfiles into a zip every minute, and a PutHDFS processor, which puts the zip file into HDFS. The result I was expecting was a single zip file in HDFS containing all the flowfiles from the last minute, for example: /example/2017-06-01/example_1202.zip

The actual result is that every node creates and tries to put its own zip file. Since the zip filename (set in the UpdateAttribute processor) is unique for the whole cluster, the files try to overwrite each other and I get error messages. I tried setting the Conflict Resolution Strategy property (within the PutHDFS processor) to append, but then I get the following error:

PutHDFS[id=f6e6968f-015b-1000-95b2-510198b97c50] Failed to write to HDFS due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutHDFS[id=f6e6968f-015b-1000-95b2-510198b97c50]: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /example/2017-06-01/.example_1202.zip (inode 32117): File does not exist. Holder DFSClient_NONMAPREDUCE_-6309245_9 does not have any open files.

The objective of the flow is that files received on any of the 4 nodes are collected every minute, compressed into a .zip file, and put into HDFS. Is my dataflow not valid? Where is the mistake? Any idea of how to do it?

Thanks in advance.
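A minimal sketch of one fix, assuming the filename is set with Expression Language in the UpdateAttribute processor: the append attempt fails because HDFS grants a write lease on a file to a single client at a time, so four nodes appending to the same path collide (hence the LeaseExpiredException). Making each node's filename unique avoids the collision, at the cost of one zip per node per minute instead of one per cluster. The property value below is illustrative; the prefix and time format just mirror the paths in the post:

    # UpdateAttribute property (name : value) -- prefix and format are illustrative
    filename : example_${now():format('HHmm')}_${hostname(true)}.zip

With per-node names, PutHDFS can keep its default Conflict Resolution Strategy ("fail"), since no two nodes ever produce the same filename.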
Labels:
- Apache Hadoop
- Apache NiFi