Member since 09-29-2015
871 Posts
723 Kudos Received
255 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4275 | 12-03-2018 02:26 PM |
| | 3224 | 10-16-2018 01:37 PM |
| | 4336 | 10-03-2018 06:34 PM |
| | 3194 | 09-05-2018 07:44 PM |
| | 2442 | 09-05-2018 07:31 PM |
06-27-2017
06:27 PM
Yes, the message key is the same thing as the record key. For your use case, you need more than the same partition for all messages from one flow file: you want the same partition for all flow files from a given source. In that case, the processor should have a partition property that supports expression language, so you could do something like:
GetFile (source1) -> UpdateAttribute (set kafka.partition = 1) -> PublishKafka (partition = ${kafka.partition})
GetFile (source2) -> UpdateAttribute (set kafka.partition = 2) -> PublishKafka (partition = ${kafka.partition})
I'll clarify this on the JIRA. In the meantime, you probably have a better chance using PublishKafka_0_10 (the non-record version). If you strip off the header before reaching this processor, set the Message Demarcator for PublishKafka_0_10 to a new-line, and set the key to ${filename}, you should get what you are looking for.
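As a rough illustration of the PublishKafka_0_10 workaround, here is a minimal Python sketch of what a new-line demarcator plus a key of ${filename} amounts to. It is not NiFi or Kafka code: the function names and sample data are hypothetical, and `zlib.crc32` is only a stand-in for the hash the Kafka client actually uses.

```python
import zlib


def split_on_demarcator(content: bytes, demarcator: bytes = b"\n") -> list[bytes]:
    """Split flow file content into individual messages, dropping empty pieces."""
    return [piece for piece in content.split(demarcator) if piece]


def to_keyed_messages(filename: str, content: bytes) -> list[tuple[bytes, bytes]]:
    """Pair every message with the same key, mimicking a key of ${filename}."""
    key = filename.encode()
    return [(key, message) for message in split_on_demarcator(content)]


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Stand-in for hash partitioning (crc32 here, not Kafka's real hash)."""
    return zlib.crc32(key) % num_partitions


# Sample content with the header already stripped upstream (hypothetical data).
content = b"1,alice\n2,bob\n3,carol\n"
messages = to_keyed_messages("source1.csv", content)

# Every message shares one key, so hash partitioning picks one partition for all.
partitions = {choose_partition(key, 4) for key, _ in messages}
```

Because the key is identical for every line of the file, any deterministic hash of the key sends them all to the same partition.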
06-27-2017
05:06 PM
I think what we need is a way to control the partitioning independently of the message key. The message key is used on the broker side during compaction/age-off, where the latest record with a given message key is retained. If the whole flow file shared one key, all the lines of your CSV would be treated as different versions of the same message, and at some point all of the records except the latest one could be aged off. I think what you really want is to use the "id" field from your CSV as the message key, but then indicate to NiFi that all of the messages from this flow file should be sent to the same partition, an option that unfortunately doesn't currently exist. I created this JIRA to add that option: https://issues.apache.org/jira/browse/NIFI-4133
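To make the retention concern concrete, here is a simplified Python model of "keep the latest record per key". It is only a sketch under assumptions: real Kafka compaction runs asynchronously on the broker and applies only to topics configured for it, and `compact` and the sample rows are hypothetical.

```python
def compact(records: list[tuple[str, str]]) -> dict[str, str]:
    """Keep only the most recent value seen for each key."""
    latest: dict[str, str] = {}
    for key, value in records:
        latest[key] = value  # a later record replaces an earlier one with the same key
    return latest


lines = ["1,alice", "2,bob", "3,carol"]  # hypothetical CSV rows

# One key for the whole file: the rows look like versions of a single message.
keyed_by_file = [("source1.csv", line) for line in lines]

# The "id" column as the key: every row is its own message.
keyed_by_id = [(line.split(",")[0], line) for line in lines]

after_file_key = compact(keyed_by_file)
after_id_key = compact(keyed_by_id)
```

With the file-level key only the last row survives, while keying by "id" keeps all three rows.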
06-23-2017
02:11 PM
@Alvin Jin Also, I know you already implemented a custom service, but there is also some work here by one of the Apache NiFi committers: https://github.com/apache/nifi/pull/1938
06-23-2017
01:46 PM
1 Kudo
Site-To-Site does not do anything to the contents of your flow files: if you have 3 flow files, it transfers 3 flow files. That statement is saying that site-to-site is optimized for a continuous flow of large amounts of data. If you run a test with 3 flow files, it will probably send all 3 to only one of the nodes in your cluster, because that is not enough data to reach the point where it would start sending to the other nodes.
06-23-2017
01:43 PM
1 Kudo
1) It doesn't really matter; it just needs to be a shared location that all nodes can access.
2) You can do this, and it might work well for small files and small numbers of files, but typically the whole point is to perform the "fetch" in parallel. Here all the files have to be fetched on the primary node (GetFile), and then all of their contents have to be redistributed to the cluster, instead of just the listings.
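The difference between the two approaches can be sketched in a few lines of Python. This is only a model: the node names, the paths, and the round-robin assignment are assumptions for illustration, and NiFi's actual distribution is not strictly round-robin.

```python
from collections import defaultdict
from itertools import cycle


def distribute(listings: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Hand out file listings (paths only, not contents) across the nodes."""
    assignments: dict[str, list[str]] = defaultdict(list)
    for path, node in zip(listings, cycle(nodes)):
        assignments[node].append(path)
    return dict(assignments)


# The listing runs once, on the primary node, and produces only paths.
listings = [f"/shared/in/file{i}.csv" for i in range(6)]  # hypothetical paths
nodes = ["node1", "node2", "node3"]                       # hypothetical cluster

# Each node then fetches the contents of its own share in parallel.
assignments = distribute(listings, nodes)
```

Only the small listing entries cross the network here; with GetFile on the primary node, the full file contents would have to be redistributed instead.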
06-22-2017
05:05 PM
Generally, you want Primary Node only for a source processor like ListHDFS, where the listing should be performed only once.
06-22-2017
05:04 PM
No. You have a 3-node cluster; let's say node #1 is the primary node. MiNiFi is sending data to all nodes, so the data is already divided across all of them, but you are only scheduled to process it on node #1. The data on nodes #2 and #3 will just sit there and never get processed.
06-22-2017
04:56 PM
Your SplitText processor is scheduled to run on Primary Node only, which doesn't seem right, since MiNiFi would send data to all nodes. Most likely the flow files that are sitting there are not on the primary node. You can check this by doing a List Queue on that connection and looking at the host column on the right.
06-22-2017
02:48 PM
Your custom NAR needs to have a NAR dependency on the standard services API in its pom.xml:
<dependency>
  <groupId>org.apache.nifi</groupId>
  <artifactId>nifi-standard-services-api-nar</artifactId>
  <type>nar</type>
</dependency>
If you can share your custom NAR code or pom files, I can take a look.
06-21-2017
08:48 PM
2 Kudos
Since binary concatenation just writes chunks of raw bytes one after another, there is no real format that can be understood to undo it. Another processor would have no way to read those bytes and know where the pieces were joined. If you use a demarcator when merging, you can use it to unmerge with SplitContent or SplitText.
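A small Python sketch shows why the demarcator matters. It only models the idea on raw bytes, not the NiFi processors themselves; `merge` and `unmerge` are hypothetical helpers, and it assumes the demarcator never appears inside the original content.

```python
def merge(chunks: list[bytes], demarcator: bytes = b"") -> bytes:
    """Concatenate chunks, optionally writing a demarcator between them."""
    return demarcator.join(chunks)


def unmerge(merged: bytes, demarcator: bytes) -> list[bytes]:
    """Recover the chunks by splitting on the demarcator used at merge time."""
    return merged.split(demarcator)


chunks = [b"first", b"second", b"third"]

# Plain binary concatenation: the boundaries between chunks are gone.
plain = merge(chunks)

# With a demarcator the boundaries survive, so the merge can be undone.
marked = merge(chunks, demarcator=b"\n")
recovered = unmerge(marked, b"\n")
```

Nothing in `plain` says where one chunk ends and the next begins, whereas `recovered` matches the original list as long as the content itself contains no new-lines.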