Member since: 02-22-2016
Posts: 60
Kudos Received: 71
Solutions: 27
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4410 | 07-14-2017 07:41 PM |
| | 1304 | 07-07-2017 05:04 PM |
| | 5014 | 07-07-2017 03:59 PM |
| | 905 | 07-06-2017 02:59 PM |
| | 2869 | 07-06-2017 02:55 PM |
06-20-2017
02:21 PM
1 Kudo
@Arsalan Siddiqi You might get some mileage out of running NiFi in cluster mode (you don't say whether it's a single instance or not). I can't imagine being able to pull 22M records over the Spark receiver in a timely manner otherwise. Also, I'd use a cascade of Splits, breaking down the file in steps (22M -> 1M -> 100k -> 1); this will help a lot with memory utilization. And if you haven't seen it before, there's a trick to re-balance FlowFiles across a cluster: do a self site-to-site transmission with an RPG pointing at the current cluster. That said, for this kind of workload you'll be better served by delivering the data to Kafka and using the Kafka receiver (sketched below); you might still need a NiFi cluster, however. Last, make sure that you choose an appropriate batch interval on the Spark Streaming side: start with something large, e.g., 10 seconds, and work down from there as you tune your app.
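For reference, here's a minimal sketch of the Spark Streaming side using the receiver-based Kafka API; the `nifi-events` topic, local ZooKeeper quorum, and `spark-app` consumer group are assumptions for illustration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("NifiKafkaStream")
val ssc = new StreamingContext(conf, Seconds(10)) // start with a large batch interval, then tune down

// Receiver-based stream of (key, value) pairs from the topic
val stream = KafkaUtils.createStream(
  ssc,
  "localhost:2181",        // ZooKeeper quorum (assumption)
  "spark-app",             // consumer group id (assumption)
  Map("nifi-events" -> 2)  // topic -> number of receiver threads
)

stream.map(_._2).count().print() // e.g., just count records per batch

ssc.start()
ssc.awaitTermination()
```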
06-20-2017
04:13 AM
3 Kudos
@Alvaro Muir Unfortunately InvokeHTTP isn't currently built to transfer the chunks of a chunked transfer-encoded response as individual FlowFiles. In lieu of that, I think you have a few options:

- Use ExecuteProcess with the Batch Duration set, along with curl or httpie, and just capture the output of the shell command.
- Create a scripted or custom processor. The trick here is that you'll need another thread reading the chunks off the long-running request and feeding them to a process-local queue; the processor's job is then just to check that queue and transfer whatever chunks it sees at the time. To be specific, your @OnScheduled method could connect to the HTTP endpoint, read individual chunks, and push them onto a LinkedBlockingQueue; your onTrigger method could then do a poll() or take() to see if any chunks are available, creating a new FlowFile out of each chunk and doing a session.transfer() for it (see the sketch after this list). The GetTwitter processor is the prototypical example of this pattern: its Hosebird client is set up in @OnScheduled to feed eventQueue, and onTrigger then polls eventQueue for the Tweets.
- Just use curl or httpie to create files and have NiFi pick those up with GetFile. It's pretty silly, but `http --stream <URL> | split -l 1` will actually create an individual file out of each chunk.
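To make the custom-processor option concrete, here's a rough sketch of the queue pattern, written in Scala for brevity. A real NiFi processor would be built on the Java processor API, would create and transfer FlowFiles from a ProcessSession instead of printing, and would read raw chunks off the response stream rather than lines:

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.net.URL
import java.util.concurrent.LinkedBlockingQueue

object ChunkQueuePattern {
  // Bounded queue so a slow consumer applies backpressure to the reader thread
  val chunkQueue = new LinkedBlockingQueue[String](10000)

  // The "@OnScheduled" half: open the long-running request once and feed the queue
  // (lines stand in for chunks here to keep the sketch short)
  def startReader(endpoint: String): Thread = {
    val reader = new Thread(() => {
      val in = new BufferedReader(new InputStreamReader(new URL(endpoint).openStream()))
      Iterator.continually(in.readLine())
        .takeWhile(_ != null)
        .foreach(chunkQueue.put) // blocks when the queue is full
    })
    reader.setDaemon(true)
    reader.start()
    reader
  }

  // The "onTrigger" half: drain whatever chunks happen to be available right now
  def drainAvailableChunks(): Unit = {
    Iterator.continually(chunkQueue.poll())
      .takeWhile(_ != null)
      .foreach { chunk =>
        // In a real processor: session.create() a FlowFile, write the chunk to it,
        // and session.transfer() it to a success relationship
        println(chunk)
      }
  }
}
```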
06-14-2017
03:30 PM
2 Kudos
@Timothy Spann Try minifying the Avro schema source (i.e., removing all the spaces and newlines), or at least stripping the leading spaces from each line. It should work then.
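For example, a pretty-printed schema like this hypothetical two-field record should be accepted once collapsed to a single line:

```json
{"type":"record","name":"User","fields":[{"name":"id","type":"long"},{"name":"name","type":"string"}]}
```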
03-22-2017
02:20 PM
3 Kudos
@Alex Woolford There are a few things you can try (none of which are really NiFi concerns; see the sketch after this list):

- iptables port redirection
- Run something like HAProxy to forward TCP traffic from 514 to the selected port in NiFi
- Use the cap_net_bind_service capability available in more recent Linux kernels to allow the JVM to bind to privileged ports without running as root
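As a rough sketch of the first and third options (the unprivileged port 10514 and the java binary path are assumptions for illustration):

```sh
# iptables: redirect the privileged syslog port to an unprivileged port NiFi listens on
iptables -t nat -A PREROUTING -p tcp --dport 514 -j REDIRECT --to-port 10514
iptables -t nat -A PREROUTING -p udp --dport 514 -j REDIRECT --to-port 10514

# or grant the JVM the capability to bind privileged ports without running as root
setcap 'cap_net_bind_service=+ep' /usr/lib/jvm/java-8-openjdk/bin/java
```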
11-22-2016
05:19 PM
1 Kudo
@Raj B I laid out some of the options in a recent mailing list discussion [1]. These include both of the suggestions above, along with a link to a work-in-progress implementation of the ControllerService approach.

[1] http://mail-archives.apache.org/mod_mbox/nifi-dev/201611.mbox/%3c31b8fe4f-95f6-4419-80a9-f9a728a9cb7c@me.com%3e
11-08-2016
06:31 PM
1 Kudo
@Gerard Alexander sliding() keeps track of the partition index, which in this case corresponds to the ordering of the unigrams. Compare `rdd.mapPartitionsWithIndex { (i, p) => p.map { e => (i, e) } }.collect()` and `rdd.sliding(2).mapPartitionsWithIndex { (i, p) => p.map { e => (i, e) } }.collect()` to help with the intuition.
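As a quick illustration (assuming a spark-shell session where `sc` is the SparkContext and the sliding() implicits from spark-mllib are available), note how the windows cross the partition boundary in order:

```scala
import org.apache.spark.mllib.rdd.RDDFunctions._ // brings sliding() into scope

val rdd = sc.parallelize(Seq("a", "b", "c", "d"), 2) // 2 partitions: (a, b) and (c, d)

rdd.sliding(2).collect()
// Array(Array(a, b), Array(b, c), Array(c, d))
// The (b, c) window spans the partition boundary, preserving unigram order.
```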
11-08-2016
05:24 PM
@Ankit Jain There are a few ways you can do this:

- ReplaceText with a regex matching the lines that start with PID
- SplitText -> RouteText, matching lines that start with PID (see the sketch after this list)

What you can't do, though, is extract the PID first and then run it through ExtractHL7Attributes: ExtractHL7Attributes requires a full, valid HL7 message. If you want to do that, your best bet is to run ExtractHL7Attributes first and then (re)create a new message from the created PID attribute values.
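A rough sketch of the SplitText -> RouteText configuration (the dynamic property name `pid` is an arbitrary choice; matching lines are routed to a relationship of the same name):

```
SplitText
  Line Split Count:   1

RouteText
  Routing Strategy:   Route to each matching Property Name
  Matching Strategy:  Starts With
  pid:                PID
```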
11-08-2016
05:24 PM
1 Kudo
@Ankit Jain In AttributesToJSON, do you have the Destination property set to "flowfile-content"? If not, it puts the JSON into a JSONAttributes attribute and leaves the FlowFile content unchanged, in this case an HL7 document. An HL7 document of course isn't JSON and starts with MSH, so this is exactly the error you'd see with Destination set to "flowfile-attribute" (the default) rather than "flowfile-content".
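Schematically, the two settings behave like this:

```
Destination = flowfile-attribute (default)
  content:    MSH|^~\&|...           <- unchanged HL7
  attribute:  JSONAttributes = {...}

Destination = flowfile-content
  content:    {...}                  <- the JSON replaces the HL7
```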
11-08-2016
03:00 PM
@Raj B Is there any chance you can share an example of the JSON message and any stack trace that is output in nifi-app.log?
10-21-2016
10:03 PM
@Jeeva Jeeva You're probably best off posting that as another question (both to get it answered and to make it more searchable). I don't have anything in hand at the moment. Best.