Member since 09-29-2015
Posts: 871 | Kudos Received: 723 | Solutions: 255
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4263 | 12-03-2018 02:26 PM |
| | 3203 | 10-16-2018 01:37 PM |
| | 4310 | 10-03-2018 06:34 PM |
| | 3166 | 09-05-2018 07:44 PM |
| | 2425 | 09-05-2018 07:31 PM |
12-05-2016 06:07 PM
1 Kudo
Looks like this was answered on the Apache mailing lists... @brosander created a MiNiFi JIRA: https://issues.apache.org/jira/browse/MINIFI-153
12-03-2016 02:27 PM
I suspect this is because Site-to-Site over HTTP with proxies was added in NiFi 1.0, and MiNiFi 0.0.1 was released before that, so the MiNiFi 0.0.1 toolkit doesn't know to look at these elements in the template when generating the YAML. The next release of MiNiFi Java (0.1.0) is going through the release vote right now; it would be interesting to try your scenario with that to see if there is still a problem. The artifacts for the release candidate are here if you would like to try it out: https://dist.apache.org/repos/dist/dev/nifi/nifi-minifi/0.1.0/
12-02-2016 02:28 PM
1 Kudo
If you send a header named "filename", ListenHTTP should respect that value for the filename attribute in NiFi: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/servlets/ListenHTTPServlet.java#L243-L246 Basically, HTTP headers become FlowFile attributes, and the body of the HTTP request becomes the FlowFile content. Right now it looks like you are writing the filename into the body of the POST request instead.
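As a quick illustration, here is a minimal sketch using Java's built-in HTTP client (Java 11+); the ListenHTTP URL, port, and file name are placeholders for whatever your flow actually uses:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class PostToListenHttp {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // The request body becomes the FlowFile content;
        // the "filename" header becomes the FlowFile's filename attribute.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://nifi-host:8081/contentListener")) // placeholder ListenHTTP endpoint
                .header("filename", "mydata.json")                        // placeholder filename
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("mydata.json")))
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Status: " + response.statusCode());
    }
}
```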
12-02-2016 01:57 PM
@Avijeet Dash There is really no single correct answer for the architecture, because it depends on the specs of the servers, the amount of data moving through the system, and what is being done to each piece of data. I can say that for a high-volume production scenario, you would probably not want to co-locate the services on each node.

The biggest impact on performance will likely come from making sure each NiFi repository (FlowFile, content, provenance) has its own disks, and Kafka has its own disks, to avoid I/O contention (see the sketch below).

NiFi currently doesn't do data replication, but it is being worked on by the community: https://cwiki.apache.org/confluence/display/NIFI/Data+Replication Even without data replication, you would typically have a RAID configuration for the repository disks on each of your NiFi nodes, so it would take some kind of failure beyond a single disk to lose anything. As long as you can get the node back up, all the data will be there.
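For illustration, this is roughly what that separation looks like in nifi.properties, assuming each repository has been given a mount point on its own physical disks; the paths here are placeholders:

```properties
# nifi.properties - point each repository at its own disk(s) to avoid I/O contention
nifi.flowfile.repository.directory=/disk1/flowfile_repository
nifi.content.repository.directory.default=/disk2/content_repository
nifi.provenance.repository.directory.default=/disk3/provenance_repository
```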
12-01-2016 03:46 PM
@Avijeet Dash Thanks! It really depends on what you are trying to do, because NiFi and Kafka are serving two different purposes.

NiFi is a dataflow tool meant to move data between systems and provide centralized management of the data flow. All of the data in NiFi is persisted to disk and will survive restarts/crashes, but NiFi is not trying to hold on to the data: it is trying to bring it somewhere, and once it is delivered to the destination it is no longer in NiFi.

Kafka provides a durable stream store with a decentralized publish/subscribe model, where consumers manage their own offsets and can reset them to replay data (sketched in the example below). Kafka typically holds on to data longer, which is what allows consumers to reset their offsets and replay. If you have many downstream consumers that are all consuming the same data, it makes more sense for those consumers to latch on to a Kafka topic. NiFi does offer the ability for consumers to pull data via site-to-site, but you would need to set up an Output Port in NiFi for each of those consumers to pull from.

Some examples:
- If you are trying to ingest data to HDFS, NiFi can do that by itself.
- If you are trying to provide data to tens or hundreds of streaming analytics, putting the data in Kafka makes sense, although you may still want/need NiFi to get your data into Kafka.
- If you have data sources that you want to get into Kafka, but they can't be changed to communicate with Kafka, then use NiFi to reach out and get the data from those systems and publish it to Kafka.
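To illustrate the replay point above, here is a minimal sketch of a Kafka consumer using the standard Java client that rewinds a partition to the beginning and re-reads whatever Kafka has retained; the broker address and topic name are placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "replay-demo");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assign a specific partition so we can control the offset directly
            TopicPartition partition = new TopicPartition("my-topic", 0); // placeholder topic
            consumer.assign(Collections.singletonList(partition));

            // Rewind to the beginning of the partition to replay all retained data
            consumer.seekToBeginning(Collections.singletonList(partition));

            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```

This assign/seek pattern is what "consumers manage their own offsets" means in practice; nothing in the flow has to be re-sent for the consumer to see the data again.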
11-29-2016 01:31 AM
The attached file doesn't look like a normal Solr log; did that come from solr_home/server/logs/solr.log?
11-22-2016 04:16 PM
When you say you "put Kafka into the schemas just in case", do you mean your MiNiFi flow has Kafka processors in it? If so, I would try removing them, because I don't think MiNiFi has the Kafka processors available by default.
11-22-2016 03:26 PM
And this article by @Ryan Cicak, just posted, is a great example of setting up the communication between MiNiFi and NiFi: https://community.hortonworks.com/articles/67756/ingesting-log-data-using-minifi-nifi.html
11-22-2016 03:24 PM
I'm not really sure you need Kafka in this flow. I would consider: MiNiFi -> NiFi -> HDFS

The way you connect MiNiFi to NiFi is through site-to-site, using a Remote Process Group (a config.yml sketch is below). See Joe Percivall's presentation here, slides 19 & 20: http://www.slideshare.net/hortonworks/hortonworks-data-in-motion-webinar-series-part-6-edge-intelligence-iot-minifi?ref=http://hortonworks.com/blog/edge-intelligence-iot-apache-minifi/

Are there additional errors in your MiNiFi log? It could be that one of the processors you used to create the flow in NiFi is not available in MiNiFi.
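For reference, a rough sketch of the site-to-site piece of a MiNiFi config.yml, assuming the 0.x schema; the URL is a placeholder, and the Input Port id must match the id of the Input Port you create on the NiFi canvas:

```yaml
Remote Processing Groups:
  - name: NiFi ingest
    url: http://nifi-host:8080/nifi   # placeholder NiFi URL
    timeout: 30 secs
    yield period: 10 sec
    Input Ports:
      - id: 01234567-89ab-cdef-0123-456789abcdef   # placeholder, must match the NiFi Input Port id
        name: From MiNiFi
        max concurrent tasks: 1
        use compression: false
```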