Member since 09-29-2015
Posts: 871 | Kudos Received: 723 | Solutions: 255
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4263 | 12-03-2018 02:26 PM |
| | 3203 | 10-16-2018 01:37 PM |
| | 4310 | 10-03-2018 06:34 PM |
| | 3166 | 09-05-2018 07:44 PM |
| | 2425 | 09-05-2018 07:31 PM |
12-05-2016 06:07 PM
1 Kudo
Looks like this was answered on the Apache mailing lists... @brosander created a MiNiFi JIRA: https://issues.apache.org/jira/browse/MINIFI-153
12-03-2016 02:27 PM
I suspect this is because Site-to-Site over HTTP with proxies was added in NiFi 1.0, and MiNiFi 0.0.1 was released before that, so the MiNiFi 0.0.1 toolkit doesn't know to look at these elements in the template when generating the YAML. The next release of MiNiFi Java (0.1.0) is going through the release vote right now; it would be interesting to try your scenario with that to see if there is still a problem. The artifacts for the release candidate are here if you would like to try it out: https://dist.apache.org/repos/dist/dev/nifi/nifi-minifi/0.1.0/
12-02-2016 02:28 PM
1 Kudo
If you send a header named "filename", ListenHTTP should respect that value for the filename attribute in NiFi: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/servlets/ListenHTTPServlet.java#L243-L246 Basically, HTTP headers become FlowFile attributes, and the body of the HTTP request becomes the FlowFile content. Right now it looks like you are writing the filename into the body of the POST request instead.
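As a quick illustration, here is a minimal sketch using Java's built-in HTTP client (Java 11+); the ListenHTTP URL, port, and file name are placeholders for whatever your flow actually uses:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class PostToListenHttp {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // The request body becomes the FlowFile content;
        // the "filename" header becomes the FlowFile's filename attribute.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://nifi-host:8081/contentListener")) // placeholder ListenHTTP endpoint
                .header("filename", "mydata.json")                        // placeholder filename
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("mydata.json")))
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Status: " + response.statusCode());
    }
}
```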
12-02-2016 01:57 PM
@Avijeet Dash There is really no single correct answer for the architecture, because it depends on the specs of the servers, the amount of data moving through the system, and what is being done to each piece of data. I can say that for a high-volume production scenario, you would probably not want to co-locate the services on each node.

The biggest impact on performance will likely come from making sure each NiFi repository (FlowFile, content, provenance) has its own disks, and Kafka has its own disks, to avoid I/O contention (see the sketch below).

NiFi currently doesn't do data replication, but it is being worked on by the community: https://cwiki.apache.org/confluence/display/NIFI/Data+Replication Even without data replication, you would typically have a RAID configuration for the repository disks on each of your NiFi nodes, so it would take some kind of failure beyond a single disk to lose anything. As long as you can get the node back up, all the data will be there.
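For illustration, this is roughly what that separation looks like in nifi.properties, assuming each repository has been given a mount point on its own physical disks; the paths here are placeholders:

```properties
# nifi.properties - point each repository at its own disk(s) to avoid I/O contention
nifi.flowfile.repository.directory=/disk1/flowfile_repository
nifi.content.repository.directory.default=/disk2/content_repository
nifi.provenance.repository.directory.default=/disk3/provenance_repository
```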
12-01-2016 03:46 PM
@Avijeet Dash Thanks! It really depends on what you are trying to do, because NiFi and Kafka are serving two different purposes.

NiFi is a dataflow tool meant to move data between systems and provide centralized management of the data flow. All of the data in NiFi is persisted to disk and will survive restarts/crashes, but NiFi is not trying to hold on to the data: it is trying to bring it somewhere, and once it is delivered to the destination it is no longer in NiFi.

Kafka provides a durable stream store with a decentralized publish/subscribe model, where consumers manage their own offsets and can reset them to replay data (sketched in the example below). Kafka typically holds on to data longer, which is what allows consumers to reset their offsets and replay. If you have many downstream consumers that are all consuming the same data, it makes more sense for those consumers to latch on to a Kafka topic. NiFi does offer the ability for consumers to pull data via site-to-site, but you would need to set up an Output Port in NiFi for each of those consumers to pull from.

Some examples:
- If you are trying to ingest data to HDFS, NiFi can do that by itself.
- If you are trying to provide data to tens or hundreds of streaming analytics, putting the data in Kafka makes sense, although you may still want/need NiFi to get your data into Kafka.
- If you have data sources that you want to get into Kafka, but they can't be changed to communicate with Kafka, then use NiFi to reach out and get the data from those systems and publish it to Kafka.
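To illustrate the replay point above, here is a minimal sketch of a Kafka consumer using the standard Java client that rewinds a partition to the beginning and re-reads whatever Kafka has retained; the broker address and topic name are placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "replay-demo");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assign a specific partition so we can control the offset directly
            TopicPartition partition = new TopicPartition("my-topic", 0); // placeholder topic
            consumer.assign(Collections.singletonList(partition));

            // Rewind to the beginning of the partition to replay all retained data
            consumer.seekToBeginning(Collections.singletonList(partition));

            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```

This assign/seek pattern is what "consumers manage their own offsets" means in practice; nothing in the flow has to be re-sent for the consumer to see the data again.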
11-29-2016 01:31 AM
The attached file doesn't look like a normal Solr log; did that come from solr_home/server/logs/solr.log?
11-22-2016 04:16 PM
When you say you "put Kafka into the schemas just in case", do you mean your MiNiFi flow has Kafka processors in it? If so, I would try removing them, because I don't think MiNiFi has the Kafka processors available by default.
11-22-2016 03:26 PM
And this article by @Ryan Cicak, just posted, is a great example of setting up the communication between MiNiFi and NiFi: https://community.hortonworks.com/articles/67756/ingesting-log-data-using-minifi-nifi.html
11-22-2016 03:24 PM
I'm not really sure you need Kafka in this flow. I would consider: MiNiFi -> NiFi -> HDFS

The way you connect MiNiFi to NiFi is through site-to-site, using a Remote Process Group (a config.yml sketch is below). See Joe Percivall's presentation here, slides 19 & 20: http://www.slideshare.net/hortonworks/hortonworks-data-in-motion-webinar-series-part-6-edge-intelligence-iot-minifi?ref=http://hortonworks.com/blog/edge-intelligence-iot-apache-minifi/

Are there additional errors in your MiNiFi log? It could be that one of the processors you used to create the flow in NiFi is not available in MiNiFi.
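For reference, a rough sketch of the site-to-site piece of a MiNiFi config.yml, assuming the 0.x schema; the URL is a placeholder, and the Input Port id must match the id of the Input Port you create on the NiFi canvas:

```yaml
Remote Processing Groups:
  - name: NiFi ingest
    url: http://nifi-host:8080/nifi   # placeholder NiFi URL
    timeout: 30 secs
    yield period: 10 sec
    Input Ports:
      - id: 01234567-89ab-cdef-0123-456789abcdef   # placeholder, must match the NiFi Input Port id
        name: From MiNiFi
        max concurrent tasks: 1
        use compression: false
```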