Member since 09-29-2015
871 Posts · 723 Kudos Received · 255 Solutions
01-11-2018
05:42 PM
Hi Samir,

There shouldn't be an endless loop. As you can see in the HDFS example diagram, there are two parts to the flow:

- ListHDFS -> RPG (this part only runs on the primary node)
- Input Port -> FetchHDFS -> rest of the flow (this part runs on all nodes)

The starting point of your flow should be something that has no input, like ListHDFS, so there can't be a circular loop back to that point. The second part should end with wherever you are sending your data, like PutHDFS for example; after that it is a dead end, with no loop back to anywhere.

If this is not clear, please provide a screenshot or template of your flow so we can see how you have connected the processors.

Thanks,
Bryan
11-06-2017
05:35 PM
No problem, glad it was helpful 🙂

If you are reaching the login screen, then it means your browser is not forwarding your credentials to NiFi. You could try setting the negotiate properties to just "myhost.de" instead of the full URL with the port.

Another thing to look at might be the domain being used by your KDC. In this example I was using nifi.apache.org as the domain, so I had to add a mapping in /etc/hosts from nifi.apache.org to localhost so that I could use nifi.apache.org in my browser to access my local NiFi. If you are accessing myhost.de to get to your NiFi instance, but that isn't the domain in your KDC, then the two won't line up and the browser probably won't forward your credentials.
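For reference, the /etc/hosts mapping described above would look roughly like this (nifi.apache.org is the example domain from the post; substitute your own KDC domain):

```shell
# /etc/hosts entry from the example: map the KDC's domain name to the
# local machine so the hostname in the browser URL matches the Kerberos domain.
127.0.0.1   localhost nifi.apache.org
```

In Firefox, for instance, the negotiate property mentioned above is `network.negotiate-auth.trusted-uris` in about:config; setting it to just the host (e.g. `myhost.de`) rather than a full URL with port is what the suggestion refers to.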
10-30-2017
01:53 PM
Nice article! You could also use the "Message Demarcator" property in PublishKafka (set to a newline). That way you never have to split up your flow file: the processor streams the large flow file, reads it based on the demarcator, and still sends each line to Kafka as an individual message.
10-30-2017
01:48 PM
Hello, this post is for ListenUDP, ListenTCP, ListenSyslog, and ListenRELP. The ListenWebSocket processor is implemented differently and does not necessarily follow what is described here. I'm not familiar with the websocket processor, but maybe others have experience with tuning it.
09-08-2017
08:12 PM
Something is not set up correctly, because the log is showing the principal as alvin@NIFI.COM in some places and a different principal in other places. What are you entering as the username when you log in? What is entered for the Initial Admin in authorizers.xml?
09-08-2017
06:41 PM
1 Kudo
If you are setting up authentication for users accessing NiFi's UI, then you only need the SPNEGO properties as shown in this post. If you need NiFi to authenticate to other services, for example to talk to Ranger when Ranger is Kerberized, then you need the service principal and keytab.
08-21-2017
04:06 PM
1 Kudo
Hi @sukesh nagaraja I think the results you got are expected behavior. The extracting request handler has no way to know the field names for the data you sent in. It is generally used to extract text from files like PDFs or Word documents, where you basically have a title and content, and almost everything just goes into the content field. For your scenario, you basically have a CSV where you already know the field names.

Take a look at Solr's CSV update handler: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates

You can use this from NiFi by setting the path to /update, setting the Content-Type to application/csv, and then adding a property fieldnames with your list of fields. I'd recommend playing around with the update handler outside of NiFi first, using curl or a browser tool like Postman; once you have the request working the way you want, then get it working in NiFi.
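As a rough sketch of that curl experiment (the Solr host, collection name "mycollection", and the field names are placeholders, not from the original question):

```shell
# Hypothetical example: post raw CSV rows to Solr's CSV update handler.
# header=false tells Solr the data has no header row; fieldnames supplies
# the column names instead. Adjust collection and fields to your schema.
curl 'http://localhost:8983/solr/mycollection/update?commit=true&header=false&fieldnames=id,name,price' \
  -H 'Content-Type: application/csv' \
  --data-binary $'1,widget,9.99\n2,gadget,4.50'
```

Once a request like this indexes documents the way you want, the same path, Content-Type, and fieldnames property can be carried over into the NiFi processor configuration.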
02-10-2017
08:24 PM
1 Kudo
@Raj B Thanks! Since the NCM didn't do any data processing, there is actually not much different in this article between 0.x and 1.x. The only real difference is that when setting up site-to-site connections you now create a remote process group with the URL of any node in the cluster, whereas before you entered the URL of the NCM. Other than that, just pretend the NCM isn't in the diagrams and you should be good to go.
12-02-2016
01:57 PM
@Avijeet Dash There is really no single correct answer for the architecture, because it depends on the specs of the servers, the amount of data moving through the system, and what is being done to each piece of data. I can say that for a high-volume production scenario, you would probably not want to co-locate the services on each node. The biggest impact on performance will likely be making sure each NiFi repository (flow file, content, provenance) has its own disks, and Kafka has its own disks, to avoid I/O contention.

NiFi currently doesn't do data replication, but it is being worked on by the community: https://cwiki.apache.org/confluence/display/NIFI/Data+Replication Even without data replication, you would typically have a RAID configuration for the repository disks on each of your NiFi nodes, so it would take some kind of error beyond a single disk failure to lose data. As long as you can get the node back up, all the data will be there.
12-01-2016
03:46 PM
@Avijeet Dash Thanks! It really depends on what you are trying to do. NiFi and Kafka serve two different purposes.

NiFi is a data flow tool meant to move data between systems and provide centralized management of the data flow. All of the data in NiFi is persisted to disk and will survive restarts/crashes. Kafka provides a durable stream store with a decentralized publish/subscribe model, where consumers manage their own offsets and can reset them to replay data.

NiFi is not trying to hold on to the data; it is trying to bring it somewhere, and once the data is delivered to the destination it is no longer in NiFi. Kafka, by contrast, typically holds on to the data longer, which is what allows consumers to reset their offsets and replay data. Also, if you have many downstream consumers all consuming the same data, it makes more sense for them to latch on to a Kafka topic. NiFi does offer the ability for consumers to pull data via site-to-site, but you would need to set up an Output Port in NiFi for each of those consumers to pull from.

So some examples... If you are trying to ingest data to HDFS, then NiFi can do that by itself. If you are trying to provide data to tens or hundreds of streaming analytics, then putting the data in Kafka makes sense; you may still want/need NiFi to get your data into Kafka. And if you have data sources that you want to get into Kafka, but they can't be changed to communicate with Kafka, then use NiFi to reach out and get the data from those systems and publish it to Kafka.