Member since: 04-29-2016
Posts: 192
Kudos Received: 20
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
| 1674 | 07-14-2017 05:01 PM
| 2839 | 06-28-2017 05:20 PM
04-28-2017
08:16 PM
In the case of a secure NiFi instance, you can check for a heartbeat with the curl command below:

curl 'https://<nifi-server>:8077/nifi-api/access/token' -H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' --data 'username=<username>&password=<password>' --compressed --insecure

If you get a token back, the NiFi instance is up and running. Here is the link where this is posted/answered - https://community.hortonworks.com/questions/96713/nifi-api-unable-to-validate-the-access-token-error.html
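If a token alone isn't a strong enough check, a minimal sketch of a follow-up call that uses the token (host, port, and credentials are placeholders, as above):

```bash
# Request a token, then use it as a Bearer credential on a second call;
# a successful response confirms the API is actually serving requests,
# not just issuing tokens.
TOKEN=$(curl -s 'https://<nifi-server>:8077/nifi-api/access/token' \
  -H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' \
  --data 'username=<username>&password=<password>' --insecure)
curl -s 'https://<nifi-server>:8077/nifi-api/flow/status' \
  -H "Authorization: Bearer $TOKEN" --insecure
```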
04-28-2017
07:54 PM
Hi All,

One of our NiFi dataflows ingests small files (1-5 KB each) at a rate of 100+ messages per second. The requirement is to store them in HDFS. We're using a MergeContent processor to bundle 1000 files into a new, bigger file, which makes the files somewhat larger for HDFS, but still nowhere near the ideal size for HDFS storage.

We could make MergeContent wait for more files until a merged file of the desired size is ready, but we do not want to wait too long within NiFi: we want to send data to HDFS as close to "near real-time" as possible, not sit in the MergeContent processor for a day waiting for enough files to accumulate.

So it appears PutHDFS "append" might work, where you write the files as they come in and append them to an existing HDFS file until the desired HDFS file size is reached (I have some questions on this approach, posted here - https://community.hortonworks.com/questions/99843/questions-on-nifi-puthdfs-append-option.html).

Another option we're considering is a nightly job/dataflow that merges the HDFS files at rest to the desired size; this seems like a simpler approach (a sketch of what such a job could look like is below).

Wanted to know which option would be better to address the too-many-small-files issue. Thanks in advance.
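A minimal sketch of the nightly compaction option, assuming the small files land under date-partitioned directories (all paths here are hypothetical):

```bash
# Hypothetical nightly compaction: pull one day's small files down as a
# single concatenated file, write it back to HDFS as one larger file,
# then remove the originals once the merged copy is in place.
DAY=$(date -d yesterday +%Y-%m-%d)                  # assumes GNU date
hdfs dfs -getmerge "/data/ingest/$DAY" "/tmp/merged-$DAY"
hdfs dfs -put "/tmp/merged-$DAY" "/data/merged/$DAY"
hdfs dfs -rm -r "/data/ingest/$DAY"
```

Note that getmerge stages the data on local disk, so it only works if a day's worth of files fits there; a MapReduce/Spark compaction job would avoid that limit.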
Labels:
- Apache NiFi
04-28-2017
07:40 PM
Hello,

I have a couple of questions on how NiFi's PutHDFS "append" option works, but let me start with some background before I ask them.

Below is a schematic of how I'm envisioning PutHDFS append might be used, in a scenario where small files are ingested and the requirement is to write them to HDFS as soon as the data comes in (for "near" real-time analysis of that data). As we know, that creates too many small files in HDFS, with the overhead and performance issues that result. So one option, I think, is to write the small files to HDFS as they arrive, but keep appending to an existing HDFS file until that file has grown to a decent size for HDFS, then start appending to a new HDFS file. With this approach, my understanding is that we need to keep checking the HDFS file size to decide when it's time to start writing/appending to a new file (see the sketch after this post). Please see the schematic below.

1) In this approach, does the constant checking of the HDFS file size create enough overhead to affect the performance of the dataflow, or is it not that big a factor?

2) Once we've reached the desired size on the HDFS file, how does the file "close" happen? Does the PutHDFS processor (with "append" selected for "Conflict Resolution Strategy") go back and close the file after some idle time, or does PutHDFS close the file after each append? If the latter (opening and closing for each append), does that create overhead and degrade performance?

If I have it all wrong about how to use the PutHDFS append option, please let me know that as well. Thanks for your time.
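A minimal sketch of the size check described above, assuming it runs outside NiFi (the path and threshold are illustrative):

```bash
# Check the current append target's size and roll to a new file once it
# passes a threshold (128 MB here, a common HDFS block size).
TARGET=/data/stream/current.avro                 # hypothetical path
THRESHOLD=$((128 * 1024 * 1024))
SIZE=$(hdfs dfs -stat %b "$TARGET" 2>/dev/null || echo 0)
if [ "$SIZE" -ge "$THRESHOLD" ]; then
  # Move the full file aside so the next append starts a fresh target.
  hdfs dfs -mv "$TARGET" "/data/stream/closed-$(date +%s).avro"
fi
```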
Labels:
- Apache NiFi
04-24-2017
01:40 PM
thanks @amankumbare, could you please add how you would access the API in a secure NiFi environment - getting a token first, then using the token to access the provenance events.
04-21-2017
02:39 PM
@Ravi Teja did you try this? It worked for me: ${CR_LINE_CMTD_START_DT:toDate("yyyyMMddHHmmss"):format("yyyy-MM-dd HH:mm:ss")}
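For example, assuming the attribute holds a compact timestamp like 20170421143900, this expression would produce 2017-04-21 14:39:00.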
04-21-2017
01:16 PM
thank you @Bryan Bende, for the explanation and for clarifying.
04-20-2017
05:59 PM
Hello,

I have a couple of questions on data recovery/integrity after a standalone NiFi instance crashes (and is subsequently restarted). I searched HCC for NiFi fault tolerance; the few posts I found discuss fault tolerance and recovery in a cluster environment, but I'm interested in a standalone NiFi scenario.

In our case the standalone NiFi instance crashed because the Java heap ran out of space (even though we do have enough heap space allocated). At the time of the crash, one dataflow was running a real-time stream without any issues. But when a second dataflow, which had ListSFTP and FetchSFTP processors, started running (there is a known issue with running List and Fetch processors when there are lots of files to process - https://issues.apache.org/jira/browse/NIFI-3423), some of the NiFi processors in the first dataflow started to throw out-of-memory errors and the ListenTCP processor stopped ingesting new flowfiles from the source system; on the server, the Java process CPU utilization was around 250%. At that point, we stopped both dataflows (the NiFi canvas was still accessible, it did not crash) and restarted the NiFi instance; after that, we resumed the first dataflow and all was fine.

Since NiFi writes flowfile content to the content repository and keeps attributes and state info in the flowfile repository (see the settings sketched after this post), no data should have been lost or corrupted when the NiFi instance crashed. Just wanted to clarify whether my understanding in this scenario (no loss of data or integrity issues) is correct.

My second question: during a planned server reboot, if we stop a dataflow while there is data in transit (i.e., some flowfiles are in queues between processors and some flowfiles are being processed by NiFi processors, e.g. ReplaceText), then restart the NiFi instance and resume the stopped dataflow, would NiFi pick up where it left off? That is, would flowfiles that were in the middle of being processed (just prior to the dataflow being stopped) resume their processing from where they were? My understanding is yes, that's how NiFi works; the fault tolerance built into NiFi takes care of that. Would like to know if that is correct. Thanks in advance.
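For reference, a minimal sketch of the persistence settings involved, as they appear in conf/nifi.properties (defaults shown; directory paths vary by install):

```properties
# The write-ahead FlowFile repository persists each flowfile's attributes
# and its position in the flow, which is what lets NiFi resume in-transit
# flowfiles after a restart or crash.
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
nifi.flowfile.repository.directory=./flowfile_repository

# FlowFile content itself is persisted here.
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.repository.directory.default=./content_repository
```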
Labels:
- Apache NiFi
04-19-2017
07:06 PM
@wynner, thanks for confirming; and special thanks for following up with me on this question over the last few days and finally leading me to a solution that works.
04-19-2017
06:04 PM
@Wynner You're right, I didn't think about that. But would receiving the API token by itself confirm that NiFi is up and running and fully functional? I'm thinking yes (I'm not sure a scenario is possible where you get a token back from an API call, yet NiFi is not fully functional). I want to confirm that, since we'll eventually be using this mechanism in our PROD environment.
04-19-2017
01:44 PM
That's correct. The load balancer can do heartbeat checks to see which nodes are alive and send messages only to the active nodes. It is for this heartbeat check that I wanted to make a REST API call to the NiFi nodes to see whether a node is available. Without LDAP/SSL it worked just fine: the load balancer was able to run heartbeat checks on the nodes, and there were no issues making the API call to a NiFi node. It is only after adding LDAP/SSL that I'm having trouble getting the curl command to work.