Member since: 07-30-2019
Posts: 3131
Kudos Received: 1564
Solutions: 909

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 128 | 01-09-2025 11:14 AM
 | 772 | 01-03-2025 05:59 AM
 | 413 | 12-13-2024 10:58 AM
 | 444 | 12-05-2024 06:38 AM
 | 367 | 11-22-2024 05:50 AM
09-21-2016
05:29 PM
1 Kudo
There is a possibility that the time could differ slightly (by a few milliseconds) between the moments the two now() functions are evaluated in that Expression Language statement, which could cause the result to fall back to 11:58:59. To avoid this, you can simply reduce 43260000 by a few milliseconds (for example, to 43259990) so that 11:59:00 is always returned.
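With that adjustment, the expression from the answer below would become (only the constant changes):

```
${now():minus(${now():mod(86400000)}):minus(43259990):format('MM-dd-yyyy hh:mm:ss')}
```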
09-21-2016
04:34 PM
2 Kudos
@Sree Venkata You can do this using a combination of NiFi Expression Language (EL) functions:

```
${now():minus(${now():mod(86400000)}):minus(43260000):format('MM-dd-yyyy hh:mm:ss')}
```

This EL statement takes now(), subtracts the remainder of dividing now() by 86400000 (the number of milliseconds in 24 hours), then subtracts an additional 43260000 milliseconds (12 hours and 1 minute) from that result, and finally formats the output in the date format you are looking for (the arithmetic is broken down below). I confirmed this EL statement by using it in an UpdateAttribute processor; looking at the attributes on a FlowFile processed by it, the attribute "yesterday" is set to exactly one day earlier than the current time, at 11:59:00. Thanks, Matt
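A quick breakdown of the arithmetic (times relative to the epoch, so midnight boundaries are UTC):

```
now():mod(86400000)            -> milliseconds elapsed since midnight today
now() minus that remainder     -> midnight (00:00:00) today
minus 43260000 (12h + 1m)      -> 11:59:00 on the previous day
```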
09-21-2016
03:53 PM
1 Kudo
@mayki wogno Nodes in a NiFi cluster do not share data. Each works only on the specific data it has received through some ingest-type NiFi processor. As such, each node has its own repositories for storing that specific data (FlowFile content) and the metadata (FlowFile attributes) about that node-specific data. As a cluster, every node loads and runs the exact same dataflow.

One and only one node in the cluster can be the "primary node" at any given time. Some NiFi processors are not cluster friendly and should only run on one node in the cluster at any given time (GetSFTP is a good example). NiFi allows you to configure those processors with an "on primary node" only scheduling strategy. While these processors will still exist on every node in the cluster, they will only run on the primary node. If the primary node designation in the cluster changes at any time, the cluster takes care of stopping the "on primary node" scheduled processors on the original primary node and starting them on the new primary node.

When a node goes down, the other nodes in the cluster will not pick up working on the data that was queued on that down node. That node, as Bryan pointed out, will pick up where it left off on its queued data once restored to an operational state, provided there was no loss or corruption to either the content or FlowFile repositories on that specific node. Thanks, Matt
09-21-2016
03:39 PM
If the node that goes down happens to be the "primary node", the cluster will automatically elect a new "primary node" from the remaining available nodes and start those "primary node only" processors.
09-21-2016
11:51 AM
1 Kudo
@Gerd Koenig After a closer look at the jaas file you posted above, I believe your issue is a missing " in the following line:

principal="nifi@REALM;

This line should actually be:

principal="nifi@REALM";
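For reference, a complete KafkaClient JAAS entry with the closing quote in place would look something like the following sketch (the keytab path is illustrative, not from your posted file):

```
KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/path/to/nifi.keytab"
    principal="nifi@REALM";
};
```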
Try making the above change and restarting your NiFi. Thanks, Matt
09-20-2016
01:45 PM
@Gerd Koenig Can you try changing the value you have for "Message Delimiter" from the literal string "\n" to an actual new line in your PutKafka processor? You can add a new line by holding the Shift key while hitting Enter. Thanks, Matt
09-20-2016
12:35 PM
2 Kudos
@Gerd Koenig The question here is: are you running Apache NiFi 0.6 or HDF 1.2? I believe you are using Apache NiFi 0.6, which does not understand PLAINTEXTSASL as the security protocol.

The Kafka 0.8 in HDP 2.3.2 and the Kafka 0.9 in HDP 2.3.4 use a custom Hortonworks Kafka client library. Kafka 0.8 in HDP 2.3.2 introduced support for Kerberos before it was supported in the community; that support introduced the PLAINTEXTSASL security protocol. Later, when Apache Kafka 0.9 added Kerberos support, it used a different security protocol (SASL_PLAINTEXT). In order for HDF 1.2 to work with HDP 2.3.2, the GetKafka processor was modified from the Apache GetKafka to use that modified client library. Hortonworks again modified the client lib in HDP 2.3.4 for Kafka 0.9 so that it was backwards compatible and still supported the PLAINTEXTSASL security protocol.

So the bottom line here is that HDF 1.2 NiFi can talk Kerberos to both HDP 2.3.2 (Kafka 0.8) and HDP 2.3.4 (Kafka 0.9), but Apache NiFi cannot. The NiFi consume and publish Kafka processors available in NiFi 0.7, NiFi 1.0, and HDF 2.0 do not use the Hortonworks custom Kafka client lib and can be used with Kafka 0.9, but not Kafka 0.8. You will need to use the SASL_PLAINTEXT security protocol with these new processors. Thanks, Matt
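As a rough sketch, the relevant settings on those newer consume/publish Kafka processors would look like this (exact property names can vary slightly between NiFi versions, so treat these as illustrative):

```
ConsumeKafka / PublishKafka processor properties:
  Security Protocol      = SASL_PLAINTEXT
  Kerberos Service Name  = kafka
```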
09-16-2016
07:53 PM
2 Kudos
@David Morris The NiFi Expression Language can be used to route your data based on file extensions as you have described. When NiFi ingests data, a NiFi FlowFile is created. That FlowFile is a combination of the original content and metadata about that content. Upon ingest, some metadata is created for every FlowFile. One of those attributes is named "filename" and contains the original filename of the ingested file.

The RouteOnAttribute processor can use the NiFi Expression Language to evaluate the FlowFile's "filename" attribute for routing purposes. In the RouteOnAttribute processor you would need to add a new property for each file extension type you want to look for (see the sketch below). Each of those newly added properties becomes a new relationship for that processor, which can then be routed to follow-on processors. Thanks, Matt
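A sketch of what those added properties might look like; each property name on the left becomes a routing relationship, and the extensions shown are just examples:

```
csv = ${filename:toLower():endsWith('.csv')}
xml = ${filename:toLower():endsWith('.xml')}
txt = ${filename:toLower():endsWith('.txt')}
```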
09-15-2016
02:28 PM
@Saikrishna Tarapareddy The purpose of using a RAID is to protect against the loss of a disk. If the intent here is to protect against a complete catastrophic loss of the system, there are some things you can do. Keeping a backup of the conf directory will allow you to quickly restore the state of your NiFi's dataflow. Restoring the state of your dataflow does not restore any data that may have been active in the system at the time of failure. The NiFi repos contain the following information:

- Database repository --> Contains change history for the graph (keeps a record of all changes made on the canvas). If NiFi is secured, this repo also contains the users db. Loss of either of these has little impact: loss of configuration history will not impact your dataflow or data, and the users db is rebuilt from the authorized-users.xml file (located in the conf dir by default) upon NiFi start.
- Provenance repository(s) --> Contains NiFi FlowFile lineage history. Loss of this repo will not affect your dataflow or data. You will simply be unable to perform queries against data that traversed the system previous to the loss.
- FlowFile repository --> Loss of this repo will result in loss of data. The FlowFile repo keeps all attributes about content currently in the dataflow, including where to find the actual content in the content repository(s). The information in this repo changes rapidly, so backing up this repo is not really feasible. RAID offers your best protection here.
- Content repository(s) --> Loss of this repo will also result in loss of data and archived data (if configured to archive). The content repository(s) contain the actual content of the data NiFi processes. The data in this repo also changes rapidly as files are processed through the NiFi dataflow(s), so backing up this repo(s) is also not feasible. RAID offers your best protection here as well.

As you can see, recovery from disk failure is possible with RAID; however, a catastrophic loss of the entire system will result in loss of the data that was in mid-processing by any of the dataflows. Your repos could be external attached storage. (There is likely to be some performance impact because of this; however, in the event of catastrophic server loss, a new server could be stood up using the backed-up conf dir and attached to the same external storage. This would help prevent data loss and allow processing to pick up where it left off.) Thanks, Matt
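For reference, the locations of these repositories are set in nifi.properties; the stock default entries look like this:

```
nifi.database.directory=./database_repository
nifi.flowfile.repository.directory=./flowfile_repository
nifi.content.repository.directory.default=./content_repository
nifi.provenance.repository.directory.default=./provenance_repository
```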
09-14-2016
09:41 PM
If later you decide to add new disks, you can simply copy your content repositories to those new disks and update the nifi.properties file repo config lines to point at the new locations.
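A sketch of what that might look like with two disks, using the nifi.content.repository.directory.&lt;name&gt; convention for multiple content repositories (the mount points here are illustrative):

```
nifi.content.repository.directory.default=/disk1/content_repository
nifi.content.repository.directory.disk2=/disk2/content_repository
```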