Member since: 07-30-2019
Posts: 3133
Kudos Received: 1564
Solutions: 909
11-01-2016
01:17 PM
2 Kudos
@Anwaar Siddiqui With NiFi running as HTTP (non-secure), there is no way to differentiate between users who access the UI; to NiFi, everyone is the same anonymous user. Two or more people can still work on their own dataflows within a non-secured NiFi, but there is no way to prevent one user from modifying another user's dataflow.

Once NiFi is running as HTTPS (secured), some mechanism must be put in place to authenticate the users who will be accessing the canvas. Currently supported user authentication methods include TLS user certificates (the default), LDAP, and Kerberos. Through user authentication, NiFi can now distinguish between users.

After authentication comes authorization, which is handled by NiFi itself (the default) or by Apache Ranger. This authorization layer is used to grant specific access policies to specific authenticated users. Things like controlled access to specific components (processors, process groups, controller services, etc.) are handled through this authorization. While every user still accesses the same canvas, this allows you to control which components can be seen and modified, down to a specific user if desired.

Thanks, Matt
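As a rough illustration, securing NiFi starts with the HTTPS and keystore settings in nifi.properties. The hostnames, paths, and passwords below are placeholders, and the login identity provider line is only needed for LDAP or Kerberos (the providers themselves are defined in login-identity-providers.xml):

```properties
# Illustrative nifi.properties fragment for a secured NiFi (all values are placeholders)
nifi.web.https.host=nifi.example.com
nifi.web.https.port=9443
nifi.security.keystore=/opt/nifi/conf/keystore.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=changeit
nifi.security.truststore=/opt/nifi/conf/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=changeit
# Leave blank for TLS client-certificate auth (the default); set to an identity
# provider (e.g. ldap-provider) configured in login-identity-providers.xml
nifi.security.user.login.identity.provider=ldap-provider
```

With this in place, the authorizers.xml configuration is what controls which authenticated users may view or modify components.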
11-01-2016
12:58 PM
6 Kudos
@bala krishnan When the MergeContent processor runs (based on its run schedule configuration), it looks at the FlowFiles on the incoming queue and groups them into bins (think of these as logical containers) based on the processor's property configuration. You have two Merge Strategies to choose from:

1. Defragment: This strategy requires that FlowFiles have fragment attributes set on them (fragment.identifier, fragment.index, and fragment.count). Each "bundle" goes into its own bin.

2. Bin-Packing Algorithm (default): FlowFiles are simply placed in bins based on the criteria discussed below.

Here are the properties that affect how the bins FlowFiles are placed into are handled:

*** FlowFile content is never truncated. For example, if a FlowFile whose content is larger than the configured "Maximum Group Size" exists on the incoming queue, it will simply become a merged file of one and be passed directly to the merged relationship.

*** "No maximum" simply means that there is no ceiling on the "Maximum Number of Entries" or "Maximum Group Size". Does this mean a bin will never get merged? No. The currently queued FlowFiles on an incoming connection are placed in bin(s); if, after being placed in those bins, any bin has reached or exceeded the "Minimum Number of Entries" or "Minimum Group Size", that bin is merged.

*** Don't forget that "Max Bin Age" acts as your trump card: a bin will be merged once it has been around this long without being merged, regardless of the other settings.

Thanks, Matt
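The merge decision described above can be sketched in Python. This is a simplified model, not NiFi's actual code; it assumes both minimums must be satisfied before a bin is considered full, unless Max Bin Age forces the merge:

```python
# Simplified model of MergeContent's bin-merge decision (illustrative only).

def bin_is_ready(entry_count, group_size_bytes, bin_age_seconds,
                 min_entries=1, min_size=0, max_bin_age=None):
    """Return True if a bin should be merged."""
    # "Max Bin Age" is the trump card: merge regardless of the other settings.
    if max_bin_age is not None and bin_age_seconds >= max_bin_age:
        return True
    # Otherwise merge only once the minimum thresholds are reached or exceeded.
    return entry_count >= min_entries and group_size_bytes >= min_size
```

For example, a bin with one small FlowFile below the minimums will sit unmerged until its age crosses the configured Max Bin Age.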
10-31-2016
01:03 PM
ListenHTTP requires 2-way SSL when SSL is enabled, so the client will also need a keystore and truststore. The truststore on both your client and server will need to contain the trusted cert entry for the other side's certificate. If you used the same CA for both, then you should be good. If not, you will need to add the CA cert or trusted key entry (the public key from each private key entry) to each other's truststores.
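The cert exchange might look like the following keytool commands (a sketch only; keystore file names, aliases, and passwords are placeholders for your own):

```
# Export the public certificate from each side's keystore (aliases are illustrative)
keytool -export -alias client-key -keystore client-keystore.jks -file client.crt
keytool -export -alias server-key -keystore server-keystore.jks -file server.crt

# Import each certificate into the *other* side's truststore
keytool -import -alias client-cert -keystore server-truststore.jks -file client.crt
keytool -import -alias server-cert -keystore client-truststore.jks -file server.crt
```

After the imports, each side can verify the certificate presented by the other during the mutual TLS handshake.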
10-31-2016
12:49 PM
@Joshua Adeleke Since you are moving between two very different versions of HDF, I suggest following my procedure above. Stand up your new HDF 2.0.1 install using a copy of the authorized-users.xml file from your old NiFi and the flow.xml.gz file from your old NiFi. Once it is up and running, access the UI and fix any invalid processors, controller services, and reporting tasks; there should not be many. ***NOTE: You cannot copy the entire conf dir from the older version to the new, as there are many changes to the files in that directory (some files no longer exist in the new version, and the new version has added some additional config files). What you can do is use the contents/configurations in many of the old files (with the same names) to configure like properties in the new NiFi's config files. Matt
10-27-2016
02:19 PM
The SplitText processor will always route the incoming FlowFile to the "original" relationship. What SplitText is really doing is producing a bunch of new FlowFiles from a single "original" FlowFile. Once all the splits have been created, the original, un-split FlowFile is routed to the "original" relationship. Most often that relationship is auto-terminated because users have no need for the original FlowFile after the splits are created.
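The relationship between the splits and the original can be modeled roughly as follows. This is an illustrative sketch, not NiFi code; the fragment attribute names are modeled on NiFi's, but the index here is 0-based for simplicity:

```python
# Illustrative model of SplitText: the incoming FlowFile is untouched and goes
# to "original", while each group of lines becomes a new FlowFile on "splits".

def split_text(flowfile_content, lines_per_split=1):
    lines = flowfile_content.splitlines()
    splits = [
        "\n".join(lines[i:i + lines_per_split])
        for i in range(0, len(lines), lines_per_split)
    ]
    return {
        "splits": [
            {"content": s, "fragment.index": i, "fragment.count": len(splits)}
            for i, s in enumerate(splits)
        ],
        # The unmodified input is routed to "original" (often auto-terminated).
        "original": flowfile_content,
    }
```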
10-27-2016
12:45 PM
2 Kudos
@Joshua Adeleke First, let's make sure we are on the same page terminology-wise.

NiFi FlowFiles --> FlowFiles are made up of two parts: FlowFile content and FlowFile attributes. FlowFile content is written to NiFi's content repository, while FlowFile attributes mostly live in JVM heap memory and the NiFi FlowFile repository. It is the FlowFile attributes that move from processor to processor in your dataflow.

Apache NiFi does not have a version 1.0.0.2.0.0.0-579; that is an HDF version of Apache NiFi 1.0.0. If you are migrating to HDF 2.0, I suggest instead migrating to HDF 2.0.1. It has many bug fixes you will want: https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.1/index.html

When you say you want to move the "dataflow process" to another server, are you talking about your entire canvas and all configured processors only? Or are you also talking about moving any active FlowFiles within the old flow to the new?

My suggestion would be to stand up your new NiFi instance/cluster. You can copy your flow.xml.gz file from your old NiFi to the new NiFi. ***NOTE: You need to make sure that the same sensitive props key is used (the config setting for this is found in the nifi.properties file), or your new NiFi will not start because it will not be able to decrypt any of the sensitive properties in that file. If your old NiFi was secured, you can use its authorized-users.xml file to establish the initial admin authorities in the new NiFi (configure this in the authorizers.xml file). Once started, you will need to access the new NiFi's UI and address any "invalid" processors/controller services, and add any controller services you may have had running on the NCM only in the old version (there is no NCM in HDF 2.x versions). Some of these may be invalid because of changes to the processor/controller service properties. Once all has been addressed, you are ready to move to the next step.

Shut down all ingest processors on your old NiFi and allow it to finish processing any data it is already working on. At the same time, you can start your new NiFi so it starts ingesting any new data and begins processing it. Thanks, Matt
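Before starting the new node, it may be worth mechanically confirming the sensitive props key matches between the two installs. A small hedged helper (file paths are placeholders, and the parsing is deliberately naive):

```python
# Sketch: verify nifi.sensitive.props.key matches between an old and a new
# nifi.properties before starting the new NiFi. Paths are placeholders.

def read_prop(path, name):
    """Return the value of a simple name=value property, or None if absent."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith(name + "="):
                return line.split("=", 1)[1]
    return None

def sensitive_keys_match(old_props, new_props):
    key = "nifi.sensitive.props.key"
    return read_prop(old_props, key) == read_prop(new_props, key)
```

Usage would be something like `sensitive_keys_match("/old/conf/nifi.properties", "/new/conf/nifi.properties")` before first startup.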
10-26-2016
07:33 PM
@Saikrishna Tarapareddy You can actually cut the ExtractText processor out of this flow. I forgot the RouteText processor generates a "RouteText.Group" FlowFile attribute. You can just use that attribute as the "Correlation Attribute Name" in the MergeContent processor.
10-26-2016
07:10 PM
1 Kudo
@Saikrishna Tarapareddy I agree that you may still need to split your very large incoming FlowFile into smaller FlowFiles to better manage heap memory usage, but you should be able to use the RouteText and ExtractText as follows to accomplish what you want:
RouteText configured as follows: all grouped lines will be routed to relationship "TagName" as a new FlowFile. They feed into an ExtractText configured as follows: this will extract the TagName as an attribute on the FlowFile, which you can then use as the "Correlation Attribute Name" in the MergeContent processor that follows. Thanks, Matt
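The overall group-then-merge idea can be sketched in Python. This is illustrative only, not NiFi code; the leading comma-separated field standing in for the tag is a hypothetical line format:

```python
# Sketch of the flow's logic: group lines by a tag extracted from each line,
# then merge each group, mimicking RouteText grouping plus MergeContent with
# a "Correlation Attribute Name".
from collections import defaultdict

def merge_by_tag(lines):
    groups = defaultdict(list)
    for line in lines:
        # Stand-in for the RouteText grouping expression (hypothetical format).
        tag = line.split(",", 1)[0]
        groups[tag].append(line)
    # One merged "FlowFile" per tag value.
    return {tag: "\n".join(g) for tag, g in groups.items()}
```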
10-26-2016
05:46 PM
@Zack Riesland There are seven fields; however, the seventh field (year) is optional. So you are correct: both " 0 0 18 * * ? " and " 0 0 18 * * ? * " are valid. The below is from http://www.quartz-scheduler.org/documentation/quartz-2.x/tutorials/crontrigger.html

----------------
* (“all values”) - used to select all values within a field. For example, “*” in the minute field means “every minute”.

? (“no specific value”) - useful when you need to specify something in one of the two fields in which the character is allowed, but not the other. For example, if I want my trigger to fire on a particular day of the month (say, the 10th), but don’t care what day of the week that happens to be, I would put “10” in the day-of-month field, and “?” in the day-of-week field. See the examples below for clarification.
-----------------

So only fields 4 (day-of-month) and 6 (day-of-week) will accept “?”. Thanks, Matt
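That placement rule is easy to check mechanically. A small sketch (the function names are illustrative, and this validates only the "?" placement, not full Quartz syntax):

```python
# Per the Quartz docs quoted above, "?" is only valid in the day-of-month
# (field 4) and day-of-week (field 6) positions of a 6- or 7-field expression.

def question_mark_positions(expression):
    """Return the 1-based field positions that contain '?'."""
    return [i + 1 for i, field in enumerate(expression.split()) if field == "?"]

def question_marks_valid(expression):
    return all(pos in (4, 6) for pos in question_mark_positions(expression))
```

For example, `question_marks_valid("0 0 18 * * ?")` accepts the 6-field form, with or without the optional trailing year field.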