Member since: 07-30-2019
Posts: 3133
Kudos Received: 1565
Solutions: 909

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 158 | 01-09-2025 11:14 AM
 | 895 | 01-03-2025 05:59 AM
 | 441 | 12-13-2024 10:58 AM
 | 497 | 12-05-2024 06:38 AM
 | 392 | 11-22-2024 05:50 AM
12-21-2016
07:00 PM
@Sunile Manjee Also keep in mind that NiFi content archiving is enabled by default, with a retention period of 12 hours or 50% disk utilization before archived content is removed/purged. Manually purging FlowFiles within your dataflow will not trigger the deletion of already archived content.
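For reference, the archive behavior is controlled by these settings in nifi.properties; the values shown below are the documented defaults, so adjust them to your own retention needs:

```
# nifi.properties - content repository archive settings (defaults shown)
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
```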
12-20-2016
04:47 PM
1 Kudo
@Ahmad Debbas FlowFiles generated from the GetHDFS processor should have a "path" attribute set on them:
The path is set to the relative path of the file's directory on HDFS. For example, if the Directory property is set to /tmp, then files picked up directly from /tmp will have the path attribute set to "./". If the Recurse Subdirectories property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to "abc/1/2/3". Since it is only the relative path and not an absolute path, you would need to use an UpdateAttribute processor to prepend the configured directory path to that relative path if you need the absolute path later in your flow (see the sketch below). Thanks, Matt
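A minimal UpdateAttribute sketch, assuming the GetHDFS Directory property was /tmp; the attribute name absolute.hdfs.path is arbitrary and hypothetical, and the value uses NiFi Expression Language:

```
# UpdateAttribute processor - add one dynamic property (name = value)
# assumes GetHDFS Directory = /tmp; append /${filename} if you need the full file path
absolute.hdfs.path = /tmp/${path}
```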
12-20-2016
03:36 PM
@D'Andre McDonald The Get based processors will create an "absolute.path" FlowFile attribute on all files that are ingested into NiFi. So you would configure your Get processor to point at the base directory and consume files from all subdirectories. The Put based processors support expression language in the "Remote Path" property, so you can use any attribute on the FlowFile to specify what path the file will be written to on the put. Here you could use ${absolute.path} as the value for this property. The Put based processors also have a "Create Directory" property which you can set to true (see the sketch below). Thank you, Matt
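A sketch of the Put-side properties, using PutSFTP as an example; exact property names can vary slightly between the different Put processors:

```
# PutSFTP processor - relevant properties
Remote Path      = ${absolute.path}
Create Directory = true
```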
12-15-2016
01:22 PM
2 Kudos
@NAVEEN KUMAR
One suggestion might be to use a ListFile processor configured to run on a cron schedule. You could then feed the success relationship from that processor to a MonitorActivity processor, and route the inactive relationship of that processor to a PutEmail processor. So let's say you have your ListFile configured to run every 3 minutes based on a cron. You could set the threshold in the MonitorActivity processor to 3 minutes with "Continually Send Messages" set to true. With the inactive relationship routed to PutEmail, you will get an email every 3 minutes if the ListFile produced no new files. You could also route the activity.restored relationship to a PutEmail processor if you want to be notified when files are seen again following a period of no activity (a configuration sketch is below). Thanks, Matt
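A minimal sketch of the two processor configurations, assuming the 3-minute interval from the example; the Quartz cron expression and property names are from the standard ListFile and MonitorActivity processors:

```
# ListFile - Scheduling tab
Scheduling Strategy = CRON driven
Run Schedule        = 0 0/3 * * * ?        # fire every 3 minutes

# MonitorActivity - Properties tab
Threshold Duration        = 3 min
Continually Send Messages = true
```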
12-14-2016
10:51 PM
1 Kudo
@Sunile Manjee FlowFile content is stored in claims inside the content repository. Each claim can contain the content from one or more FlowFiles. A claim will not be moved to the content archive or purged from the content repository until all active FlowFiles in your dataflow that reference any of the content in that claim have been removed. Those FlowFiles can be removed via manual purging of the queues (Empty Queue), FlowFile expiration on a connection, or via auto-termination at the end of a dataflow.
The FlowFile count and size reported in the UI does not reflect the size of the claims in the content repository. Those stats report the size and number of active FlowFiles queued in your flow, so it is very common for the size reported in the UI to differ from actual disk usage. The properties below control how content is packed into claims.
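For reference, claim packing is controlled by these nifi.properties settings; the values shown are typical defaults for NiFi releases of this era, so check your own nifi.properties for the exact values:

```
# nifi.properties - how many FlowFiles / how much data may share one content claim
nifi.content.claim.max.appendable.size=10 MB
nifi.content.claim.max.flow.files=100
```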
Thanks, Matt
12-12-2016
01:30 PM
2 Kudos
@Piyush Routray I'm not sure what you mean by "I intend to have a separate NiFi cluster than the HDF cluster".
Are you installing just NiFi via the command line? You can install NiFi from the command line and utilize the embedded ZooKeeper. http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.1/index.html When you get to the "Download HDF" section of the "Command Line Installation" documentation, go to the bottom of the list to download just the NiFi tar.gz file. The relevant docs for this are found here: http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.1/bk_administration/content/clustering.html http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.1/bk_administration/content/state_providers.html
Are you trying to install NiFi via HDF's Ambari? The Ambari based installation of HDF will install an external ZooKeeper for you and set up NiFi to use it. A sketch of the embedded ZooKeeper settings is below. Thanks, Matt
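This is only a minimal sketch of the embedded-ZooKeeper route; the hostnames are placeholders, and each node also needs a myid file matching its server.N entry (see the state_providers link above for the full procedure):

```
# nifi.properties - on each node that should run the embedded ZooKeeper
nifi.state.management.embedded.zookeeper.start=true
nifi.zookeeper.connect.string=nifi-node1:2181,nifi-node2:2181,nifi-node3:2181

# conf/zookeeper.properties - list the ensemble members (placeholder hostnames)
server.1=nifi-node1:2888:3888
server.2=nifi-node2:2888:3888
server.3=nifi-node3:2888:3888
```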
12-09-2016
02:10 AM
@pholien feng Before a user can access the UI, that user must have the "view the interface" policy granted to them. This policy is added through the global policies UI found under the hamburger menu in the upper right corner. I see that step is missing in the above answer. Sorry about that. Matt
12-08-2016
07:05 PM
1 Kudo
@Michael Young HDF NiFi at its core is designed to be very lightweight; however, how powerful a host/node HDF NiFi needs to be deployed on really depends on the complexity of the implemented dataflow and the throughput and data volumes that dataflow will be handling. HDF NiFi may be deployed at the edge, but usually along with those edge deployments comes a centralized cluster deployment that runs a much more complex dataflow handling data coming from the edge NiFis as well as many other application sources. Thanks, Matt
12-08-2016
01:26 PM
1 Kudo
@Avijeet Dash Every node in a NiFi cluster runs with its own repositories and flow.xml.gz, and works with its own set of data. Nodes in a cluster are unaware of what data other nodes in the cluster are working on. Once a cluster coordinator is elected, all nodes send heartbeats to that node. Nodes cannot share repositories.
When you access the UI via any node in the cluster, the UI shows the cumulative stats of the entire cluster. This is where the centralized management aspect comes into play: any changes you make within NiFi (no matter which node's UI you are logged into) will be replicated to all nodes in the cluster (the relevant per-node settings are sketched below). Thanks, Matt
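A minimal sketch of the per-node cluster settings in nifi.properties; the hostname, port, and ZooKeeper connect string are placeholders:

```
# nifi.properties - set on every node (each node uses its own address)
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node1.example.com
nifi.cluster.node.protocol.port=11443
nifi.zookeeper.connect.string=zk1:2181,zk2:2181,zk3:2181
```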
12-08-2016
01:16 PM
@pholien feng I need more detail on what you are seeing. There are two parts to accessing a secured NiFi installation: authentication and authorization. By default, NiFi expects users to authenticate using SSL; a user would need to present a valid certificate via their browser to NiFi for authentication. NiFi can also be configured via the login-identity-providers.xml file to support either LDAP or Kerberos for user authentication. After a user successfully authenticates, the authorization piece occurs. The above answer deals with the authorization piece only.
Check your nifi-user.log to see if authentication is successful. Make sure the DN shown in the nifi-user.log matches exactly (watch for case-sensitivity and whitespace issues) what is configured in the "Initial Admin Identity" property in your authorizers.xml file (a sample is below). When NiFi is started for the first time after enabling HTTPS, the users.xml and authorizations.xml files are generated based on the user-supplied configurations in the authorizers.xml file. Should the configurations in authorizers.xml be edited at a later time, those changes will not be made to the existing users.xml or authorizations.xml files. They are only ever created once; subsequent edits to these files are expected to be done via the NiFi application. If you made a mistake in these files when setting up HTTPS access for the first time, you can remove these two files and they will be re-created the next time you start NiFi. Thanks, Matt
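A sketch of the file-based authorizer entry in authorizers.xml for NiFi 1.x; the DN shown is a placeholder and must match the admin certificate's DN exactly:

```
<!-- conf/authorizers.xml - file-based authorizer (NiFi 1.x) -->
<authorizers>
    <authorizer>
        <identifier>file-provider</identifier>
        <class>org.apache.nifi.authorization.FileAuthorizer</class>
        <property name="Authorizations File">./conf/authorizations.xml</property>
        <property name="Users File">./conf/users.xml</property>
        <!-- placeholder DN: must match the certificate DN from nifi-user.log exactly -->
        <property name="Initial Admin Identity">CN=admin, OU=NIFI</property>
        <property name="Legacy Authorized Users File"></property>
        <property name="Node Identity 1"></property>
    </authorizer>
</authorizers>
```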