About bbende

bbende · ‎03-22-2016

I just tried this and realized I misspoke about setting "nifi.remote.input.secure" to false on the https instance and not needing the certs on the http instance. The reason is that the http instance still needs to connect to the https instance initially to ask for the value of "nifi.remote.input.socket.port". So even though the resulting site-to-site connection would be unsecured, the initial connection still has to be secure.

Here is my best attempt to recreate the steps I just followed that ended up working...

I created two copies of nifi-0.6.0-SNAPSHOT in a directory and called one nifi-https and one nifi-http.

On nifi-https I configured the following properties in nifi.properties (everything else left as defaults):

nifi.remote.input.socket.host=hostname.from.my.cert
nifi.remote.input.socket.port=8899
nifi.remote.input.secure=true
nifi.web.http.port=
nifi.web.https.port=8443
nifi.security.keystore=mycert.p12
nifi.security.keystoreType=PKCS12
nifi.security.keystorePasswd=mypassword
nifi.security.truststore=mytruststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=mypassword

That makes nifi-https a secured instance with the web UI running on 8443 and a secure site-to-site connection available on 8899.

Next I went to https://localhost:8443/nifi in my browser and got prompted to request an account. At this point I edited nifi.properties again to turn on regular http access by setting nifi.web.http.port=8080, restarted NiFi, went to http://localhost:8080/nifi and approved my account, then removed http access, restarted again, and was able to access the UI over https.

I then created an Output Port called "Test" with a GenerateFlowFile sending data to it. At this point nifi-https is fully set up.

On nifi-http I configured the following properties in nifi.properties (everything else left as defaults):

nifi.security.keystore=mycert.p12
nifi.security.keystoreType=PKCS12
nifi.security.keystorePasswd=mypassword
nifi.security.truststore=mytruststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=mypassword

That makes nifi-http a regular unsecured instance running on port 8080, but it now has the cert and truststore to make outbound secure connections.

Next I went to http://localhost:8080/nifi and created a Remote Process Group (RPG) with a URL of https://hostname.from.my.cert:8443/nifi. It is important that the hostname in this URL matches the value of "nifi.remote.input.socket.host" from the nifi-https instance.

I then right-clicked on the RPG and chose Enable Transmission, at which point I got a message that an account was requested. This happened because nifi-http is using mycert.p12 to connect to nifi-https, but nifi-https does not have an approved account for mycert.p12. So I went to nifi-https (https://localhost:8443/nifi), went to the accounts section, approved the account for mycert.p12, and chose a role of "NiFi".

We also need to give the mycert.p12 user access to the "Test" Output Port. So on the https instance I stopped "Test", right-clicked and chose Configure, and from the Access Controls tab started typing the DN from mycert.p12, added that user to the Allowed Users list, hit Apply, and started the port again.

Then I went back to nifi-http, right-clicked on the RPG, and chose Refresh, which caused it to retrieve the available Output Ports from nifi-https. I then connected the "Test" Output Port from the RPG to LogAttribute, started everything, and it was able to pull FlowFiles from nifi-https.
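The hostname requirement is easy to get wrong, so here is a minimal Python sketch of a pre-flight check. It is not part of NiFi; the helper names parse_properties and rpg_host_matches are made up for illustration, and it only compares the host in the RPG URL against "nifi.remote.input.socket.host" from the nifi-https properties.

```python
from urllib.parse import urlparse


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def rpg_host_matches(rpg_url, https_props):
    """True when the RPG URL host equals nifi.remote.input.socket.host."""
    expected = https_props.get("nifi.remote.input.socket.host", "")
    actual = urlparse(rpg_url).hostname or ""
    return bool(expected) and actual.lower() == expected.lower()


# Values copied from the nifi-https configuration described above.
HTTPS_PROPS = parse_properties("""
nifi.remote.input.socket.host=hostname.from.my.cert
nifi.remote.input.socket.port=8899
nifi.remote.input.secure=true
""")

print(rpg_host_matches("https://hostname.from.my.cert:8443/nifi", HTTPS_PROPS))
print(rpg_host_matches("https://localhost:8443/nifi", HTTPS_PROPS))
```

The first call uses the URL from the steps above; the second uses localhost, which is the kind of mismatch this check is meant to catch.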

bbende · ‎03-19-2016

For this question you first have to take NiFi out of the picture and think about how you would index HTML with Solr. HTML is not one of the standard input formats like JSON, XML, and CSV, but Solr has an "extracting request handler" which is capable of handling HTML; see this page: https://wiki.apache.org/solr/ExtractingRequestHandler

To use that from NiFi you need to set the "Content Stream Path" to "/update/extract", set the "Content Type" to "text/html", and add a user-defined property for "literal.id" and set it to some id (you can use the FlowFile uuid by setting it to ${uuid}).
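To see what those three settings amount to outside NiFi, here is a rough Python sketch that builds (but does not send) the equivalent HTTP request. The Solr base URL, core name, and document id are placeholders I made up; only the "/update/extract" path, the "text/html" content type, and the "literal.id" parameter come from the description above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_core_url, html_bytes, doc_id):
    """Build a POST to Solr's extracting request handler for one HTML document."""
    query = urlencode({"literal.id": doc_id})
    url = f"{solr_core_url.rstrip('/')}/update/extract?{query}"
    return Request(
        url,
        data=html_bytes,
        headers={"Content-Type": "text/html"},
        method="POST",
    )


# Hypothetical core URL and id, for illustration only.
req = build_extract_request(
    "http://localhost:8983/solr/mycore",
    b"<html><body>Hello</body></html>",
    "abc-123",
)
print(req.get_method(), req.full_url)
print(req.header_items())
```

Passing the request to urllib.request.urlopen would actually send it; that step is left out so the sketch runs without a Solr server.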

bbende · ‎03-18-2016

Hello,

There are a couple of factors at play here...

Site-to-Site uses the same SSL configuration that is used to configure SSL for the UI. This is provided through nifi.properties:

nifi.security.keystore=
nifi.security.keystoreType=
nifi.security.keystorePasswd=
nifi.security.keyPasswd=
nifi.security.truststore=
nifi.security.truststoreType=
nifi.security.truststorePasswd=

So you should be able to have an http instance, meaning the UI is not configured with a secure https port, but you still configure the keystore/truststore properties above, and it will use those to connect to the secure NiFi instance.

Secondly, on your https instance, if you set "nifi.remote.input.secure" to false then you should also be able to make a connection from your http to https instance without configuring the above properties, but the connection will be unsecured in this case.

bbende · ‎02-19-2016

The documentation for InvokeHTTP says that dynamic properties are sent as headers:

Name: Header Name
Value: Attribute Expression Language
Description: Send request header with a key matching the Dynamic Property Key and a value created by evaluating the Attribute Expression Language set in the value of the Dynamic Property. Supports Expression Language: true

So you should be able to add a property Content-Type with a value of application/json.

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.InvokeHTTP/index.html
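As a rough analogy in Python (this is not how InvokeHTTP is implemented, just the same idea), each dynamic property becomes one request header. The function name, the example URL, and the payload below are all invented for the sketch.

```python
import json
from urllib.request import Request


def build_json_post(url, payload, dynamic_properties):
    """Build a POST where every dynamic property is added as a request header."""
    body = json.dumps(payload).encode("utf-8")
    req = Request(url, data=body, method="POST")
    for name, value in dynamic_properties.items():
        req.add_header(name, value)
    return req


# Placeholder endpoint and body; the Content-Type property is the one suggested above.
req = build_json_post(
    "http://example.com/api",
    {"hello": "world"},
    {"Content-Type": "application/json"},
)
print(req.header_items())
```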

bbende · ‎02-10-2016

In an Apache NiFi cluster, every node runs the same dataflow and data is divided between the nodes. In order to leverage the full processing power of the cluster, the next logical question is: "how do I distribute data across the cluster?" The answer depends on the source of data. Generally, there are sources that can push data, and sources that provide data to be pulled. This post will describe some of the common patterns for dealing with these scenarios.

Background

A NiFi cluster is made up of a NiFi Cluster Manager (NCM) and one or more nodes. The NCM does not perform any processing of data, but manages the cluster and provides the single point of access to the UI for the cluster. In addition, one of the processing nodes can be designated as a Primary Node. Processors can then be scheduled to run on the Primary Node only, via an option on the scheduling tab of the processor which is only available in a cluster.

When connecting two NiFi instances, the connection is made with a Remote Process Group (RPG) which connects to an Input Port or Output Port on the other instance.

In the diagrams below, NCM will refer to the cluster manager, nodes refer to the nodes processing data, and RPG refers to Remote Process Groups.

Pushing

When a data source can push its data to NiFi, there will generally be a processor listening for incoming data. In a cluster, this processor would be running on each node. In order to get the data distributed across all of the listeners, a load balancer can be placed in front of the cluster, as shown in the following example:

The data sources can make their requests against the URL of the load balancer, which redirects them to the nodes of the cluster. Other processors that could be used with this pattern are HandleHttpRequest, ListenSyslog, and ListenUDP.

Pulling

If the data source can ensure that each pull operation will pull a unique piece of data, then each node in the NiFi cluster can pull independently. An example of this would be a NiFi cluster with each node running a GetKafka processor:

Since each GetKafka processor can be treated as a single client through the Client Name and Group ID properties, each GetKafka processor will pull different data.

A different pulling scenario involves performing a listing operation on the primary node and distributing the results across the cluster via site-to-site to pull the data in parallel. This typically involves "List" and "Fetch" processors, where the List processor produces instructions, or tasks, for the Fetch processor to act on. An example of this scenario is shown in the following diagram with ListHDFS and FetchHDFS:

ListHDFS is scheduled to run on the primary node and performs a directory listing, finding new files since the last time it executed. The results of the listing are then sent out of ListHDFS as FlowFiles, where each FlowFile contains one file name to pull from HDFS. These FlowFiles are then sent to a Remote Process Group connected to an Input Port within the same cluster. This causes each node in the cluster to receive a portion of the files to fetch. Each FetchHDFS processor can then fetch the files from HDFS in parallel.

Site-To-Site

If the source of data is another NiFi instance (cluster or standalone), then Site-To-Site can be used to transfer the data. Site-To-Site supports a push or pull mechanism, and takes care of evenly pushing to, or pulling from, a cluster.

In the push scenario, the destination NiFi has one or more Input Ports waiting to receive data. The source NiFi brings data to a Remote Process Group connected to an Input Port on the destination NiFi.

In the pull scenario, the destination NiFi has a Remote Process Group connected to an Output Port on the source NiFi. The source NiFi brings data to the Output Port, and the data is automatically pulled by the destination NiFi.

NOTE: These site-to-site examples showed a standalone NiFi communicating with a cluster, but it could be cluster to cluster.
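The List/Fetch pattern can be illustrated with a small Python simulation. This is only a toy model: the function names are invented, and the round-robin assignment is my stand-in for whatever balancing site-to-site actually performs, which this post does not specify beyond "evenly".

```python
from collections import defaultdict
from itertools import cycle


def list_new_files(all_files, already_listed):
    """Stand-in for the List step on the primary node: return files not seen before."""
    return [name for name in all_files if name not in already_listed]


def distribute(tasks, nodes):
    """Stand-in for site-to-site: spread tasks over the nodes round-robin (an assumption)."""
    assignment = defaultdict(list)
    for task, node in zip(tasks, cycle(nodes)):
        assignment[node].append(task)
    return dict(assignment)


# Hypothetical file names and node names.
files = ["a.csv", "b.csv", "c.csv", "d.csv", "e.csv"]
new_files = list_new_files(files, already_listed={"a.csv"})
plan = distribute(new_files, ["node1", "node2"])

for node, tasks in plan.items():
    # Each node would run its own Fetch step over its share, in parallel.
    print(node, tasks)
```

The point of the split is that the listing happens once, on one node, while the expensive fetching is shared by every node.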

bbende · ‎02-10-2016

Typically when referring to site-to-site we are referring to a Remote Process Group on one NiFi instance communicating with an Input Port or Output Port on another NiFi instance. This is a TCP-based protocol which is internal to NiFi, and makes a direct connection between the two instances. It can optionally be secured with TLS/SSL.

The side of the connection that is receiving data, or providing data to be pulled, must configure the following properties in nifi.properties:

# Site to Site properties
nifi.remote.input.socket.host=
nifi.remote.input.socket.port=
nifi.remote.input.secure=true
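If you want to sanity-check those three properties before starting NiFi, something like the following Python sketch could help. It is a hypothetical helper, not a NiFi tool, and it simply treats all three properties as required because they are the ones listed above.

```python
SITE_TO_SITE_KEYS = (
    "nifi.remote.input.socket.host",
    "nifi.remote.input.socket.port",
    "nifi.remote.input.secure",
)


def missing_site_to_site_settings(props):
    """Return the site-to-site keys that are absent or blank."""
    return [key for key in SITE_TO_SITE_KEYS if not props.get(key, "").strip()]


def describe_site_to_site(props):
    """Summarize the receiving side's site-to-site settings in one line."""
    missing = missing_site_to_site_settings(props)
    if missing:
        return "incomplete: missing " + ", ".join(missing)
    host = props["nifi.remote.input.socket.host"].strip()
    port = int(props["nifi.remote.input.socket.port"])
    secure = props["nifi.remote.input.secure"].strip().lower() == "true"
    mode = "TLS" if secure else "plaintext"
    return f"site-to-site on {host}:{port} ({mode})"


# Example values are placeholders, not recommendations.
print(describe_site_to_site({
    "nifi.remote.input.socket.host": "nifi.example.com",
    "nifi.remote.input.socket.port": "8899",
    "nifi.remote.input.secure": "true",
}))
print(describe_site_to_site({"nifi.remote.input.secure": "true"}))
```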

bbende · ‎02-01-2016

I think the schema needs to be a valid URI, which would require the file protocol, like this: file:///C:/Avro/schema1.avsc

Additionally, the schema field also allows a schema to be pasted directly into the value of the field if you want to avoid pointing at a file.
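If you are unsure how a Windows path maps to a file URI, this small Python sketch shows one way to derive it. The helper name is made up, and it only handles plain drive-letter paths (no UNC shares).

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def windows_path_to_file_uri(path):
    """Turn a drive-letter Windows path into a file:/// URI."""
    posix = PureWindowsPath(path).as_posix()  # backslashes become forward slashes
    return "file:///" + quote(posix, safe="/:")  # percent-encode spaces and the like


print(windows_path_to_file_uri(r"C:\Avro\schema1.avsc"))
```

For the path in this thread, that produces the same file:///C:/Avro/schema1.avsc value shown above.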

bbende · ‎12-21-2015

This turns out to be specific to using "unsigned int" which is essentially a Long, but we are generating an Avro schema that expects an "int". Some changes in 0.4.0 that fixed other issues with ExecuteSQL appear to have introduced this. I captured the issue with this JIRA: https://issues.apache.org/jira/browse/NIFI-1319
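To make the size mismatch concrete, here is a small Python sketch comparing the value ranges. The constants are the standard 32-bit and 64-bit limits; the function is a hypothetical illustration, not the code from ExecuteSQL or the fix in the JIRA.

```python
AVRO_INT_MAX = 2**31 - 1           # Avro "int" is a 32-bit signed integer
AVRO_LONG_MAX = 2**63 - 1          # Avro "long" is a 64-bit signed integer
MYSQL_INT_MAX = 2**31 - 1          # MySQL signed INT
MYSQL_UNSIGNED_INT_MAX = 2**32 - 1  # MySQL INT UNSIGNED


def smallest_avro_type(max_column_value):
    """Pick the narrowest Avro integer type that can hold the column's maximum."""
    if max_column_value <= AVRO_INT_MAX:
        return "int"
    if max_column_value <= AVRO_LONG_MAX:
        return "long"
    raise ValueError("value does not fit in an Avro long")


print("signed INT   ->", smallest_avro_type(MYSQL_INT_MAX))
print("unsigned INT ->", smallest_avro_type(MYSQL_UNSIGNED_INT_MAX))
```

An unsigned 32-bit column can hold values above 2147483647, which is why a schema declaring "int" does not match the Long values being written.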

bbende · ‎12-21-2015

Ok thanks, with your DDL I can recreate. Looking into it now...

bbende · ‎12-21-2015

Hi @Jobin George,

I was trying to recreate this error... I have NiFi 0.4.0, MySQL 5.6.26, mysql-connector-java-5.1.38-bin.jar.

Created the same table as you and inserted three rows:

CREATE TABLE SALARIES (
  ID int NOT NULL AUTO_INCREMENT,
  ZIPCODE int,
  SALARY double,
  AGE int,
  GENDER varchar(255),
  PRIMARY KEY (ID)
);
INSERT INTO SALARIES (ZIPCODE, SALARY, AGE, GENDER) VALUES (12345, 100, 30, 'MALE');
INSERT INTO SALARIES (ZIPCODE, SALARY, AGE, GENDER) VALUES (12345, 200, 31, 'MALE');
INSERT INTO SALARIES (SALARY, AGE, GENDER) VALUES (10, 20, 'MALE');

In NiFi I have ExecuteSQL with "select * from salaries;" -> ConvertAvroToJson -> LogAttribute and I see this in the logs:

[{"ID": 1, "ZIPCODE": 12345, "SALARY": 100.0, "AGE": 30, "GENDER": "MALE"},{"ID": 2, "ZIPCODE": 12345, "SALARY": 200.0, "AGE": 31, "GENDER": "MALE"},{"ID": 3, "ZIPCODE": null, "SALARY": 10.0, "AGE": 20, "GENDER": "MALE"}]

If I change the query to "select id from salaries;" I see:

[{"ID": 1},{"ID": 2},{"ID": 3}]

Is there anything that jumps out at you as being different between your setup and mine? different versions of mysql? something specific in your data?

Last Visited	‎09-10-2020 01:23 PM

Member Since	‎09-29-2015 04:02 PM
Posts	871
Kudos received	709

Cloudera Community

Re: Using nifi registry in a nifi cluster.

Re: Is there a way to enable a stateful status upd...

Re: Automated Start/Stop of a NiFi Processor

Re: PublishKafkaRecord_0_10 1.2.0.3.0.1.1-5 Error:...

Re: how to configure mergecontent processor

Re: Hortonworks HDF( Nifi ) : Site to Site ( https...

Re: Issue indexing html files using nifi and PutSo...

Re: Hortonworks HDF( Nifi ) : Site to Site ( https...

Re: How to add Content-Type=application/json to In...

How Do I Distribute Data Across an Apache NiFi Clu...

Re: NiFi - Site to Site Protocols?

Re: How to load schema file for NiFi ConvertCSVtoA...

Re: ExecuteSQL - UnresolvedUnionException

Re: ExecuteSQL - UnresolvedUnionException

Re: ExecuteSQL - UnresolvedUnionException