Fetch Provenance data using SiteToSiteProvenanceReportingTask
Labels: Apache NiFi
Created 05-28-2024 03:15 AM
I was referring to "Extracting NiFi Provenance Data using SiteToSitePr... - Cloudera Community - 248469" to set it up. Docker has the conf directory volume-mounted so that I can change properties.
Every time I set nifi.remote.input.secure to false and rerun the Docker container, the property file is reset to true. Other properties I have changed do not reset back to their defaults. I have tried setting environment variables with -e SITE_TO_SITE_SECURE=false and -e NIFI_REMOTE_INPUT_SECURE=false; unfortunately, neither has taken effect. Docker command:
docker run -d --name nifi24 -p 8443:8443 -e SITE_TO_SITE_SECURE=false -v ~/tools/nifi24_conf/conf:/opt/nifi/nifi-current/conf -v ~/tools/nifi24_conf/lib:/opt/nifi/nifi-current/lib -v ~/tools/nifi24_conf/nar_extensions:/opt/nifi/nifi-current/extensions apache/nifi:1.24.0
# Site to Site properties
nifi.remote.input.host=c30abd07b4ba
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10000
nifi.remote.input.http.enabled=false
nifi.remote.input.http.transaction.ttl=30 sec
nifi.remote.contents.cache.expiration=30 secs
nifi.web.http.host=
nifi.web.http.port=
nifi.web.http.network.interface.default=
#############################################
nifi.web.https.host=c30abd07b4ba
nifi.web.https.port=8443
nifi.web.https.network.interface.default=
nifi.web.https.application.protocols=http/1.1
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=
nifi.web.proxy.host=
nifi.web.max.content.size=
nifi.web.max.requests.per.second=30000
nifi.web.max.access.token.requests.per.second=25
nifi.web.request.timeout=60 secs
nifi.web.request.ip.whitelist=
nifi.web.should.send.server.version=true
nifi.web.request.log.format=%{client}a - %u %t "%r" %s %O "%{Referer}i" "%{User-Agent}i"
Unfortunately, I am not able to get it working, as I understand that it is not possible to configure Site-to-Site with security disabled while also running NiFi with HTTPS; those settings go together.
Please advise on how to get this working. Many thanks.
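The constraint described above (S2S security tied to which web ports are configured) can be checked against the mounted conf directory before restarting the container. This is a minimal Python sketch, not a NiFi tool: `parse_properties` and `s2s_warnings` are hypothetical helpers, and the rule they encode is this thread's understanding rather than documented NiFi behavior.

```python
# Hypothetical helper (not part of NiFi): flag S2S settings in nifi.properties
# that conflict with the web port configuration, following this thread's
# reasoning that a NiFi with only nifi.web.https.port set serves S2S over HTTPS.
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and '#' comments.

    Simplified reader: no line continuations or escape handling,
    unlike java.util.Properties.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        props[key.strip()] = value.strip()
    return props


def s2s_warnings(props: Dict[str, str]) -> List[str]:
    """Return warnings about the S2S / web-port combination."""
    warnings: List[str] = []
    secure = props.get("nifi.remote.input.secure", "").lower() == "true"
    http_port = props.get("nifi.web.http.port", "")
    https_port = props.get("nifi.web.https.port", "")
    if not secure and https_port and not http_port:
        warnings.append(
            "nifi.remote.input.secure=false, but only nifi.web.https.port is set"
        )
    http_transport = props.get("nifi.remote.input.http.enabled", "").lower() == "true"
    if not props.get("nifi.remote.input.socket.port") and not http_transport:
        warnings.append("neither RAW (socket port) nor HTTP transport is enabled for S2S")
    return warnings


# Usage, assuming the mounted conf directory from the docker command above:
#   with open("conf/nifi.properties") as f:
#       print(s2s_warnings(parse_properties(f.read())))
```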
Created 06-12-2024 08:32 PM
Hi @MattWho,
I have figured it out.
I set the access policy "receive data via site-to-site" and it has now started to work.
I used an API call to set the value, referring to this:
Access Policies | CDP Private Cloud (cloudera.com)
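For reference, the request body for such an API call could look roughly like the sketch below. It is a hedged illustration, not the exact call used here: the resource path `/data-transfer/input-ports/<port-id>` with action `write`, and the use of NiFi's internal user ids, are assumptions to verify against your NiFi version's REST API documentation; `receive_via_s2s_policy` and the ids are hypothetical.

```python
# Sketch: JSON body for creating a "receive data via site-to-site" policy on an
# input port with the NiFi REST API (POST /nifi-api/policies).
# ASSUMPTIONS to verify against your NiFi version's REST API documentation:
#   - the policy resource is /data-transfer/input-ports/<port-id>, action "write";
#   - user ids are NiFi's internal tenant ids, not certificate DN strings.
# The PORT_ID / USER_ID values below are placeholders.
import json
from typing import Any, Dict, List


def receive_via_s2s_policy(port_id: str, user_ids: List[str]) -> Dict[str, Any]:
    return {
        "revision": {"version": 0},  # a newly created component starts at version 0
        "component": {
            "action": "write",
            "resource": f"/data-transfer/input-ports/{port_id}",
            "users": [{"id": uid} for uid in user_ids],
            "userGroups": [],
        },
    }


if __name__ == "__main__":
    body = receive_via_s2s_policy("PORT_ID", ["USER_ID"])
    print(json.dumps(body, indent=2))
```

The user to include would be whichever identity the reporting task's SSL context service certificate presents. If a policy for that resource already exists, it would likely need to be updated (PUT /nifi-api/policies/{id}) rather than created.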
Thank you so much for your help.
To summarize:
nifi.properties
bash-4.4$ cat conf/nifi.properties | grep remote
nifi.remote.input.host=nifi-0.nifi-headless.namespace.svc.cluster.local
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
nifi.remote.contents.cache.expiration=30 secs
In the other pod:
nifi.remote.input.host=nifi-1.nifi-headless.namespace.svc.cluster.local
nifi.web.https.host=nifi-0.nifi-headless.namespace.svc.cluster.local
nifi.web.https.port=9443
And, respectively, on the other pod:
nifi.web.https.host=nifi-1.nifi-headless.namespace.svc.cluster.local
nifi.web.https.port=9443
- Set the access policies.
- Created the reporting task.
- The URL is set to the pod's headless-service FQDN and the HTTPS port, e.g.:
https://nifi-0.nifi-headless.doc-norc.svc.cluster.local:9443/nifi
- Configured the management controller service.
- Created an input port and a remote process group to send data.
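The URL pattern in the summary can be generated for every pod. A minimal sketch, assuming pods named `<statefulset>-<ordinal>`, the standard `<pod>.<service>.<namespace>.svc.cluster.local` DNS form, and that the Destination URL accepts a comma-separated list of node URLs; `node_url` and `destination_urls` are hypothetical helpers.

```python
# Hypothetical helper: build a SiteToSite reporting task Destination URL for NiFi
# pods in a Kubernetes StatefulSet behind a headless service.
# ASSUMPTIONS: pods are named <statefulset>-<ordinal>, pod DNS follows
# <pod>.<service>.<namespace>.svc.cluster.local, and the Destination URL accepts
# a comma-separated list of node URLs. All names are placeholders.


def node_url(pod: str, service: str, namespace: str, https_port: int) -> str:
    host = f"{pod}.{service}.{namespace}.svc.cluster.local"
    # Point at the NiFi web (HTTPS) port, not the S2S socket port, and end with /nifi.
    return f"https://{host}:{https_port}/nifi"


def destination_urls(statefulset: str, replicas: int, service: str,
                     namespace: str, https_port: int) -> str:
    return ",".join(
        node_url(f"{statefulset}-{i}", service, namespace, https_port)
        for i in range(replicas)
    )


print(destination_urls("nifi", 2, "nifi-headless", "doc-norc", 9443))
# https://nifi-0.nifi-headless.doc-norc.svc.cluster.local:9443/nifi,https://nifi-1.nifi-headless.doc-norc.svc.cluster.local:9443/nifi
```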
Created 05-28-2024 06:42 AM
@scoutjohn
The Site-To-Site (S2S) configuration properties configure how your NiFi instance handles both inbound and outbound S2S connections. It is the receiving instance of NiFi that determines whether S2S communication should be secure or not:
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10000
nifi.remote.input.http.enabled=false
First, you need to understand how S2S works.
The instance of NiFi with a RemoteProcessGroup (RPG) or an S2S reporting task is the client side of the connection. When that client component (RPG or S2S reporting task) executes, it needs to communicate with the target NiFi. That initial communication is always going to be over HTTP(S) to the target NiFi. So if the target NiFi is secured (nifi.web.https.port configured) and the URL provided to the RPG or S2S reporting task is "HTTPS", the initial connection is going to be secure. This initial connection is used to fetch S2S details from the target NiFi. Those S2S details include numerous bits of information, such as:
- Does the target support FlowFile HTTP(S) input transfer? (nifi.remote.input.http.enabled)
- Does the target NiFi support socket-based FlowFile transfer? (nifi.remote.input.socket.port)
- Does the target enforce secure communications? (nifi.remote.input.secure)
- The list of remote input and remote output ports the client is authorized to see.
- How many nodes are in the target NiFi cluster.
- The load on each of those nodes.
- etc.
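The details listed above come back as JSON from the initial HTTP(S) call. Here is a minimal sketch of reading them, assuming the response of GET /nifi-api/site-to-site follows NiFi 1.x's ControllerEntity/ControllerDTO shape; the field names are my understanding and should be compared with what your instance actually returns. `S2SDetails` and `parse_site_to_site` are hypothetical.

```python
# Sketch: summarize the S2S details a client fetches from GET /nifi-api/site-to-site.
# ASSUMPTION: the JSON shape follows NiFi 1.x's ControllerEntity/ControllerDTO
# (field names below); compare against what your own instance returns.
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class S2SDetails:
    raw_port: Optional[int]    # socket port for RAW transport, if configured
    http_port: Optional[int]   # web port for HTTP transport, if HTTP S2S is enabled
    secure: bool               # whether the target enforces secure S2S
    input_ports: List[str] = field(default_factory=list)   # only ports this client may see
    output_ports: List[str] = field(default_factory=list)

    def transports(self) -> List[str]:
        """Transport protocols a client could choose for this target."""
        return [name for name, port in (("RAW", self.raw_port), ("HTTP", self.http_port))
                if port is not None]


def parse_site_to_site(body: str) -> S2SDetails:
    controller = json.loads(body).get("controller", {})
    return S2SDetails(
        raw_port=controller.get("remoteSiteListeningPort"),
        http_port=controller.get("remoteSiteHttpListeningPort"),
        secure=bool(controller.get("siteToSiteSecure", False)),
        input_ports=[p.get("name", "") for p in controller.get("inputPorts", [])],
        output_ports=[p.get("name", "") for p in controller.get("outputPorts", [])],
    )
```

If a port name such as "prov" is missing from `input_ports`, the client either is not authorized to see that port or it does not exist, which would be consistent with a "Could not find Port with name" error.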
With the setup you shared, your NiFi is configured with only nifi.web.https.port, meaning that this NiFi can only support HTTPS communications for S2S connections.
I'm not sure why you would want to send your data unsecured over your network. Why not send it securely, since your NiFi is already secured over HTTPS?
Now, if you were to also configure nifi.web.http.port (which makes no sense, since you would be exposing your NiFi UI unsecured over HTTP as well as secured over HTTPS), would it still force nifi.remote.input.secure back to true from false? I have not configured HTTP and HTTPS at the same time for a very, very long time (and only rarely, when there were different internal and external networks). I could not find any Apache Jiras stating that this is no longer an option, but it is possible that this has changed. But even if it is possible, I still question using unsecured S2S when your NiFi is already secured.
Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt
Created 05-28-2024 09:14 PM
Hi @MattWho, thank you for your response.
I am not trying to send data unsecured over the network. The NiFi running locally is on HTTPS and I want it to stay that way, but I would also like to fetch the provenance data.
I was trying to configure my NiFi, running in standalone mode, based on what is described in the document.
I have changed nifi.remote.input.http.enabled to true.
I also tried adding a StandardRestrictedSSLContextService, using the same truststore and keystore values that are in nifi.properties.
I can see logs like this:
// Another save pending = false
2024-05-29 04:05:25,991 INFO [Timer-Driven Process Thread-7] o.a.n.c.s.TimerDrivenSchedulingAgent SiteToSiteProvenanceReportingTask[id=a971da9d-018f-1000-2b00-6824f28134d8] started.
2024-05-29 04:05:26,214 INFO [Flow Service Tasks Thread-1] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@c18025a // Another save pending = false
2024-05-29 04:05:36,347 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2024-05-29 04:05:36,348 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 23 records in 0 milliseconds
2024-05-29 04:05:37,621 INFO [Timer-Driven Process Thread-6] o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer for Provenance Event Store Partition[directory=./provenance_repository] due to MAX_TIME_REACHED. Event File was 17.76 KB and contained 10 events.
2024-05-29 04:05:56,348 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2024-05-29 04:05:56,352 INFO [pool-7-thread-1] o.a.n.wali.SequentialAccessWriteAheadLog Checkpointed Write-Ahead Log with 24 Records and 0 Swap Files in 3 milliseconds (Stop-the-world time = 1 milliseconds), max Transaction ID 48
But unfortunately, I do not see any data flowing into the input port
Created 05-29-2024 05:42 AM
@scoutjohn
The article you are using for reference was written back in 2016, before NiFi was changed to start secured out of the box. It is written entirely around an unsecured NiFi example. You could always unsecure your NiFi and test out S2S capability; that would at least allow you to test and evaluate the functionality.
When NiFi is secured, both authentication and authorization must be handled, including authentication and authorization for S2S operations. An out-of-box installation of NiFi uses self-generated, self-signed certificates to create the keystore and truststore files needed for mutual TLS. It also uses a very basic, non-production single-user-provider for user authentication and a single-user-authorizer for user/client authorization. These basic providers make it easy to evaluate NiFi, but are not robust enough to support all features. Is this what you are still using, or have you created your own keystore and truststore files and set up non-single-user authentication and authorization providers?
To be honest, I always set up production-ready NiFi instances and clusters that don't use the auto-generated self-signed certificates or single-user providers. I can't say that I have tried using S2S in such an out-of-box environment, so I can't say whether the single-user-authorizer supports the needed authorizations.
That being said, I see you set nifi.remote.input.http.enabled=true, but all that property does is allow the HTTP transport protocol, meaning that NiFi would support transferring FlowFiles over the HTTP protocol. That does not mean unsecured: it could be HTTP or HTTPS depending on the destination URL. To support secure S2S, the S2S properties in nifi.properties need nifi.remote.input.secure=true (you did not say whether you made that change).
1. Is your S2SProvenanceReportingTask producing any bulletin messages?
2. Are you seeing any not authorized related log lines in the nifi-user.log?
3. What keystore and truststore did you configure in the StandardRestrictedSSLContextService controller service?
When I have some time, I'll try experimenting with an out-of-box setup, if that is what you are using, to see whether what you are trying to do is possible in such a non-production-ready setup.
Thank you,
Matt
Created on 05-29-2024 10:03 PM - edited 05-29-2024 10:08 PM
Hello @MattWho, thank you for looking into it.
- I have set nifi.remote.input.secure=true:
# Site to Site properties 8c92690b14e6
nifi.remote.input.host=cd8e8c899db6
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10000
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
nifi.remote.contents.cache.expiration=30 secs
- S2SProvenanceReportingTask is not producing any bulletin messages.
- There are no errors in the user log. I can see the StandardRestrictedSSLContextService and SiteToSiteProvenanceReportingTask started in the logs:
2024-05-30 04:48:43,666 INFO [Timer-Driven Process Thread-9] o.a.n.c.s.StandardControllerServiceNode Successfully enabled StandardControllerServiceNode[service=SSLContextService[id=c27f79ba-018f-1000-ada5-343b2ba8f4e2], name=StandardRestrictedSSLContextService, active=true]
2024-05-30 04:49:07,157 INFO [Timer-Driven Process Thread-1] o.a.n.c.s.TimerDrivenSchedulingAgent SiteToSiteProvenanceReportingTask[id=a971da9d-018f-1000-2b00-6824f28134d8] started.
- I am using the out-of-the-box installation of NiFi and have not created any other certificates. I configured the StandardRestrictedSSLContextService with the same values that are used in the nifi.security properties:
nifi.security.autoreload.enabled=false
nifi.security.autoreload.interval=10 secs
nifi.security.keystore=./conf/keystore.p12
nifi.security.keystoreType=PKCS12
nifi.security.keystorePasswd=b465f3c4cb37f83f825a2166a656719f
nifi.security.keyPasswd=b465f3c4cb37f83f825a2166a656719f
nifi.security.truststore=./conf/truststore.p12
nifi.security.truststoreType=PKCS12
nifi.security.truststorePasswd=e20ef7bb480f25c7e2446bbaffc1d95b
Created 05-30-2024 01:27 PM
@scoutjohn
I installed an out-of-the-box Apache NiFi 1.26 using single user providers and the NiFi self-signed generated certificates.
I was able to send provenance events via the S2SProvenanceReportingTask successfully back to a Remote Input Port on the same NiFi with no issues. So authorization is not an issue here. I tested using both HTTP and RAW transport protocols successfully.
I also validated that S2S was working by setting up a Remote Process Group to send FlowFiles to a Remote Input Port as well. Here is the dataflow I set up:
You can see in the above that I generated some FlowFiles that were sent over S2S to the "Input1" remote port. You can also see that my "prov" port received provenance events from the S2SProvenanceReportingTask.
My S2S settings from the nifi.properties file:
# Site to Site properties
nifi.remote.input.host=localhost
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10001
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
nifi.remote.contents.cache.expiration=30 secs
My Remote Process Group configuration:
Switching to "HTTP" transport protocol also worked.
S2SProvenanceReportingTask configuration:
While all of this worked correctly, sending provenance events via the S2SProvenanceReportingTask back to the same NiFi is not advisable, because it creates an endless loop of provenance events. For every FlowFile received on the "prov" port, another provenance "RECEIVE" event is created, which then gets sent by the reporting task; thus an infinite loop is created. You would certainly have difficulties related to authentication and authorization when sending between two out-of-the-box NiFi deployments that use the out-of-the-box keystore, truststore, and single-user providers. But for testing purposes, this works.
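The loop can be illustrated with a toy model (not NiFi code). It assumes that each reporting-task run packs the pending events into FlowFiles of up to `batch_size` events, and that each FlowFile received on "prov" creates exactly one new RECEIVE event for the next run; `reported_per_run` is hypothetical.

```python
# Toy model (not NiFi code) of the provenance feedback loop.
# ASSUMPTIONS for illustration: each reporting-task run packs the pending events
# into FlowFiles of up to `batch_size` events, and each FlowFile received on the
# "prov" port creates one new RECEIVE event that the next run reports.
import math
from typing import List


def reported_per_run(initial_events: int, runs: int, batch_size: int = 1000) -> List[int]:
    history: List[int] = []
    pending = initial_events
    for _ in range(runs):
        history.append(pending)
        flowfiles_sent = math.ceil(pending / batch_size)
        # Every FlowFile that arrives on "prov" yields a RECEIVE event for the next run.
        pending = flowfiles_sent
    return history


print(reported_per_run(10, 5))  # [10, 1, 1, 1, 1]
```

Once anything has been sent, the count never returns to zero, so the task always has something to report; any events the rest of the flow creates downstream of "prov" only add to this.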
Now, I see from your configuration that you set:
nifi.remote.input.host=cd8e8c899db6
That makes me wonder whether that hostname is:
- A SAN entry in the NiFi-generated keystore certificate. You could use the keytool command to check:
keytool -v -list -keystore keystore.p12
- That hostname is resolvable and reachable by your NiFi instance.
Try changing that property to "localhost" and see if it resolves your issue.
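The two checks above can be scripted. A minimal sketch, assuming keytool prints SAN entries as lines of the form "DNSName: localhost" (adjust the pattern if your JDK formats them differently); wildcard SANs are not handled, and `dns_sans` and `check_host` are hypothetical helpers.

```python
# Hypothetical helper: check a hostname against the DNS SAN entries printed by
#   keytool -v -list -keystore keystore.p12
# and whether it resolves from this machine.
# ASSUMPTION: keytool prints SANs as lines like "DNSName: localhost"; adjust the
# pattern if your JDK formats them differently. Wildcard SANs are not handled.
import re
import socket
from typing import Dict, Set

SAN_DNS = re.compile(r"^[ \t]*DNSName:[ \t]*(\S+)", re.MULTILINE)


def dns_sans(keytool_output: str) -> Set[str]:
    # DNS names are case-insensitive, so compare in lower case.
    return {name.lower() for name in SAN_DNS.findall(keytool_output)}


def check_host(hostname: str, keytool_output: str) -> Dict[str, bool]:
    try:
        socket.getaddrinfo(hostname, None)
        resolvable = True
    except OSError:  # socket.gaierror is a subclass of OSError
        resolvable = False
    return {
        "in_san": hostname.lower() in dns_sans(keytool_output),
        "resolvable": resolvable,
    }
```

For example, save the keytool output to a text file, read it, and call `check_host("cd8e8c899db6", text)`; both values should be True for the hostname set in nifi.remote.input.host.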
Thank you,
Matt
Created 05-31-2024 04:51 AM
I was able to get this running on Windows, using the same configuration as you.
My Docker runs on WSL, and NiFi was not coming up when I changed the hostname to localhost.
Thank you so much for your time
Created on 06-06-2024 03:29 AM - edited 06-06-2024 05:33 AM
Hello @MattWho,
Sorry to come back on this topic.
I am trying to implement the same S2S reporting task in a Kubernetes environment.
We have NiFi running in cluster mode with 2 pods (usually 3, with embedded ZooKeeper; we're just working with 2 nodes for the time being).
The configuration for S2S is set as follows:
bash-4.4$ cat conf/nifi.properties | grep remote
nifi.remote.input.host=nifi-0.nifi-headless.namespace.svc.cluster.local
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
nifi.remote.contents.cache.expiration=30 secs
I have tried both the RAW and HTTP protocols.
We have also given the user the "retrieve site-to-site details" permission.
We have set the Destination URL as https://<fullyqualifiedDNS>:portnumber, e.g.:
https://nifi-0.nifi-headless.namespace.svc.cluster.local:9443
The host is:
nifi.remote.input.host=nifi-0.nifi-headless.doc-norc.svc.cluster.local
The port number we're using is:
nifi.web.https.port=9443
But the events are not coming into the port.
The logs say the authentication is successful:
{"type":"log", "facility":"25", "host":"ao0059-cjts5-worker-0-lqm8m", "level":"INFO", "event-type":"N_USER_OPER", "systemid":"nifi","neid":"706546b360714e94b74591ca351b0655", "system":"nifi-0", "time":"2024-06-06T12:06:13.189Z" ,"timezone":"UTC", "log":"[NiFi Web Server-1830] o.a.n.w.s.NiFiAuthenticationFilter Authentication Started 10.255.15.73 [CN=nifi-api-admin] GET https://nifi-0.nifi-headless.namespace.svc.cluster.local:9443/nifi-api/site-to-site"}
{"type":"log", "facility":"25", "host":"ao0059-cjts5-worker-0-lqm8m", "level":"INFO", "event-type":"N_USER_OPER", "systemid":"nifi","neid":"706546b360714e94b74591ca351b0655", "system":"nifi-0", "time":"2024-06-06T09:51:41.644Z" ,"timezone":"UTC", "log":"[NiFi Web Server-854] o.a.n.w.s.NiFiAuthenticationFilter Authentication Success [CN=nifi-api-admin] 10.255.15.73 GET https://nifi-0.nifi-headless.namespace.svc.cluster.local:9443/nifi-api/site-to-site"}
The log also says:
No events to send due to 'events' being null or empty.
{"type":"log", "host":"ao0059-cjts5-worker-0-lqm8m", "level":"DEBUG", "event-type":"N_USER_OPER", "systemid":"nifi","neid":"706546b360714e94b74591ca351b0655", "system":"nifi-0", "time":"2024-06-06T10:05:39.499Z" ,"timezone":"UTC", "log":"[Timer-Driven Process Thread-8] o.a.n.r.SiteToSiteProvenanceReportingTask SiteToSiteProvenanceReportingTask[id=ecacc388-018f-1000-ffff-ffff8c5138a7] Returning LOCAL State: StandardStateMap[version=-1, values={}]"}
{"type":"log", "host":"ao0059-cjts5-worker-0-lqm8m", "level":"DEBUG", "event-type":"N_USER_OPER", "systemid":"nifi","neid":"706546b360714e94b74591ca351b0655", "system":"nifi-0", "time":"2024-06-06T10:05:39.499Z" ,"timezone":"UTC", "log":"[Timer-Driven Process Thread-8] o.a.n.r.SiteToSiteProvenanceReportingTask SiteToSiteProvenanceReportingTask[id=ecacc388-018f-1000-ffff-ffff8c5138a7] No events to send due to 'events' being null or empty."}
{"type":"log", "host":"ao0059-cjts5-worker-0-lqm8m", "level":"DEBUG", "event-type":"N_USER_OPER", "systemid":"nifi","neid":"706546b360714e94b74591ca351b0655", "system":"nifi-0", "time":"2024-06-06T10:05:44.501Z" ,"timezone":"UTC", "log":"[Timer-Driven Process Thread-4] o.a.n.r.SiteToSiteProvenanceReportingTask SiteToSiteProvenanceReportingTask[id=ecacc388-018f-1000-ffff-ffff8c5138a7] Returning LOCAL State: StandardStateMap[version=-1, values={}]"}
{"type":"log", "host":"ao0059-cjts5-worker-0-lqm8m", "level":"DEBUG", "event-type":"N_USER_OPER", "systemid":"nifi","neid":"706546b360714e94b74591ca351b0655", "system":"nifi-0", "time":"2024-06-06T10:05:44.502Z" ,"timezone":"UTC", "log":"[Timer-Driven Process Thread-4] o.a.n.r.SiteToSiteProvenanceReportingTask SiteToSiteProvenanceReportingTask[id=ecacc388-018f-1000-ffff-ffff8c5138a7] No events to send due to 'events' being null or empty."}
There are entries in Data Provenance.
There was also an issue where the task complained that the input port was not available:
{"type":"log", "host":"ao0059-cjts5-worker-0-z5b82", "level":"ERROR", "event-type":"N_USER_OPER", "systemid":"nifi","neid":"174577b6145e4b87a86e5d9c397c8f75", "system":"nifi-0", "time":"2024-06-04T11:20:41.566Z" ,"timezone":"UTC", "log":"[Timer-Driven Process Thread-3] o.a.n.r.SiteToSiteProvenanceReportingTask SiteToSiteProvenanceReportingTask[id=e2e547c5-018f-1000-0000-00004876faee] Error running task SiteToSiteProvenanceReportingTask[id=e2e547c5-018f-1000-0000-00004876faee] due to org.apache.nifi.processor.exception.ProcessException: Failed to send Provenance Events to destination due to IOException:Could not find Port with name 'prov' for remote NiFi instance"}
But now this error no longer appears, though we have not made any changes to it.
There are no other errors in the logs. We have enabled DEBUG logging for org.apache.nifi.reporting.
I tried the NiFi API endpoints:
https://nifi-0.nifi-headless.namespace.svc.cluster.local:9443/nifi-api/site-to-site/peers
and
https://nifi-0.nifi-headless.namespace.svc.cluster.local:9443/nifi-api/site-to-site
Can you please help us with this? Thank you for your time
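The same endpoints can be called from inside a pod with the node's certificate, which shows what the reporting task's client would see. A sketch using only the Python standard library, under these assumptions: the PKCS12 keystore/truststore were first exported to PEM files (Python's `ssl` module does not load .p12 directly), the file names are placeholders, and the `x-nifi-site-to-site-protocol-version` header and PeerDTO field names reflect my understanding of NiFi 1.x. `s2s_get` and `summarize_peers` are hypothetical.

```python
# Sketch: call the S2S REST endpoints over mutual TLS with only the standard library.
# ASSUMPTIONS: the PKCS12 keystore/truststore were first exported to PEM files
# (Python's ssl module does not load .p12 directly); file names are placeholders.
# The protocol-version header and the PeerDTO field names reflect my understanding
# of NiFi 1.x; verify against your version.
import json
import ssl
import urllib.request
from typing import Any, Dict, List


def s2s_get(url: str, certfile: str, keyfile: str, cafile: str) -> Dict[str, Any]:
    ctx = ssl.create_default_context(cafile=cafile)          # trust NiFi's CA/cert
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)  # present a client cert
    req = urllib.request.Request(url, headers={
        "Accept": "application/json",
        "x-nifi-site-to-site-protocol-version": "1",  # assumed needed by /peers
    })
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return json.load(resp)


def summarize_peers(body: Dict[str, Any]) -> List[str]:
    return [
        f"{p.get('hostname')}:{p.get('port')} secure={p.get('secure')} "
        f"flowFiles={p.get('flowFileCount')}"
        for p in body.get("peers", [])
    ]
```

Usage, with placeholder file names: `summarize_peers(s2s_get(base + "/site-to-site/peers", "client.pem", "client-key.pem", "ca.pem"))`, where `base` is `https://nifi-0.nifi-headless.namespace.svc.cluster.local:9443/nifi-api`. Because `ssl` verifies the hostname, a TLS error here may point at a SAN mismatch, and a 401/403 response at a missing policy.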
Created 06-07-2024 07:39 AM
@scoutjohn
I don't have a Kubernetes env to mess around with currently.
But a couple of things I see from your response:
- Your URLs appear to be missing /nifi at the end.
- What value is set for "nifi.web.http.host" in the nifi.properties on each instance of your K8s cluster?
- Is "nifi-0.nifi-headless.namespace.svc.cluster.local" being used in S2SProvenanceReporting task resolvable on the NiFi host to a valid IP address that is reachable between nodes?
- Are the ports available and unused on both hosts?
- Does the configuration match on both hosts in nifi.properties (with the exception of host-specific properties)?
- Do the PrivateKey certificates used by the hosts contain the proper EKUs and needed SAN entries?
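Several of the questions above can be checked mechanically from inside each pod. This is a hypothetical pre-flight sketch, not a NiFi tool: it checks the /nifi suffix, DNS resolution, and TCP reachability of the port, but not certificates, EKUs, SANs, or access policies. `url_issues` and `reachability` are illustrative names.

```python
# Hypothetical pre-flight check for a SiteToSite Destination URL, covering some of
# the questions above: /nifi suffix, DNS resolution, and TCP reachability of the port.
# It does not check certificates, EKUs, SANs, or NiFi access policies.
import socket
from typing import Dict, List
from urllib.parse import urlsplit


def url_issues(url: str) -> List[str]:
    parts = urlsplit(url)
    issues: List[str] = []
    if parts.scheme != "https":
        issues.append(f"scheme is {parts.scheme!r}; a secured NiFi expects 'https'")
    if parts.path.rstrip("/") != "/nifi":
        issues.append(f"path is {parts.path!r}; expected '/nifi'")
    if parts.port is None:
        issues.append("no explicit port; 443 would be used instead of nifi.web.https.port")
    return issues


def reachability(url: str, timeout: float = 3.0) -> Dict[str, object]:
    parts = urlsplit(url)
    port = parts.port or 443  # what an HTTPS client would fall back to
    result: Dict[str, object] = {"addresses": [], "port_open": False}
    try:
        infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except OSError:
        return result  # hostname did not resolve
    result["addresses"] = sorted({info[4][0] for info in infos})
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            result["port_open"] = True
    except OSError:
        pass
    return result
```

Running `reachability` for each node's URL from inside each pod also shows which IP address each name resolves to.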
Thank you,
Matt
Created on 06-07-2024 10:27 PM - edited 06-07-2024 11:15 PM
Hi @MattWho, thank you for your response.
- I have also tried with /nifi in the URLs, but unfortunately the data is still not coming in.
- We're using HTTPS, so there is no value set for http; instead we have:
nifi.web.https.host=nifi-0.nifi-headless.namespace.svc.cluster.local
nifi.web.https.port=9443
And, respectively, on the other pod:
nifi.web.https.host=nifi-1.nifi-headless.namespace.svc.cluster.local
nifi.web.https.port=9443
We also have a proxy host configured:
nifi.web.proxy.context.path=/apigw/namespace/nifi
nifi.web.proxy.host=ckng.apps.ao0059.tre.nsn-rdnet.net:443, nifi-headless.namespace.svc.cluster.local:9443
which is the same for both pods.
- Yes, the hostname is reachable:
bash-4.4$ ping nifi-1.nifi-headless.namespace.svc.cluster.local
PING nifi-1.nifi-headless.namespace.svc.cluster.local (10.255.8.118) 56(84) bytes of data.
64 bytes from nifi-1.nifi-headless.namespace.svc.cluster.local (10.255.8.118): icmp_seq=1 ttl=64 time=0.019 ms
64 bytes from nifi-1.nifi-headless.namespace.svc.cluster.local (10.255.8.118): icmp_seq=2 ttl=64 time=0.027 ms
And for nifi-0:
bash-4.4$ ping nifi-0.nifi-headless.namespace.svc.cluster.local
PING nifi-0.nifi-headless.namespace.svc.cluster.local (10.255.8.118) 56(84) bytes of data.
64 bytes from nifi-0.nifi-headless.namespace.svc.cluster.local (10.255.8.118): icmp_seq=1 ttl=64 time=0.019 ms
64 bytes from nifi-0.nifi-headless.namespace.svc.cluster.local (10.255.8.118): icmp_seq=2 ttl=64 time=0.027 ms
- The configurations match for both pods.
- Yes, the private keys used have the proper EKUs and SAN entries.