Member since
11-14-2019
51
Posts
6
Kudos Received
3
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
696 | 05-26-2023 08:40 AM | |
676 | 05-22-2023 02:38 AM | |
682 | 05-15-2023 11:25 AM |
01-26-2024
02:09 AM
1 Kudo
Hi @SandyClouds , I ran into this issue before and after some research I found that when you do the ConvertJsonToSQL nifi assigns timestamp data type (value = 93 in the sql.args.[n].type attribute ). When the PutSQL runs the generated sql statement it will parse the value according to the assigned type and format it accordingly. However for timestamp it expects it to be in the format of "yyyy-MM-dd HH:mm:ss.SSS" so if you are missing the milliseconds in the original datetime value it will fail with the specified error message. To resolve the issue make sure to assign 000 milliseconds to your datetime value before running the PUTSQL processor. You can do that in the source Json itself before the conversion to SQL or after conversion to SQL using UpdateAttribute, by using the later option you have to know which sql.args.[n].value will have the datetime and do expression language to reformat. If that helps please accept solution. Thanks
... View more
09-19-2023
07:00 AM
1 Kudo
@SandyClouds The "Jolt Specification" property in the JoltTransformJson processor supports NiFi Expression Language (will be evaluated using flow file attributes and variable registry). This means that any attribute present on the FlowFile passed to the JoltTransform processor can be used in the spec. So if you have an upstream processor adding an attribute to your FlowFile with the attribute name "customer_id", you can simply use NiFi Expression Language in your spec to pull in that attributes value. NiFi Parameters are defined using "#{<parameter name>}" which will look up that parameter in the parameter context defined on the NiFi Process Group and return its value. NiFi Expression language is defined by starting with "${" and ending with "}". Between that you can bring in an attributes value, apply NiFi Expression Language functions against an attributes value, etc. Simplest NiFi Expression Language statement would be "${customer_id}", which will look for an attribute in the FlowFile being processed and return its value. We refer to "customer_id" as the "subject" in that NiFi EL. NiFi EL does have a bunch of subjectless functions like "now()" that you are using, but more typical use cases start with a subject (FlowFile attribute). I strongly encourage you to read through the linked NiFi EL guide to help you better understand the complexity and many ways you can used NiFi EL statements. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
... View more
07-17-2023
08:03 AM
Below is the error in bulletin board: Unable to write flowfile content to content repository container default due to archive file size constraints; waiting for archive cleanup. Total number of files currently archived = 1 My flows are simple and this is just test environment, so not much flow files as well may be some 10-20 only. My content_repository is mounted on a disk. it is just 8MB occupied. Disk has some 100 GB storage. below are properties of content repository: # Content Repository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=50 KB
nifi.content.repository.directory.default=./content_repository
nifi.content.repository.archive.max.retention.period=7 days
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false
nifi.content.viewer.url=../nifi-content-viewer/ I am confused why there is an issue with file size constraints ? as my disc has enough storage capacity. and my flows are simple and not more than 10-20.. so want to understand what is eating space or any worng configuration. Please help me understand. below is complete nifi.properties file. # Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Core Properties #
nifi.flow.configuration.file=./conf/flow.xml.gz
nifi.flow.configuration.json.file=./conf/flow.json.gz
nifi.flow.configuration.archive.enabled=true
nifi.flow.configuration.archive.dir=./conf/archive/
nifi.flow.configuration.archive.max.time=30 days
nifi.flow.configuration.archive.max.storage=500 MB
nifi.flow.configuration.archive.max.count=
nifi.flowcontroller.autoResumeState=true
nifi.flowcontroller.graceful.shutdown.period=10 sec
nifi.flowservice.writedelay.interval=500 ms
nifi.administrative.yield.duration=30 sec
# If a component has no work to do (is "bored"), how long should we wait before checking again for work?
nifi.bored.yield.duration=10 millis
nifi.queue.backpressure.count=10000
nifi.queue.backpressure.size=1 GB
nifi.authorizer.configuration.file=./conf/authorizers.xml
nifi.login.identity.provider.configuration.file=./conf/login-identity-providers.xml
nifi.templates.directory=./conf/templates
nifi.ui.banner.text=
nifi.ui.autorefresh.interval=30 sec
nifi.nar.library.directory=./lib
nifi.nar.library.autoload.directory=./extensions
nifi.nar.working.directory=./work/nar/
nifi.documentation.working.directory=./work/docs/components
nifi.nar.unpack.uber.jar=false
####################
# State Management #
####################
nifi.state.management.configuration.file=./conf/state-management.xml
# The ID of the local state provider
nifi.state.management.provider.local=local-provider
# The ID of the cluster-wide state provider. This will be ignored if NiFi is not clustered but must be populated if running in a cluster.
nifi.state.management.provider.cluster=zk-provider
# Specifies whether or not this instance of NiFi should run an embedded ZooKeeper server
nifi.state.management.embedded.zookeeper.start=false
# Properties file that provides the ZooKeeper properties to use if <nifi.state.management.embedded.zookeeper.start> is set to true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties
# H2 Settings
nifi.database.directory=./database_repository
nifi.h2.url.append=;LOCK_TIMEOUT=25000;WRITE_DELAY=0;AUTO_SERVER=FALSE
# Repository Encryption properties override individual repository implementation properties
nifi.repository.encryption.protocol.version=
nifi.repository.encryption.key.id=
nifi.repository.encryption.key.provider=
nifi.repository.encryption.key.provider.keystore.location=
nifi.repository.encryption.key.provider.keystore.password=
# FlowFile Repository
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.SequentialAccessWriteAheadLog
nifi.flowfile.repository.directory=./flowfile_repository
nifi.flowfile.repository.checkpoint.interval=20 secs
nifi.flowfile.repository.always.sync=false
nifi.flowfile.repository.retain.orphaned.flowfiles=true
nifi.swap.manager.implementation=org.apache.nifi.controller.FileSystemSwapManager
nifi.queue.swap.threshold=20000
# Content Repository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=50 KB
nifi.content.repository.directory.default=./content_repository
nifi.content.repository.archive.max.retention.period=7 days
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false
nifi.content.viewer.url=../nifi-content-viewer/
# Provenance Repository Properties
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
# Persistent Provenance Repository Properties
nifi.provenance.repository.directory.default=./provenance_repository
nifi.provenance.repository.max.storage.time=30 days
nifi.provenance.repository.max.storage.size=10 GB
nifi.provenance.repository.rollover.time=10 mins
nifi.provenance.repository.rollover.size=100 MB
nifi.provenance.repository.query.threads=2
nifi.provenance.repository.index.threads=2
nifi.provenance.repository.compress.on.rollover=true
nifi.provenance.repository.always.sync=false
# Comma-separated list of fields. Fields that are not indexed will not be searchable. Valid fields are:
# EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, Relationship, Details
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship
# FlowFile Attributes that should be indexed and made searchable. Some examples to consider are filename, uuid, mime.type
nifi.provenance.repository.indexed.attributes=
# Large values for the shard size will result in more Java heap usage when searching the Provenance Repository
# but should provide better performance
nifi.provenance.repository.index.shard.size=500 MB
# Indicates the maximum length that a FlowFile attribute can be when retrieving a Provenance Event from
# the repository. If the length of any attribute exceeds this value, it will be truncated when the event is retrieved.
nifi.provenance.repository.max.attribute.length=65536
nifi.provenance.repository.concurrent.merge.threads=2
# Volatile Provenance Respository Properties
nifi.provenance.repository.buffer.size=100000
# Component and Node Status History Repository
nifi.components.status.repository.implementation=org.apache.nifi.controller.status.history.VolatileComponentStatusRepository
# Volatile Status History Repository Properties
nifi.components.status.repository.buffer.size=1440
nifi.components.status.snapshot.frequency=1 min
# QuestDB Status History Repository Properties
nifi.status.repository.questdb.persist.node.days=14
nifi.status.repository.questdb.persist.component.days=3
nifi.status.repository.questdb.persist.location=./status_repository
# Site to Site properties
nifi.remote.input.host=dwh-architrave.architrave.prod
nifi.remote.input.secure=false
nifi.remote.input.socket.port=10000
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
nifi.remote.contents.cache.expiration=30 secs
# web properties #
#############################################
# For security, NiFi will present the UI on 127.0.0.1 and only be accessible through this loopback interface.
# Be aware that changing these properties may affect how your instance can be accessed without any restriction.
# We recommend configuring HTTPS instead. The administrators guide provides instructions on how to do this.
nifi.web.http.host=dwh-architrave.architrave.prod
nifi.web.http.port=8080
nifi.web.http.network.interface.default=
#############################################
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.https.application.protocols=http/1.1
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=
nifi.web.proxy.host=
nifi.web.max.content.size=
nifi.web.max.requests.per.second=30000
nifi.web.max.access.token.requests.per.second=25
nifi.web.request.timeout=60 secs
nifi.web.request.ip.whitelist=
nifi.web.should.send.server.version=true
nifi.web.request.log.format=%{client}a - %u %t "%r" %s %O "%{Referer}i" "%{User-Agent}i"
# Include or Exclude TLS Cipher Suites for HTTPS
nifi.web.https.ciphersuites.include=
nifi.web.https.ciphersuites.exclude=
# security properties #
nifi.sensitive.props.key='12345678901234567890A'
nifi.sensitive.props.key.protected=
nifi.sensitive.props.algorithm=NIFI_PBKDF2_AES_GCM_256
nifi.sensitive.props.additional.keys=
nifi.security.autoreload.enabled=false
nifi.security.autoreload.interval=10 secs
nifi.security.keystore=
nifi.security.keystoreType=
nifi.security.keystorePasswd=
nifi.security.keyPasswd=
nifi.security.truststore=
nifi.security.truststoreType=
nifi.security.truststorePasswd=
nifi.security.user.authorizer=single-user-authorizer
nifi.security.allow.anonymous.authentication=false
nifi.security.user.login.identity.provider=
nifi.security.user.jws.key.rotation.period=PT1H
nifi.security.ocsp.responder.url=
nifi.security.ocsp.responder.certificate=
# OpenId Connect SSO Properties #
nifi.security.user.oidc.discovery.url=
nifi.security.user.oidc.connect.timeout=5 secs
nifi.security.user.oidc.read.timeout=5 secs
nifi.security.user.oidc.client.id=
nifi.security.user.oidc.client.secret=
nifi.security.user.oidc.preferred.jwsalgorithm=
nifi.security.user.oidc.additional.scopes=
nifi.security.user.oidc.claim.identifying.user=
nifi.security.user.oidc.fallback.claims.identifying.user=
nifi.security.user.oidc.truststore.strategy=JDK
# Apache Knox SSO Properties #
nifi.security.user.knox.url=
nifi.security.user.knox.publicKey=
nifi.security.user.knox.cookieName=hadoop-jwt
nifi.security.user.knox.audiences=
# SAML Properties #
nifi.security.user.saml.idp.metadata.url=
nifi.security.user.saml.sp.entity.id=
nifi.security.user.saml.identity.attribute.name=
nifi.security.user.saml.group.attribute.name=
nifi.security.user.saml.request.signing.enabled=false
nifi.security.user.saml.want.assertions.signed=true
nifi.security.user.saml.signature.algorithm=http://www.w3.org/2001/04/xmldsig-more#rsa-sha256
nifi.security.user.saml.authentication.expiration=12 hours
nifi.security.user.saml.single.logout.enabled=false
nifi.security.user.saml.http.client.truststore.strategy=JDK
nifi.security.user.saml.http.client.connect.timeout=30 secs
nifi.security.user.saml.http.client.read.timeout=30 secs
# Identity Mapping Properties #
# These properties allow normalizing user identities such that identities coming from different identity providers
# (certificates, LDAP, Kerberos) can be treated the same internally in NiFi. The following example demonstrates normalizing
# DNs from certificates and principals from Kerberos into a common identity string:
#
# nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$
# nifi.security.identity.mapping.value.dn=$1@$2
# nifi.security.identity.mapping.transform.dn=NONE
# nifi.security.identity.mapping.pattern.kerb=^(.*?)/instance@(.*?)$
# nifi.security.identity.mapping.value.kerb=$1@$2
# nifi.security.identity.mapping.transform.kerb=UPPER
# Group Mapping Properties #
# These properties allow normalizing group names coming from external sources like LDAP. The following example
# lowercases any group name.
#
# nifi.security.group.mapping.pattern.anygroup=^(.*)$
# nifi.security.group.mapping.value.anygroup=$1
# nifi.security.group.mapping.transform.anygroup=LOWER
# Listener Bootstrap properties #
# This property defines the port used to listen for communications from NiFi Bootstrap. If this property
# is missing, empty, or 0, a random ephemeral port is used.
nifi.listener.bootstrap.port=0
# cluster common properties (all nodes must have same values) #
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.heartbeat.missable.max=8
nifi.cluster.protocol.is.secure=false
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=false
nifi.cluster.node.address=dwh-architrave.architrave.prod
nifi.cluster.node.protocol.port=8082
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=30 sec
nifi.cluster.flow.election.max.candidates=
# cluster load balancing properties #
nifi.cluster.load.balance.host=
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=1
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=
nifi.zookeeper.connect.timeout=10 secs
nifi.zookeeper.session.timeout=10 secs
nifi.zookeeper.root.node=/nifi
nifi.zookeeper.client.secure=false
nifi.zookeeper.security.keystore=
nifi.zookeeper.security.keystoreType=
nifi.zookeeper.security.keystorePasswd=
nifi.zookeeper.security.truststore=
nifi.zookeeper.security.truststoreType=
nifi.zookeeper.security.truststorePasswd=
nifi.zookeeper.jute.maxbuffer=
# Zookeeper properties for the authentication scheme used when creating acls on znodes used for cluster management
# Values supported for nifi.zookeeper.auth.type are "default", which will apply world/anyone rights on znodes
# and "sasl" which will give rights to the sasl/kerberos identity used to authenticate the nifi node
# The identity is determined using the value in nifi.kerberos.service.principal and the removeHostFromPrincipal
# and removeRealmFromPrincipal values (which should align with the kerberos.removeHostFromPrincipal and kerberos.removeRealmFromPrincipal
# values configured on the zookeeper server).
nifi.zookeeper.auth.type=
nifi.zookeeper.kerberos.removeHostFromPrincipal=
nifi.zookeeper.kerberos.removeRealmFromPrincipal=
# kerberos #
nifi.kerberos.krb5.file=
# kerberos service principal #
nifi.kerberos.service.principal=
nifi.kerberos.service.keytab.location=
# kerberos spnego principal #
nifi.kerberos.spnego.principal=
nifi.kerberos.spnego.keytab.location=
nifi.kerberos.spnego.authentication.expiration=12 hours
# external properties files for variable registry
# supports a comma delimited list of file locations
nifi.variable.registry.properties=
# analytics properties #
nifi.analytics.predict.enabled=false
nifi.analytics.predict.interval=3 mins
nifi.analytics.query.interval=5 mins
nifi.analytics.connection.model.implementation=org.apache.nifi.controller.status.analytics.models.OrdinaryLeastSquares
nifi.analytics.connection.model.score.name=rSquared
nifi.analytics.connection.model.score.threshold=.90
# runtime monitoring properties
nifi.monitor.long.running.task.schedule=
nifi.monitor.long.running.task.threshold=
# Create automatic diagnostics when stopping/restarting NiFi.
# Enable automatic diagnostic at shutdown.
nifi.diagnostics.on.shutdown.enabled=false
# Include verbose diagnostic information.
nifi.diagnostics.on.shutdown.verbose=false
# The location of the diagnostics folder.
nifi.diagnostics.on.shutdown.directory=./diagnostics
# The maximum number of files permitted in the directory. If the limit is exceeded, the oldest files are deleted.
nifi.diagnostics.on.shutdown.max.filecount=10
# The diagnostics folder's maximum permitted size in bytes. If the limit is exceeded, the oldest files are deleted.
nifi.diagnostics.on.shutdown.max.directory.size=10 MB
# Performance tracking properties
## Specifies what percentage of the time we should track the amount of time processors are using CPU, reading from/writing to content repo, etc.
## This can be useful to understand which components are the most expensive and to understand where system bottlenecks may be occurring.
## The value must be in the range of 0 (inclusive) to 100 (inclusive). A larger value will produce more accurate results, while a smaller value may be
## less expensive to compute.
## Results can be obtained by running "nifi.sh diagnostics <filename>" and then inspecting the produced file.
nifi.performance.tracking.percentage=0
# NAR Provider Properties #
# These properties allow configuring one or more NAR providers. A NAR provider retrieves NARs from an external source
# and copies them to the directory specified by nifi.nar.library.autoload.directory.
#
# Each NAR provider property follows the format:
# nifi.nar.library.provider.<identifier>.<property-name>
#
# Each NAR provider must have at least one property named "implementation".
#
# Example HDFS NAR Provider:
# nifi.nar.library.provider.hdfs.implementation=org.apache.nifi.flow.resource.hadoop.HDFSExternalResourceProvider
# nifi.nar.library.provider.hdfs.resources=/path/to/core-site.xml,/path/to/hdfs-site.xml
# nifi.nar.library.provider.hdfs.storage.location=hdfs://hdfs-location
# nifi.nar.library.provider.hdfs.source.directory=/nars
# nifi.nar.library.provider.hdfs.kerberos.principal=nifi@NIFI.COM
# nifi.nar.library.provider.hdfs.kerberos.keytab=/path/to/nifi.keytab
# nifi.nar.library.provider.hdfs.kerberos.password=
#
# Example NiFi Registry NAR Provider:
# nifi.nar.library.provider.nifi-registry.implementation=org.apache.nifi.registry.extension.NiFiRegistryNarProvider
# nifi.nar.library.provider.nifi-registry.url=http://localhost:18080
... View more
Labels:
- Labels:
-
Apache NiFi
07-07-2023
06:31 AM
1 Kudo
@SandyClouds What i am saying is that rest-api call response you got via command line is probably not the exact same request made by NiFi. What that response json tells you is that the user that made the request only has READ to all three buckets. You did not share the complete rest-api call you made, so I am guessing you did not pass any user authentication token or clientAuth certificate in yoru rest-api request and this that request was treated as bing made by an anonymous user. Since your bucket(s) have the "make publicly visible" checked, anonymous user will have only READ access to the buckets. In NiFi you are trying to start version control on a Process group. That means that the NiFi node certificate needs to be authorized on all buckets and authorized to proxy the NiFi authenticated user (who also needs to have read, write on the specific bucket to start version control. If the response received back to NiFi during the request is that nodes or user on has READ, then no buckets would be shown in that UI since the action that UI is performing would be a WRITE. You should use the "developer tools" in your web browser to capture the actual full request being made by NiFi when you try to start version control on a process group. Most developer tolls even give you and option to "copy as curl" which you can then just paste in a cmd window and execute manually yourself outside of NiFi to see the same response NiFi is getting. My guess here is that you are having a TLS handshake issue resulting in yoru requests to NiFi-Registry being anonymous requests. The NiFi logs are not going to show you details on what authorization was on the NiFi-Registry side (you'll need to check the nifi-registry-app.log for that). All the NiFi log you shared shows is that your "new_admin" was authorized to execute the NiFi rest-api endpoint. This triggering the connection attempt with NiFi-Registry which also appeared successful (likely as anonymous). If you look at the verbose output for the NiFi keystore, NiFi truststore, NiFi-Registry keystore, and NiFi-Registry truststore, you'll be able to determine if a mutual TLS exchange would even be possible between the two services using the keystore and truststores you have in place. These versbose outputs which you can get using the keytool command will also show you the EKUs and SANs present on your certificates. <path to JDK>/keytool -v -list -keystore <keystore or truststore file> When prompted for password, you can just hit enter without providing password. The screenshots you shared showing the authorizations configured in NiFi-Registry are sufficient. I was simply pointing out that you gave too much authorization to your NiFi node in the global policy section. This is why i am leaning more to a TLS handshake issue resulting in an anonymous user authentication rather then an authorization issue. You'll also want to inspect your nifi-registry-app.log for the authorization request coming from NiFi to see if is authorizing "anonymous" or some other string. Perhaps you have some identity mapping pattern configured on NiFi-Registry side that is trimming out the CN from your certificate DNs? In that case you could also see this behavior because your NiFi-Registry would be looking up authorization on different identity string (instead of "CN=new_admin, OU=Nifi, i maybe looking for just "new_admin"). NiFi and NiFi-Registry authorizations are exact string matches (case sensitive); otherwise, treated as different users. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
07-04-2023
06:53 AM
Thank you @MattWho . Sorry to comment late on this. Just to add more details to the answer and how it worked for me: in my docker compose, earlier i used below environment variable: - INITIAL_ADMIN_IDENTITY='CN=admin, OU=NiFi' but somehow it is not interpreted properly.. and giving extra quotes as it is inside containers.. so even though the below looks weird, it worked. - INITIAL_ADMIN_IDENTITY=CN=admin, OU=NiFi (observe i removed single quotes around the value after 😃
... View more
06-27-2023
07:17 AM
@alim Can you please suggest..
... View more
06-15-2023
02:22 AM
1 Kudo
Thanks @MattWho I created https://issues.apache.org/jira/browse/NIFI-11695 Hoping for the best!
... View more
06-08-2023
02:11 AM
1 Kudo
@MattWho Thanks much for clearing my confusions around on - how my jobs running automatically when i restart server.. I encountered restart couple of times but never know this feature. You are awesome!!
... View more
06-07-2023
12:24 PM
2 Kudos
@SandyClouds Some clarity and additions to @cotopaul Pros and Cons: Single Node: PROs: - easy to manage. <-- Setup and managing configuration is easier since you only need to do that on one node. But in a cluster, all nodes configuration files will be almost the same (some variations in hostname properties and certificates if you secure your cluster). - easy to configure. <-- There are more configurations needed in a cluster setup, but once setup, nothing changes from the user experience when it comes to interacting with the UI. - no https required. <-- Not sure how this is a PRO. I would not recommend using an un-secure NiFi as doing so allow anyone access to your dataflows and the data being processed. You can also have an un-secure NiFi cluster while i do not recommend that either. CONs: - in case of issues with the node, you NiFi instance is down. <-- Very true, single point of failure. - it uses plenty of resources, when it needs to process data, as everything is done on a single node. Cluster: PROs: - redundancy and failover --> when a node goes down, the others will take over and process everything, meaning that you will not get affected. <-- Not complete accurate. Each node in a NiFi cluster is only aware of the data (FlowFiles) queued on that specific node. So each node works on the FlowFile present on that one node, so it is the responsibility of the dataflow designer/builder to make sure they built their dataflows in such away to ensure distribution of FlowFiles across all nodes. When a node goes down, any data FlowFiles currently queued on that down node are not going to be processed by the other nodes. However, other nodes will continue processing their data and all new data coming in to your dataflow cluster - the used resources will be split among all the nodes, meaning that you can cover more use cases as on a single node. <-- Different nodes do not share or pool resources from all nodes in the cluster. If your dataflow(s) are built correctly the volume of data (FlowFiles) being processed will be distributed across all your nodes along each node to process a smaller subset of the overall FlowFile volume. This means more resources available across yoru cluster to handle more volume. NEW -- A NiFi cluster can be accessed via any one of the member nodes. No matter which node's UI you access, you will be presented with stats for all nodes. There is a cluster UI accessible from the global menu that allows you to see a breakdown of each node. Any changes you make from the UI of any one of the member nodes will be replicated to all nodes. NEW -- Since all nodes run their own copy of the flow, a catastrophic node failure does not mean loss of all your work since the same flow.json.gz (contains everything related to your dataflows) can be retrieved from any of the other nodes in your cluster. CONs: - complex setup as it requires a Zookeeper + plenty of other config files. <-- NiFi cluster requires a multi node zookeeper setup. Zookeeper quorum is required for cluster stability and also stores cluster wide state needed for your dataflow. Zookeeper is responsible for electing a node in your cluster with the Cluster Coordinator role and Primary node role. IF a node goes down that has been assigned one of these roles, Zookeeper will elected one of the still up nodes to the role - complex to manage --> analysis will be done on X nodes instead of a single node. <-- not clear. Yes you have multiple nodes and all those nodes are producing their own set of NiFi-logs. However, if a component within your dataflow is producing bulletins (exceptions) it will report all nodes or the specific node(s) on which bulletin was produced. Cloudera offers centralized management of your NiFi cluster deployment via Cloudera Manager software. Makes deploying and managing NiFi cluster to multiple nodes easy, sets up and configures Zookeeper for you, and makes securing your NiFi easy as well by generating the needed certificates/keystores for you. Hope this helps, Matt
... View more
05-26-2023
08:40 AM
Thank You @MattWho @SAMSAL for your wonderful suggestions. I achieved solution using JolttransformJSON processor, where i used below jolt specification [{ "operation": "default", "spec": { "*": { "error": "${putdatabaserecord.error}" } } }] Thanks again for your thought provoking answers..
... View more