Member since: 02-01-2022
Posts: 270
Kudos Received: 96
Solutions: 59
My Accepted Solutions
Views | Posted
---|---
2192 | 06-12-2024 06:43 AM
3327 | 04-12-2024 06:05 AM
2225 | 12-07-2023 04:50 AM
1350 | 12-05-2023 06:22 AM
2275 | 11-28-2023 10:54 AM
05-19-2023
04:05 AM
@soc88 Some suggestions so the community can better help you: Show screenshots of the processor's configuration tab. We need to see the properties and how you have set up the processor. For processors with errors (red boxes), you can click the box to see the full error. We need to see the errors to suggest solutions. Look in nifi-app.log if you need more verbose errors than the UI shows. You can also raise the processor's log level to see more in the UI; for example, set it to DEBUG and test again. The error could come from the processor's configuration, but I suspect OpenSearch permissions on the receiving end more than the NiFi version.
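If scrolling through nifi-app.log by hand is tedious, a small filter can pull out the relevant lines. This is only a rough sketch: the log line layout (date, time, level, thread, logger, message) is assumed from a default logback pattern, and `MyProcessor` is a hypothetical processor name, not the one from this thread.

```python
import re
from typing import Iterable, List

# Assumed layout of a nifi-app.log line: "<date> <time> <LEVEL> [thread] logger message".
# Adjust the pattern if your logback.xml uses a different format.
LEVEL_PATTERN = re.compile(r"^\S+ \S+ (ERROR|WARN)\b")


def find_problem_lines(lines: Iterable[str], component: str) -> List[str]:
    """Return ERROR/WARN lines that mention the given processor name or id."""
    hits = []
    for line in lines:
        if LEVEL_PATTERN.match(line) and component in line:
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Hypothetical path and processor name; point these at your own install.
    with open("logs/nifi-app.log", encoding="utf-8") as handle:
        for hit in find_problem_lines(handle, "MyProcessor"):
            print(hit)
```

Stack traces span several lines, so treat the matches as pointers into the log rather than the full error.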
05-18-2023
06:40 AM
2 Kudos
Today I finally found the time and subject matter to start an article showcasing how I am working to operationalize NiFi flows with Cloudera's CDP Public Cloud DataFlow Data Service. I have been so busy working with my Cloudera accounts while training up on all things CDP that I have not had a chance to write a community article. I am very excited to share my experience and this incredible modernization Cloudera has provided on top of NiFi.
Traditionally, in a large-scale NiFi environment, a single NiFi canvas contains all of the process groups and data flows. This gets visually complicated.
Operating NiFi in this manner makes operations around the data flows on the canvas technically complicated. Even with user access/auth via Ranger, the root-level canvas still shows the entire cluster's process groups. If you are an operations owner of NiFi with hundreds of flows you did not make, it would be impossible to find flow errors, know what flows are actually doing, tune the flows for performance, version flows, or identify "noisy neighbor" flows.
The answer to these problems is the new DataFlow Experience, one of CDP Public Cloud's newest Kubernetes-driven Data Services.
So what is the CDP Public Cloud DataFlow Data Service? From the reference documentation:
Cloudera DataFlow for Public Cloud (CDF-PC) is a cloud-native service that enables self-serve deployments of Apache NiFi data flows from a central catalog. DataFlow Deployments provides a cloud-native runtime to run your Apache NiFi flows through auto-scaling Kubernetes clusters, and centralized monitoring and alerting capabilities for the deployments. DataFlow Functions provides a cloud-native runtime to run your Apache NiFi flows as functions on the serverless compute services of AWS Lambda, Azure Functions, and Google Cloud Functions, targeting use cases that do not require always running NiFi flows.
What You Will Find In DataFlow
Data Catalog
Think of the Data Catalog as your cloud version of NiFi Registry. Here you can create, version, and deploy your flows. Deployed into the CDP Public Cloud control plane, the Data Catalog allows you to deploy flows to multiple environments in any cloud (AWS, Azure, or GCP).
Reference Docs: https://docs.cloudera.com/dataflow/cloud/about-managing-flow-definitions.html
Ready Flow
The Ready Flow is a gallery of pre-made, easily deployed Data Flows. These ready flows solve common use cases and are a great starting point for new Data Flow users to get used to deploying individual flows. These flows are fully parameterized so it is possible to deploy and operate these flows without touching the NiFi canvas. Some of the Ready Flows I have used recently are:
Confluent Cloud to Snowflake
Kafka To Iceberg
Kafka To Kudu
Salesforce to S3
Check out all the Ready Flows
Reference Docs: https://docs.cloudera.com/dataflow/cloud/about-readyflows.html
Flow Design
The newest addition to the Data Flow family, the Flow Designer is a serverless NiFi flow design UI that allows you to create, test, and publish Data Flows to the Data Catalog. The Flow Designer is very similar to NiFi and provides the same capabilities, but in a reduced UI with direct testing and integration around Data Flow.
Reference Docs: https://docs.cloudera.com/dataflow/cloud/about-flow-designer.html
Functions
This is one of the coolest new NiFi things to come: DataFlow Functions. Using the Data Catalog, you are able to grab a Data Flow's CRN and use that data flow as a cloud function within your cloud provider of choice. Now you are able to deploy stateless NiFi functions that live and execute on event triggers in your cloud region(s).
AWS Lambda - Docs
Azure Functions - Docs
Google Cloud Functions - Docs
Reference Docs: https://docs.cloudera.com/dataflow/cloud/functions.html
How to Move from NiFi Cluster to DataFlow
To move a NiFi data flow from legacy NiFi to Data Flow, you need a flow definition file for each Data Flow. This is the JSON version of your flow, and you can import it straight into the Data Catalog. If you have a newer version of NiFi, you can simply right-click on a process group and choose Download Flow Definition. If you are using an older version of NiFi, you must convert your template to a flow definition file.
Once you have imported your flow definition file, it is time to start doing some upgrades. Deploy the flow, navigate into the NiFi UI for the deployed flow, make your revisions, export a new flow definition, upload it to the Data Catalog, deploy again, remove the previous deployment, and repeat. A few things to consider during flow upgrades:
Convert all sensitive properties to Parameters. Always use parameters rather than variables; parameter values can be provided during deployment.
Rename all processors to meaningful names. For example: "Get Customer ID" versus "EvaluateJsonPath" for a processor that evaluates JSON for a Customer ID.
Give a unique name to any queue you wish to track with KPIs. Any KPIs attached to the data flow need to reference unique names within the flow. For example: "CustomSuccess" versus the default "success".
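The naming checks above can be run against a downloaded flow definition before uploading it. The following is a minimal sketch, not a Cloudera tool: it assumes the flow definition JSON keeps its root group under `flowContents`, with `processors`, `connections`, and nested `processGroups` arrays, and that a processor's default name is the last segment of its `type`. The function names are made up for illustration.

```python
import json
from typing import Any, Dict, List


def audit_group(group: Dict[str, Any], path: str = "") -> List[str]:
    """Collect naming findings for one process group and its child groups."""
    findings = []
    here = f"{path}/{group.get('name', '')}"
    for proc in group.get("processors", []):
        # "type" is a fully qualified class name; its last segment is the default name.
        default_name = proc.get("type", "").rsplit(".", 1)[-1]
        if default_name and proc.get("name") == default_name:
            findings.append(f"{here}: processor still named '{default_name}'")
    for conn in group.get("connections", []):
        if not conn.get("name"):
            rels = ",".join(conn.get("selectedRelationships", []))
            findings.append(f"{here}: unnamed connection for relationship(s) '{rels}'")
    for child in group.get("processGroups", []):
        findings.extend(audit_group(child, here))
    return findings


def audit_flow_definition(text: str) -> List[str]:
    """Parse flow definition JSON text and audit it from the root group down."""
    flow = json.loads(text)
    return audit_group(flow.get("flowContents", {}))


if __name__ == "__main__":
    # Hypothetical file name; use the file from "Download Flow Definition".
    with open("my_flow.json", encoding="utf-8") as handle:
        for finding in audit_flow_definition(handle.read()):
            print(finding)
```

An empty result only means the names differ from the defaults; whether they are meaningful is still a human call.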
To convert your template XML to flow definition file JSON, check out this lil project I created:
NiFi Template Converter: Convert NiFi templates to NiFi Flow Definition Files.
Upload your template XML, and my lil web app will import it into NiFi, save a flow definition file, and send it back to you as JSON. NiFi Magic!!
Not ready to modernize to DataFlow Service?
No worries, you can still operate and deploy a multi-flow NiFi environment on CDP Public Cloud DataFlow or CDP Public Cloud Flow Management Data Hub w/ NiFi.
If you are ready to get started with CDP Public Cloud DataFlow Data Service, check this page here for a 60 Day Free Trial. If you need help or just want to talk to someone about how to position DataFlow against your current NiFi use cases, DM me on Twitter or DM here to schedule a follow-up.
05-16-2023
08:57 AM
@Bhavesh_Solanki Please reference the NiFi REST API doc here: https://nifi.apache.org/docs/nifi-docs/rest-api/index.html You will find documentation blocks for Parameter Contexts and Parameter Providers, with the capabilities and requirements for parameter-based API calls. Besides the calls required to manage parameters, you may need to complete the Access steps if you are calling the REST API from outside the NiFi canvas. If you are using the API within NiFi itself, this step is not necessary.
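As a rough illustration of the Access step followed by a parameter call, here is a sketch that builds the two requests with the standard library. It assumes a secured NiFi with username/password login, where `POST /access/token` returns the token as plain text and `GET /flow/parameter-contexts` lists the contexts; the base URL, credentials, and function names are hypothetical, so check the paths and response shape against the REST API doc for your version.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def build_token_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build POST /access/token with form-encoded credentials (the Access step)."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{base_url}/access/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_list_contexts_request(base_url: str, token: str) -> urllib.request.Request:
    """Build GET /flow/parameter-contexts, authenticated with the bearer token."""
    return urllib.request.Request(
        f"{base_url}/flow/parameter-contexts",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def context_names(response_text: str) -> List[str]:
    """Extract context names, assuming a {"parameterContexts": [{"component": {...}}]} body."""
    payload = json.loads(response_text)
    contexts = payload.get("parameterContexts", [])
    return [c.get("component", {}).get("name", "") for c in contexts]


if __name__ == "__main__":
    base = "https://nifi.example.com:8443/nifi-api"  # hypothetical host
    with urllib.request.urlopen(build_token_request(base, "admin", "changeme")) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(build_list_contexts_request(base, token)) as resp:
        print(context_names(resp.read().decode()))
```

TLS trust settings and the login mechanism (LDAP, Kerberos, OIDC, certificates) vary by cluster, so the token request may need adapting.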
05-15-2023
05:07 AM
@RizkyMei @deepaknay Do you all have tickets w/ support on these install issues? Are you working with your Account Team regularly? Please DM me if you are not in touch with your account team on these matters and I will see what I can do to help out.
05-15-2023
05:04 AM
1 Kudo
@quangbilly79 No, you should not add the gateway to every node. The gateway should only be installed on the edge/utility nodes, where you give access to external systems and users. These gateway nodes are then able to reach the rest of the service nodes.
05-12-2023
09:33 AM
@rafy I got the same issues when I tried to create a flow using RPATH. However, here is a solution I found to dial into the data array and match on room = A:
SELECT * FROM FLOWFILE WHERE room = 'A'
I used QueryRecord with JsonTreeReader and JsonRecordSetWriter (default). NiFi flow definition here: @gitHub
05-12-2023
08:40 AM
Can you please show the source JSON in a code box? Also, try a manual test without the attribute, such as:
SELECT *
FROM FLOWFILE
WHERE RPATH(data, '/room') = 'A'
This confirms that the query is correct for the JSON payload. Once that works, start testing with the attribute added.
05-12-2023
06:59 AM
@rafy I would try this: SELECT * FROM FLOWFILE WHERE RPATH_STRING(data, '/room')='${ip}' Assuming ip is an attribute (${ip}), of course.
05-11-2023
05:42 AM
1 Kudo
@ushasri I believe the solution here is to use the Record-based processors with a specified schema. This allows you to provide the correct schema to the Reader and the Writer regardless of the field names in the original Excel data source.
05-10-2023
05:56 AM
@zzeng Great article. Reach out to me on internal channels. I would love to show you my Oracle to Kudu demo, using Kafka and Schema Registry.