02-22-2021
07:07 AM
1 Kudo
New Features of Apache NiFi 1.13.0
Check it out: https://twitter.com/pvillard31/status/1361569608327716867?s=27
Download today: Apache NiFi Downloads
Release Notes: Release Notes (Apache NiFi)
Migration: Migration Guidance
New Features
ListenFTP
UpdateHiveTable - Hive DDL changes - Hive schema update for data drift, i.e. Hive schema migration
SampleRecord - different sampling approaches to records (Interval Sampling, Probabilistic Sampling, Reservoir Sampling)
CDC updates
Kudu updates
AMQP and MQTT integration upgrades
ConsumeMQTT - record readers and writers added
HTTP access to NiFi is now configured by default to accept connections from 127.0.0.1/localhost only. If you want to allow broader HTTP access and you understand the security implications, you can still control that by changing the 'nifi.web.http.host' property in nifi.properties. That said, take the time to configure proper HTTPS. We offer detailed instructions and tooling to assist.
Work continues on the combined MiNiFi/NiFi code base, including the ability to run NiFi with no GUI.
Support for Kudu dates
Updated GRPC versions
Apache Calcite update
PutDatabaseRecord update
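The HTTP binding change above lives in nifi.properties. A minimal sketch of the relevant keys (the property names are the standard ones; the values here are illustrative, not recommendations):

```properties
# Default in 1.13.0: HTTP bound to localhost only
nifi.web.http.host=127.0.0.1
nifi.web.http.port=8080

# Broader HTTP access (only if you understand the security implications):
# nifi.web.http.host=0.0.0.0

# Better: configure HTTPS instead (example values)
# nifi.web.https.host=nifi.example.com
# nifi.web.https.port=8443
```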
Here is an example NiFi 1.13.0 ETL flow:
ConsumeMQTT: now with readers
UpdateAttribute: set record.sink.name to kafka and recordreader.name to json.
SampleRecord: sample a few of the records
PutRecord: Use reader and destination service
UpdateHiveTable: new sink
Consume from MQTT and read/write records.
Some example attributes from a running flow:
Connection pools for DatabaseRecordSinks can be JDBC, Hadoop, and Hive.
FreeFormTextRecordSetWriter is great for writing any format.
For the RecordSinkService, we will pick Kafka as our destination.
KafkaRecordSink from PutRecord
The reader will pick JSON in our example based on our UpdateAttribute; we can dynamically change this as data streams.
ReaderLookup - lets you pick a reader based on an attribute.
We have defined readers for Parquet, JSON, AVRO, XML, and CSV; no matter the type, I can automagically read it. Great for reusing code and great for cases like our new ListenFTP where you may get sent tons of different files to process. Use one FLOW!
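The lookup pattern ReaderLookup implements can be sketched outside NiFi as a plain dispatch table: route to a reader based on an attribute value. This is a hypothetical illustration of the idea, not NiFi code; the attribute name `recordreader.name` matches the one set in the flow above.

```python
# Hypothetical sketch of the ReaderLookup idea: pick a record reader
# based on a flowfile attribute, so one flow handles many formats.
import csv
import io
import json

READERS = {
    "json": lambda text: json.loads(text),
    "csv": lambda text: list(csv.DictReader(io.StringIO(text))),
}

def read_records(text, attributes):
    # ReaderLookup routes on an attribute; here we use 'recordreader.name'
    reader = READERS[attributes["recordreader.name"]]
    return reader(text)

# The same call handles JSON or CSV payloads depending on the attribute:
json_records = read_records('[{"temp": 72}]', {"recordreader.name": "json"})
csv_records = read_records("a,b\n1,2\n", {"recordreader.name": "csv"})
```

In NiFi the registered readers (Parquet, JSON, Avro, XML, CSV) play the role of the dispatch-table entries, and the attribute can change per flowfile as data streams.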
RecordSinkService can help you make all our flows generic so you can drop in different sinks/destinations for your writers based on what the data coming in is. This is revolutionary for code reuse.
We can write our output in a custom format that could look like a document, HTML, fixed-width, a form letter, weird delimiter, or whatever you need.
Sample records using different methods.
We use the RecordSinkServiceLookup to allow us to change our sink location dynamically; we are passing in an attribute to choose Kafka.
We have pushed our data to Kafka using KafkaRecordSink. We can see our data easily in Streams Messaging Manager (SMM).
With a RecordReaderFactory, you can pick readers like the new WindowsEventLogReader.
As another output, we can UpdateHiveTable from our data and change the table as needed.
Straight from the Release Notes: New Features
[NIFI-7386] - AzureStorageCredentialsControllerService should also connect to storage emulator
[NIFI-7429] - Add Status History capabilities for system-level metrics
[NIFI-7549] - Adding Hazelcast based implementation for DistributedMapCacheClient
[NIFI-7624] - Build a ListenFTP processor
[NIFI-7745] - Add a SampleRecord processor
[NIFI-7796] - Add Prometheus metrics for total bytes received and bytes sent for components
[NIFI-7801] - Add acknowledgment check to Splunk
[NIFI-7821] - Create a Cassandra implementation of DistributedMapCacheClient
[NIFI-7879] - Create record path function for UUID v5
[NIFI-7906] - Add graph processor with the flexibility to query graph database conditioned on flowfile content and attributes
[NIFI-7989] - Add Hive "data drift" processor
[NIFI-8136] - Allow State Management to be tied to Process Session
[NIFI-8142] - Add "on conflict do nothing" feature to PutDatabaseRecord
[NIFI-8146] - Allow RecordPath to be used for specifying operation type and data fields when using PutDatabaseRecord
[NIFI-8175] - Add a WindowsEventLogReader
An update on Cloudera Flow Management!
Cloudera Flow Management on DataHub Public Cloud
This minor update has some Schema Registry and Atlas integration updates.
What's New in Cloudera DataFlow for Data Hub 7.2.7
Supported NiFi Processors
If that wasn't enough, a new version of MiNiFi C++ Agent!
Cloudera Edge Manager 1.2.2 Release
February 15, 2021
CEM MiNiFi C++ Agent - 1.21.01 release includes:
Support for JSON output in the Consume Windows Event Log processor
Full Expression Language support on Windows
Full S3 support (List, Fetch, Get, Put)
MiNiFi C++ download locations
MiNiFi C++ agent updates
Remember when you are done.
02-22-2021
06:53 AM
1 Kudo
That depends on the default session settings of the Oracle JDBC driver; it's usually UTF-8, though.
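If in doubt, you can ask the database itself what character set it uses; the NLS query below is standard Oracle, while the driver usage in the comment is a hypothetical example (connection details are made up):

```python
# Standard Oracle query for the database character set. The JDBC/driver
# layer usually converts to the client charset (commonly UTF-8) on top
# of this.
CHARSET_QUERY = (
    "SELECT value FROM nls_database_parameters "
    "WHERE parameter = 'NLS_CHARACTERSET'"
)

# Hypothetical usage with the python-oracledb driver:
# import oracledb
# with oracledb.connect(user="scott", password="tiger",
#                       dsn="dbhost:1521/orcl") as conn:
#     cur = conn.cursor()
#     cur.execute(CHARSET_QUERY)
#     print(cur.fetchone()[0])
```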
02-22-2021
06:48 AM
1 Kudo
Two options: make sure your cluster is configured for data provenance and restart all your nodes, and make sure you are running the latest version. If that doesn't work, open a support ticket with Cloudera.
02-10-2021
08:54 AM
2 Kudos
If you are using versioning and the NiFi Registry, when you apply a new version to a running process group it will stop things and wait until nothing is in process: https://pierrevillard.com/2018/04/09/automate-workflow-deployment-in-apache-nifi-with-the-nifi-registry/

Other options:
- The Python helper by Dan: https://pypi.org/project/nipyapi/
- The NiFi CLI can do this; you'll have to look at the docs: https://nifi.apache.org/docs/nifi-docs/html/toolkit-guide.html
- Examples: https://www.datainmotion.dev/2021/01/automating-starting-services-in-apache.html
- The upcoming Cloudera DataFlow Experience does this automatically as part of autoscaling.
- Make sure you use Load Balanced Queues between processors.
- You can also use Stateless NiFi if you want things to start, complete a fixed job, and stop: https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html

Extra docs:
https://docs.cloudera.com/cdf-datahub/7.2.6/nifi-api/topics/cdf-datahub-nifi-rest-api.html
https://github.com/tspannhw/EverythingApacheNiFi
https://www.datainmotion.dev/2020/09/devops-working-with-parameter-contexts.html
https://www.datainmotion.dev/2020/10/automating-building-migration-backup.html
https://www.datainmotion.dev/2019/04/simple-apache-nifi-operations-dashboard.html
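For scripted stop-and-deploy flows, the NiFi REST API has a bulk scheduling endpoint per process group (`PUT /nifi-api/flow/process-groups/{id}`). A minimal sketch of building that request; the base URL and process-group id below are hypothetical, and authentication is left out:

```python
# Sketch: build the NiFi REST call that stops or starts all components
# in a process group in one request.
def schedule_pg_request(base_url, pg_id, running):
    state = "RUNNING" if running else "STOPPED"
    return (
        "PUT",
        f"{base_url}/nifi-api/flow/process-groups/{pg_id}",
        {"id": pg_id, "state": state},
    )

method, url, body = schedule_pg_request(
    "http://edge2ai-1.dim.local:8080",          # hypothetical host
    "2c1860b3-7f21-36f4-a0b8-b415c652fc62",     # hypothetical pg id
    running=False,
)
# Send with e.g. requests.put(url, json=body), adding auth as your
# cluster requires.
```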
02-09-2021
01:05 PM
That Python library is for the Confluent Schema Registry. I was able to connect, but Confluent supports its own Avro schema format. This was my example and it worked:

```python
from schema_registry.client import SchemaRegistryClient, schema

client = SchemaRegistryClient(url="http://myclouderasr.com:7788")
print(client.get_schema('weatherny'))
```

This library doesn't seem to support logins or security.
02-09-2021
08:46 AM
Cloudera Schema Registry is rarely on 8081; it is usually on port 9090. https://docs.cloudera.com/csp/2.0.1/schema-registry-overview/topics/csp-examples_of_interacting_with_schema_registry.html

Which version are you using? Which form factor? Public Cloud? On-premise?

https://www.datainmotion.dev/2020/10/running-flink-sql-against-kafka-using.html
https://www.datainmotion.dev/2020/05/commonly-used-tcpip-ports-in-streaming.html
https://www.datainmotion.dev/2020/06/using-apache-kafka-using-cloudera-data.html
https://www.datainmotion.dev/2020/08/deleting-schemas-from-cloudera-schema.html

Check out the Swagger REST docs: https://www.datainmotion.dev/2020/11/flank-smart-weather-websocket.html
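When a client library lacks auth support, you can call the Schema Registry REST API directly. A sketch of building the URL; the host and schema name are hypothetical, the API port follows the earlier example, and the path shape should be checked against your registry's Swagger docs:

```python
# Sketch: build a Cloudera/Hortonworks Schema Registry REST URL for
# fetching the latest version of a named schema.
def schema_url(host, name, port=7788):
    return (
        f"http://{host}:{port}/api/v1/schemaregistry/"
        f"schemas/{name}/versions/latest"
    )

url = schema_url("myclouderasr.com", "weatherny")
# Fetch with e.g. requests.get(url, auth=...).json(), adding whatever
# authentication your cluster requires.
```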
02-08-2021
07:52 AM
Variables are deprecated. Parameters are new and easy to externalize; I can use them in DevOps processes via REST, the NiFi CLI, and Python. You can programmatically build parameter contexts and parameters and assign them to process groups. Parameters are getting upgrades to do some of the more advanced things you mentioned. https://www.datainmotion.dev/2020/09/devops-working-with-parameter-contexts.html And they help power Stateless NiFi. https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html https://www.datainmotion.dev/2021/01/automating-starting-services-in-apache.html
01-26-2021
11:40 AM
Automating Starting Services in Apache NiFi and Applying Parameters

Automate all the things! You can call these commands interactively or script all of them with awesome DevOps tools. @Andre Araujo and @dchaffey can tell you more about that. The NiFi CLI has interactive help available and also some good documentation: NiFi CLI Toolkit Guide

Some useful commands:

```
nifi pg-list
nifi pg-status
nifi pg-get-services
```

Enable All NiFi Services on the Canvas

By running this three times, I get any stubborn ones or ones that needed something previously running. This could be put into a loop; check the status before trying again.

```
/opt/demo/nifi-toolkit-1.12.1/bin/cli.sh nifi pg-enable-services -u http://edge2ai-1.dim.local:8080 --processGroupId root
/opt/demo/nifi-toolkit-1.12.1/bin/cli.sh nifi pg-enable-services -u http://edge2ai-1.dim.local:8080 --processGroupId root
/opt/demo/nifi-toolkit-1.12.1/bin/cli.sh nifi pg-enable-services -u http://edge2ai-1.dim.local:8080 --processGroupId root
```

We could then start a process group if we wanted:

```
nifi pg-start -u http://edge2ai-1.dim.local:8080 -pgid 2c1860b3-7f21-36f4-a0b8-b415c652fc62
```

List all process groups:

```
/opt/demo/nifi-toolkit-1.12.1/bin/cli.sh nifi pg-list -u http://edge2ai-1.dim.local:8080
```

List parameters:

```
/opt/demo/nifi-toolkit-1.12.1/bin/cli.sh nifi list-param-contexts -u http://edge2ai-1.dim.local:8080 -verbose
```

Set the parameter context for a process group; you can loop to do all (pgid => process group id, pcid => parameter context id). I need to put this in a shell or Python script:

```
/opt/demo/nifi-toolkit-1.12.1/bin/cli.sh nifi pg-set-param-context -u http://edge2ai-1.dim.local:8080 -verbose -pgid 2c1860b3-7f21-36f4-a0b8-b415c652fc62 -pcid 39f0f296-0177-1000-ffff-ffffdccb6d90
```

Example setupnifi.sh (Github Link)

You could also use the NiFi REST API or Dan's awesome Python API, NiPyApi: A Python Client SDK for Apache NiFi

References
- DevOps: Working with Parameter Contexts
- NiFi Toolkit CLI
- No More Spaghetti Flows
- Report on this Apache NiFi
- Everything Apache NiFi
- Cloudera Data Platform - Using Apache NiFi REST API in the Public Cloud
- Using NiFi CLI to Restore NiFi Flows From Backups
- Automating the Building, Migration, Backup, Restore and Testing of Streaming Applications
- Apache NiFi Toolkit Guide
- An overview of Apache NiFi and Toolkit CLI deployments
- Automate workflow deployment in Apache NiFi with the NiFi Registry
- DevOps for Apache NiFi 1.7 and More
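The run-it-three-times loop above is easy to script. A small Python sketch wrapping the CLI; the toolkit path and URL are the hypothetical ones from the examples above:

```python
# Sketch: retry pg-enable-services a few times so services whose
# dependencies came up on an earlier pass get enabled too.
import subprocess

def enable_services_cmd(toolkit, url, pg_id="root"):
    # Build the same CLI invocation shown above.
    return [
        f"{toolkit}/bin/cli.sh", "nifi", "pg-enable-services",
        "-u", url, "--processGroupId", pg_id,
    ]

def enable_all_services(toolkit, url, attempts=3):
    for _ in range(attempts):
        # check=False: a failed pass is retried rather than fatal.
        subprocess.run(enable_services_cmd(toolkit, url), check=False)

# enable_all_services("/opt/demo/nifi-toolkit-1.12.1",
#                     "http://edge2ai-1.dim.local:8080")
```

A refinement would be to check service status between passes and stop early once everything is enabled.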
01-13-2021
07:10 AM
Here are a few examples of moving Flume flows to NiFi. https://www.datainmotion.dev/2019/08/migrating-apache-flume-flows-to-apache.html https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache.html
01-11-2021
07:11 AM
https://www.datainmotion.dev/2020/12/simple-change-data-capture-cdc-with-sql.html https://www.datainmotion.dev/2020/07/ingesting-all-weather-data-with-apache.html https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_15.html