03-15-2021
02:24 PM
1 Kudo
https://dev.to/tspannhw/ingesting-all-the-weather-data-with-apache-nifi-2ho4
03-15-2021
02:20 PM
1 Kudo
I would create a schema and then use PutDatabaseRecord. Do you have an example of the output data?
02-26-2021
06:32 AM
If you add the custom timestamp format to the record readers and writers, it should do this automatically for all timestamp fields.
02-24-2021
09:25 AM
Add a schema and a record reader and writer. You can use UpdateRecord or QueryRecord to tweak formatting if needed. Examples:
https://www.datainmotion.dev/2020/11/flank-smart-weather-applications-with.html
https://www.datainmotion.dev/2021/01/flank-real-time-transit-information-for.html
https://www.datainmotion.dev/2021/01/flank-using-apache-kudu-as-cache-for.html
https://www.datainmotion.dev/2020/12/smart-stocks-with-flank-nifi-kafka.html
02-23-2021
07:12 AM
1 Kudo
I have downloaded 10,000 processor flows. It could be your local workstation or firewall. Also, for future reference, templates will be removed. Please use NiFi Registry.
https://www.datainmotion.dev/2019/11/nifi-toolkit-cli-for-nifi-110.html
https://www.datainmotion.dev/2020/10/automating-building-migration-backup.html
02-22-2021
07:07 AM
1 Kudo
New Features of Apache NiFi 1.13.0
Check it out: https://twitter.com/pvillard31/status/1361569608327716867?s=27
Download today: Apache NiFi Downloads
Release Notes: Release Notes (Apache NiFi)
Migration: Migration Guidance
New Features
ListenFTP
UpdateHiveTable - Hive DDL changes: update the Hive schema to handle data drift, i.e. Hive schema migration!
SampleRecord - different sampling approaches to records (Interval Sampling, Probabilistic Sampling, Reservoir Sampling)
CDC updates
Kudu updates
AMQP and MQTT integration upgrades
ConsumeMQTT - record readers and writers added
HTTP access to NiFi is now configured by default to accept connections on 127.0.0.1/localhost only. If you want to allow broader HTTP access and you understand the security implications, you can still control that by changing the 'nifi.web.http.host' property in nifi.properties. That said, take the time to configure proper HTTPS. We offer detailed instructions and tooling to assist.
Work continues on the ability to run NiFi with no GUI as the MiNiFi and NiFi code bases are combined.
Support for Kudu dates
Updated GRPC versions
Apache Calcite update
PutDatabaseRecord update
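The three sampling approaches listed for SampleRecord above can be illustrated in plain Python. This is only a sketch of the general techniques, not the processor's implementation; the function names and parameters are made up for illustration.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def interval_sample(records: Iterable[T], interval: int) -> Iterator[T]:
    """Interval sampling: keep every `interval`-th record."""
    for index, record in enumerate(records, start=1):
        if index % interval == 0:
            yield record


def probabilistic_sample(
    records: Iterable[T], probability: float, rng: random.Random
) -> Iterator[T]:
    """Probabilistic sampling: keep each record independently with the given probability."""
    for record in records:
        if rng.random() < probability:
            yield record


def reservoir_sample(records: Iterable[T], size: int, rng: random.Random) -> List[T]:
    """Reservoir sampling: keep a uniform random sample of `size` records
    from a stream whose length is not known in advance."""
    reservoir: List[T] = []
    for index, record in enumerate(records):
        if index < size:
            # Fill the reservoir with the first `size` records.
            reservoir.append(record)
        else:
            # Replace an existing entry with probability size / (index + 1).
            slot = rng.randint(0, index)
            if slot < size:
                reservoir[slot] = record
    return reservoir
```

Interval sampling is deterministic, probabilistic sampling returns a varying number of records, and reservoir sampling always returns at most `size` records regardless of how many arrive.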
Here is an example NiFi 1.13.0 ETL flow:
ConsumeMQTT: now with readers
UpdateAttribute: set record.sink.name to kafka and recordreader.name to json.
SampleRecord: sample a few of the records
PutRecord: Use reader and destination service
UpdateHiveTable: new sink
Consume from MQTT and read and write to/from records.
Some example attributes from a running flow:
Connection pools for DatabaseRecordSinks can be JDBC, Hadoop, and Hive.
FreeFormTextRecordSetWriter is great for writing any format.
For the RecordSinkService, we will pick Kafka as our destination.
KafkaRecordSink from PutRecord
The reader will pick JSON in our example based on our UpdateAttribute; we can dynamically change this as data streams.
ReaderLookup - lets you pick a reader based on an attribute.
We have defined readers for Parquet, JSON, AVRO, XML, and CSV; no matter the type, I can automagically read it. Great for reusing code and great for cases like our new ListenFTP where you may get sent tons of different files to process. Use one FLOW!
RecordSinkService can help you make all your flows generic so you can drop in different sinks/destinations for your writers based on what the incoming data is. This is revolutionary for code reuse.
We can write our output in a custom format that could look like a document, HTML, fixed-width, a form letter, weird delimiter, or whatever you need.
Sample records using different methods.
We use the RecordSinkServiceLookup to allow us to change our sink location dynamically; we are passing in an attribute to choose Kafka.
We have pushed our data to Kafka using KafkaRecordSink. We can see our data easily in Streams Messaging Manager (SMM).
With a RecordReaderFactory, you can pick readers like the new WindowsEventLogReader.
As another output, we can UpdateHiveTable from our data and change the table as needed.
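The idea behind ReaderLookup, choosing a reader from an attribute such as recordreader.name, can be sketched outside NiFi as a simple dispatch table. This is an illustration of the pattern only; the reader functions and the `READERS` table are hypothetical, and only JSON and CSV are covered.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Record = Dict[str, str]


def read_json(content: str) -> List[Record]:
    """Parse a JSON object or array into a list of records."""
    data = json.loads(content)
    return data if isinstance(data, list) else [data]


def read_csv(content: str) -> List[Record]:
    """Parse CSV text with a header row into a list of records."""
    return [dict(row) for row in csv.DictReader(io.StringIO(content))]


# Hypothetical lookup table: attribute value -> reader function.
READERS: Dict[str, Callable[[str], List[Record]]] = {
    "json": read_json,
    "csv": read_csv,
}


def read_records(attributes: Dict[str, str], content: str) -> List[Record]:
    """Pick a reader based on the recordreader.name attribute and apply it."""
    reader_name = attributes["recordreader.name"]
    if reader_name not in READERS:
        raise ValueError(f"No reader registered for '{reader_name}'")
    return READERS[reader_name](content)
```

Because the choice is driven by an attribute, the same flow logic handles either format; adding another format means registering one more entry in the table.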
Straight From Release Notes: New Feature
[NIFI-7386] - AzureStorageCredentialsControllerService should also connect to storage emulator
[NIFI-7429] - Add Status History capabilities for system-level metrics
[NIFI-7549] - Adding Hazelcast based implementation for DistributedMapCacheClient
[NIFI-7624] - Build a ListenFTP processor
[NIFI-7745] - Add a SampleRecord processor
[NIFI-7796] - Add Prometheus metrics for total bytes received and bytes sent for components
[NIFI-7801] - Add acknowledgment check to Splunk
[NIFI-7821] - Create a Cassandra implementation of DistributedMapCacheClient
[NIFI-7879] - Create record path function for UUID v5
[NIFI-7906] - Add graph processor with the flexibility to query graph database conditioned on flowfile content and attributes
[NIFI-7989] - Add Hive "data drift" processor
[NIFI-8136] - Allow State Management to be tied to Process Session
[NIFI-8142] - Add "on conflict do nothing" feature to PutDatabaseRecord
[NIFI-8146] - Allow RecordPath to be used for specifying operation type and data fields when using PutDatabaseRecord
[NIFI-8175] - Add a WindowsEventLogReader
An update on Cloudera Flow Management!
Cloudera Flow Management on DataHub Public Cloud
This minor update has some Schema Registry and Atlas integration updates.
What's New in Cloudera DataFlow for Data Hub 7.2.7
Supported NiFi Processors
If that wasn't enough, a new version of MiNiFi C++ Agent!
Cloudera Edge Manager 1.2.2 Release
February 15, 2021
CEM MiNiFi C++ Agent - 1.21.01 release includes:
Support for JSON output in the Consume Windows Event Log processor
Full Expression Language support on Windows
Full S3 support (List, Fetch, Get, Put)
MiNiFi C++ download locations
MiNiFi C++ agent updates
Remember when you are done.
02-22-2021
07:00 AM
Most file formats can be extracted via Tika: https://community.cloudera.com/t5/Community-Articles/ExtractText-NiFi-Custom-Processor-Powered-by-Apache-Tika/ta-p/249392
02-22-2021
06:59 AM
Look up the proper Java regex you need:
https://examples.javacodegeeks.com/core-java/util/regex/list-files-with-regular-expression-filtering/
https://www.freeformatter.com/java-regex-tester.html
https://regexr.com/
02-22-2021
06:57 AM
1 Kudo
That is not supported. You could fork the processors and add that if you need it.
02-22-2021
06:53 AM
1 Kudo
That depends on the default session settings of the Oracle JDBC driver. It's usually UTF-8, though.
02-22-2021
06:51 AM
Is this one node of NiFi or a cluster? Remember that NiFi is case sensitive, and Windows could have case issues with file names. You need to put a minimum file age and size on there so it doesn't try to process something before it's done being written.
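To show why a minimum file age and size helps, here is a small Python sketch of the same check done by hand: a file is only considered ready once it has not been modified for a while and has reached a minimum size. The function name and thresholds are hypothetical; in NiFi you would set the corresponding minimum age and size properties on the file-listing processor rather than write code.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def ready_files(
    directory: str,
    min_age_seconds: float,
    min_size_bytes: int,
    now: Optional[float] = None,
) -> List[Path]:
    """Return files that are old enough and large enough to be safely picked up."""
    now = time.time() if now is None else now
    ready: List[Path] = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        info = path.stat()
        # A file still being written keeps getting a fresh modification time.
        old_enough = now - info.st_mtime >= min_age_seconds
        big_enough = info.st_size >= min_size_bytes
        if old_enough and big_enough:
            ready.append(path)
    return ready
```

A file that is still being written keeps failing the age check, so it is skipped until the writer has been idle for at least `min_age_seconds`.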
02-22-2021
06:49 AM
Command path: C:\Temp\activate.bat
Get rid of the rest.
02-22-2021
06:48 AM
1 Kudo
Two options. Make sure your cluster is configured for data provenance. Restart all your nodes. Make sure you are running the latest version. If that doesn't work, open a support ticket with Cloudera.
02-22-2021
06:46 AM
Don't use MapReduce anymore. It's time to move to Spark or Flink.
02-22-2021
06:45 AM
Spark K8 Flink Streaming Kafka Streams Kafka Connect NiFi Stateless Tez
02-22-2021
06:44 AM
1 Kudo
We don't support Mosquitto. That is an open source Eclipse project: https://mosquitto.org/
You can log to topics:
https://mosquitto.org/man/mosquitto-conf-5.html
https://www.datainmotion.dev/2019/12/iot-series-minifi-agent-on-raspberry-pi.html
https://community.cloudera.com/t5/Community-Articles/MQTT-with-Apache-NiFi/ta-p/248016
02-22-2021
06:41 AM
Flume was deprecated. https://www.datainmotion.dev/2019/08/migrating-apache-flume-flows-to-apache.html
02-18-2021
08:01 AM
The name of the server must be set; it can't be localhost. NiFi and Hive are on other machines.
02-17-2021
08:47 AM
https://www.datainmotion.dev/2020/10/running-flink-sql-against-kafka-using.html
https://www.datainmotion.dev/2020/08/deleting-schemas-from-cloudera-schema.html
https://www.cloudera.com/tutorials/schema-registry-in-trucking-iot/3.html
02-17-2021
08:46 AM
You need to enter principal name and password.
02-17-2021
05:30 AM
NiFi doesn't create schemas. You can create them from the Schema Registry web UI or the Schema Registry REST API. You could make NiFi create schemas by writing some code to have it do so.
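As a rough idea of what "writing some code" against the Schema Registry REST API could look like, the sketch below only builds the two HTTP requests (register the schema metadata, then add a version) without sending them. The endpoint path, the payload field names, and the "Kafka" schema group are assumptions based on the registry's REST interface as I understand it; check them against the Swagger docs of your registry before use. Authentication is not handled.

```python
import json
import urllib.request
from typing import Tuple

# Assumed REST path; verify against your Schema Registry's Swagger docs.
API_PATH = "/api/v1/schemaregistry/schemas"


def build_create_schema_requests(
    base_url: str,
    name: str,
    schema_text: str,
    description: str = "Created from code",
) -> Tuple[urllib.request.Request, urllib.request.Request]:
    """Build (but do not send) the requests to register a schema and its first version."""
    headers = {"Content-Type": "application/json"}
    # Assumed metadata fields for an Avro schema.
    metadata = {
        "type": "avro",
        "schemaGroup": "Kafka",
        "name": name,
        "description": description,
        "compatibility": "BACKWARD",
    }
    version = {"schemaText": schema_text, "description": description}
    create = urllib.request.Request(
        f"{base_url}{API_PATH}",
        data=json.dumps(metadata).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    add_version = urllib.request.Request(
        f"{base_url}{API_PATH}/{name}/versions",
        data=json.dumps(version).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    return create, add_version
```

Each request could then be sent with `urllib.request.urlopen(request)`, in order, from a script or from whatever code you have NiFi run.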
02-10-2021
08:54 AM
2 Kudos
If you are using versioning and the NiFi Registry, when you apply a new version to a running process group, it will stop things and wait until nothing is in process.
https://pierrevillard.com/2018/04/09/automate-workflow-deployment-in-apache-nifi-with-the-nifi-registry/
See the Python helper by Dan: https://pypi.org/project/nipyapi/
The CLI can do this; you'll have to look at the docs: https://nifi.apache.org/docs/nifi-docs/html/toolkit-guide.html
Examples: https://www.datainmotion.dev/2021/01/automating-starting-services-in-apache.html
The upcoming Cloudera DataFlow Experience does this automatically as part of autoscaling. Make sure you use Load Balanced Queues between processors. You can also use Stateless NiFi if you want things to start, complete a fixed job, and stop.
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
Extra docs:
https://docs.cloudera.com/cdf-datahub/7.2.6/nifi-api/topics/cdf-datahub-nifi-rest-api.html
https://github.com/tspannhw/EverythingApacheNiFi
https://www.datainmotion.dev/2020/09/devops-working-with-parameter-contexts.html
https://www.datainmotion.dev/2020/10/automating-building-migration-backup.html
https://www.datainmotion.dev/2019/04/simple-apache-nifi-operations-dashboard.html
02-09-2021
01:05 PM
That Python is for the Confluent Schema Registry. I was able to connect, but Confluent supports their own Avro schema format. This was my example and it worked:

from schema_registry.client import SchemaRegistryClient, schema
client = SchemaRegistryClient(url="http://myclouderasr.com:7788")
print(client.get_schema('weatherny'))

This library doesn't seem to support logins or security.
02-09-2021
10:41 AM
Here is a cool NiFi websocket app:
https://www.datainmotion.dev/2020/12/ingesting-websocket-data-for-live-stock.html
Hosting web apps in NiFi:
https://www.datainmotion.dev/2020/11/flank-smart-weather-websocket.html
02-09-2021
08:46 AM
Cloudera Schema Registry is rarely on 8081. It is usually on port 9090.
https://docs.cloudera.com/csp/2.0.1/schema-registry-overview/topics/csp-examples_of_interacting_with_schema_registry.html
Which version are you using? Which form factor? Public Cloud? On-Premise?
https://www.datainmotion.dev/2020/10/running-flink-sql-against-kafka-using.html
https://www.datainmotion.dev/2020/05/commonly-used-tcpip-ports-in-streaming.html
https://www.datainmotion.dev/2020/06/using-apache-kafka-using-cloudera-data.html
https://www.datainmotion.dev/2020/08/deleting-schemas-from-cloudera-schema.html
Check out the Swagger REST docs:
https://www.datainmotion.dev/2020/11/flank-smart-weather-websocket.html
02-08-2021
07:52 AM
Variables are deprecated. Parameters are new and easy to externalize. I can use them in DevOps processes via REST, the NiFi CLI, and Python. You can programmatically build parameter contexts and parameters and assign them to process groups. Parameters are getting some upgrades to do some of the more advanced things you mentioned.
https://www.datainmotion.dev/2020/09/devops-working-with-parameter-contexts.html
And they help power Stateless NiFi.
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
https://www.datainmotion.dev/2021/01/automating-starting-services-in-apache.html
02-08-2021
07:49 AM
1 Kudo
Grok is Grok. Find Grok expressions that work for you and use a Grok tester:
https://stackoverflow.com/questions/38462630/logstash-grok-filter-key-value-pairs
http://grokconstructor.appspot.com/do/match#result
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services-nar/1.5.0/org.apache.nifi.grok.GrokReader/additionalDetails.html
01-28-2021
07:34 AM
QueryRecord is the way to go. You can compare with a SQL query:
SELECT * FROM FLOWFILE WHERE timestamp > ${event_time} AND value = ${id}
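To see what that filter does to a set of records, here is a stand-in using Python's built-in sqlite3 with a table named FLOWFILE. This is only an illustration: QueryRecord does not use SQLite, and in NiFi the ${event_time} and ${id} values are filled in from flowfile attributes before the query runs, whereas this sketch passes them as bound parameters. The function name and sample fields are hypothetical.

```python
import sqlite3
from typing import Dict, List


def query_flowfile(
    records: List[Dict[str, int]], event_time: int, record_id: int
) -> List[Dict[str, int]]:
    """Load records into an in-memory FLOWFILE table and apply the comparison."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    try:
        conn.execute('CREATE TABLE FLOWFILE ("timestamp" INTEGER, "value" INTEGER)')
        conn.executemany(
            "INSERT INTO FLOWFILE VALUES (:timestamp, :value)", records
        )
        # event_time and record_id play the role of ${event_time} and ${id}.
        rows = conn.execute(
            'SELECT * FROM FLOWFILE WHERE "timestamp" > ? AND "value" = ?',
            (event_time, record_id),
        ).fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()
```

Only records that are newer than the event time and match the id survive the query; everything else is dropped from the result.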
01-26-2021
11:40 AM
Automating Starting Services in Apache NiFi and Applying Parameters

Automate all the things! You can call these commands interactively or script all of them with awesome DevOps tools. @Andre Araujo and @dchaffey can tell you more about that.

Enable All NiFi Services on the Canvas

By running this three times, I get any stubborn ones or ones that needed something previously running. This could be put into a loop; check the status before trying again.

nifi pg-list
nifi pg-status
nifi pg-get-services

The NiFi CLI has interactive help available and also some good documentation: NiFi CLI Toolkit Guide

/opt/demo/nifi-toolkit-1.12.1/bin/cli.sh nifi pg-enable-services -u http://edge2ai-1.dim.local:8080 --processGroupId root
/opt/demo/nifi-toolkit-1.12.1/bin/cli.sh nifi pg-enable-services -u http://edge2ai-1.dim.local:8080 --processGroupId root
/opt/demo/nifi-toolkit-1.12.1/bin/cli.sh nifi pg-enable-services -u http://edge2ai-1.dim.local:8080 --processGroupId root

We could then start a process group if we wanted:

nifi pg-start -u http://edge2ai-1.dim.local:8080 -pgid 2c1860b3-7f21-36f4-a0b8-b415c652fc62

List all process groups

/opt/demo/nifi-toolkit-1.12.1/bin/cli.sh nifi pg-list -u http://edge2ai-1.dim.local:8080

List Parameters

/opt/demo/nifi-toolkit-1.12.1/bin/cli.sh nifi list-param-contexts -u http://edge2ai-1.dim.local:8080 -verbose

Set Parameters

Set the parameter context for a process group; you can loop to do all of them.

pgid => process group id
pcid => parameter context id

I need to put this in a shell or Python script:

/opt/demo/nifi-toolkit-1.12.1/bin/cli.sh nifi pg-set-param-context -u http://edge2ai-1.dim.local:8080 -verbose -pgid 2c1860b3-7f21-36f4-a0b8-b415c652fc62 -pcid 39f0f296-0177-1000-ffff-ffffdccb6d90

Example setupnifi.sh (Github Link)

You could also use the NiFi REST API or Dan's awesome Python API, NiPyApi: A Python Client SDK for Apache NiFi.

References

DevOps: Working with Parameter Contexts NiFi Toolkit CLI No More Spaghetti Flows Report on this Apache NiFi Everything Apache Nifi Cloudera Data Platform - Using Apache NiFi REST API in the Public Cloud Using NiFi CLI to Restore NiFi Flows From Backups Automating the Building, Migration, Backup, Restore and Testing of Streaming Applications Apache NiFi Toolkit Guide An overview of Apache NiFi and Toolkit CLI deployments Automate workflow deployment in Apache NiFi with the NiFi Registry DevOps for Apache NiFi 1.7 and More
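The repeated pg-enable-services calls and the pg-set-param-context step from this article could be wrapped in a small Python script along these lines. It is a minimal sketch: the CLI path, URL, and flags are copied from the commands in the article, while the helper names and the fixed three attempts are illustrative assumptions. It does not parse CLI output, so it runs a fixed number of attempts instead of checking service status.

```python
import subprocess
from typing import Callable, List, Sequence

# Values taken from the commands in the article; adjust for your environment.
CLI = "/opt/demo/nifi-toolkit-1.12.1/bin/cli.sh"
NIFI_URL = "http://edge2ai-1.dim.local:8080"


def enable_services_command(
    cli: str = CLI, url: str = NIFI_URL, pg_id: str = "root"
) -> List[str]:
    """Build the pg-enable-services command for a process group."""
    return [cli, "nifi", "pg-enable-services", "-u", url, "--processGroupId", pg_id]


def set_param_context_command(
    pg_id: str, pc_id: str, cli: str = CLI, url: str = NIFI_URL
) -> List[str]:
    """Build the pg-set-param-context command for a process group and parameter context."""
    return [
        cli, "nifi", "pg-set-param-context", "-u", url, "-verbose",
        "-pgid", pg_id, "-pcid", pc_id,
    ]


def run_cli(command: Sequence[str]) -> int:
    """Run one CLI command and return its exit code."""
    return subprocess.run(list(command), check=False).returncode


def run_repeatedly(
    command: Sequence[str],
    attempts: int = 3,
    runner: Callable[[Sequence[str]], int] = run_cli,
) -> List[int]:
    """Run the same command several times, as the article does by hand,
    so services that depend on others get another chance to start."""
    return [runner(command) for _ in range(attempts)]
```

Calling `run_repeatedly(enable_services_command())` mirrors running the enable command three times; to apply parameter contexts to many process groups, loop over (pgid, pcid) pairs and pass each `set_param_context_command(pgid, pcid)` to `run_cli`.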
01-21-2021
11:44 AM
You can build flows with Python code: https://nipyapi.readthedocs.io/en/latest/
Or you can write a custom NiFi processor: https://www.nifi.dev/2019/03/custom-processors.html