Member since: 04-11-2016
Posts: 468
Kudos Received: 319
Solutions: 118
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1418 | 03-09-2018 05:31 PM |
|  | 1734 | 03-07-2018 09:45 AM |
|  | 1744 | 03-07-2018 09:31 AM |
|  | 2894 | 03-03-2018 01:37 PM |
|  | 1633 | 10-17-2017 02:15 PM |
01-12-2023
09:49 AM
Apache Flink Upgrade

Deployments can now upgrade from Flink 1.14 to 1.15.1. This update includes 40 bug fixes and a number of other enhancements; to learn more about what has been fixed, check out the release notes.

SQL Stream Builder UI

The Streaming SQL Console (UI) of SQL Stream Builder has been completely reworked with new design elements. The new design gives users quicker access to artifacts that are commonly used or already created as part of a project, simplifying navigation and saving time.

Software Development Lifecycle support with Projects

Projects for SQL Stream Builder address the Software Development Lifecycle needs of developers and analysts writing applications, allowing them to group related artifacts together and sync them to GitHub for versioning and CI/CD management. Until now, when users created SQL jobs, functions, and other artifacts, there was no effective way to migrate them to another environment (i.e., from dev to prod). Artifacts were typically migrated by copying and pasting code between environments; this lived outside of code repositories and the typical CI/CD process many companies use, took additional hands-on-keyboard time, and allowed errors to creep in during copying or when updating environment-specific configurations. SQL Stream Builder Projects solve these issues. Users simply create a new project, give it a name, and link a GitHub repository to it as part of creation; from that point onward, any artifacts created in the project can be pushed to the GitHub repository with the click of a button. For environment-specific needs, a parameterized key-value configuration can be used: instead of editing configurations that change between deployments, jobs reference generic properties that are set differently in each environment.

Job Notifications

Job notifications help you detect failed jobs without checking the UI, which can save a lot of time. This is especially useful when you have numerous jobs running and keeping track of their state would be hard without notifications. Notifications can be sent to a single user or a group of users, either over email or by using a webhook (a hedged sketch of a simple webhook receiver is included at the end of this post).

Summary

In this post, we looked at some of the new features that came out in CDP Public Cloud 7.2.16: Flink 1.15.1 with many bug fixes, a brand new UI for SQL Stream Builder, the ability to monitor jobs for failures and send notifications, and new Software Development Lifecycle capabilities with Projects. For more details, read the latest release notes. Give Cloudera Streaming Analytics 7.2.16 for Datahub a try today and check out all the great new features!
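As promised above, here is a minimal sketch of what a webhook receiver for job notifications could look like. This is purely illustrative: the JSON fields (jobName, state) and the port are assumptions, not the documented SQL Stream Builder notification payload, so adapt it to whatever your notification configuration actually sends.

```python
# Minimal sketch of a webhook endpoint that could receive job notifications.
# The payload fields used below (jobName, state) are assumptions for illustration,
# not the documented SQL Stream Builder format.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class NotificationHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # React to the notification, e.g. alert the on-call channel on failure.
        if payload.get("state") == "FAILED":
            print(f"Job {payload.get('jobName')} failed, alerting on-call...")
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), NotificationHandler).serve_forever()
```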
12-14-2022
08:22 AM
1 Kudo
You can find the release notes and the download links in the documentation.

Key features for this release:

- Rebase against NiFi 1.18, bringing the latest and greatest of Apache NiFi, with a ton of improvements and new features.
- Reset of the end-of-life policy: CFM 2.1.5 will be supported until August 2025 to match the CDP 7.1.7 LTS policy. This is particularly important as HDF and CFM 1.x are nearing end of life.
- Parameter Providers: we are introducing the concept of Parameter Providers, allowing users to fetch the values of parameters from external locations. In addition to a better separation of duties, this also makes CI/CD easier (a hedged sketch of staging a secret for the AWS provider is included at the end of this post). With this release, we're supporting the following Parameter Providers:
  - AWS Secrets Manager
  - GCP Secret Manager
  - HashiCorp Vault
  - Database
  - Environment Variables
  - External file
- Registry Client to connect to a DataFlow Catalog. The registry endpoint is now an extension point in NiFi, meaning it is no longer limited to accessing a NiFi Registry instance. With this release we're adding an implementation that lets users connect NiFi to their DataFlow Catalog and use it just like they would NiFi Registry. For hybrid customers, this means they can check out and version flow definitions in the same place for both on-prem and cloud usage. It also means that on-prem customers can access the ReadyFlows gallery, assuming they have a public cloud tenant.
- Iceberg processor (Tech Preview): we're making a PutIceberg processor available in Technical Preview, allowing users to push data into Iceberg with NiFi. It can be used in both batch and streaming (micro-batch) fashion.
- Snowflake ingest with Snowpipe (Tech Preview): until now, only JDBC could be used to push data into Snowflake with NiFi. We're now making available a set of processors leveraging Snowpipe to push data into Snowflake more efficiently.
- New components: we are adding a bunch of new components, including:
  - ConsumeTwitter
  - Processors to interact with Box, Dropbox, Google Drive, SMB
  - Processors to interact with HubSpot, Shopify, Zendesk, Workday, Airtable
  - PutBigQuery (leveraging the new API)
  - ListenBeats, now Cloudera supported
  - UpdateDatabaseTable to manage updates to a table's schema (adding columns, for example)
  - AzureEventHubRecordSink & UDPEventRecordSink
  - CiscoEmblemSyslogMessageReader to make it easy to ingest logs from Cisco systems such as ASA VPNs
  - ConfluentSchemaRegistry, now Cloudera supported
  - Iceberg and Snowflake components, as mentioned above
- Replay last event: with this release we add the possibility to replay the last event at the processor level (right-click on the processor, then Replay last event). This makes it super easy to replay the last flow file instead of going to the provenance events, finding the last event, and clicking replay. Very useful when developing flows!
- And, as usual, bug fixes, security patches, performance improvements, etc.
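As a side note (not part of the release notes): one way to stage a key/value secret that the new AWS Secrets Manager Parameter Provider could then fetch is sketched below with boto3. The secret name, keys, and region are hypothetical, and the exact way the provider maps secrets to parameter groups should be checked against the NiFi documentation.

```python
# Hypothetical sketch: stage a key/value secret in AWS Secrets Manager with boto3.
# A NiFi AWS Secrets Manager Parameter Provider could then be configured to fetch it.
# Secret name, keys, values, and region below are made up for illustration.
import json
import boto3

client = boto3.client("secretsmanager", region_name="us-east-1")
client.create_secret(
    Name="my-flow-parameters",                      # hypothetical secret name
    SecretString=json.dumps({
        "db.url": "jdbc:postgresql://db:5432/app",  # hypothetical parameter values
        "db.password": "change-me",
    }),
)
```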
09-16-2021
02:40 AM
1 Kudo
With the release of CDP 7.2.11, it is now super easy to deploy your custom components on your Flow Management DataHub clusters by dropping them in a bucket of your cloud provider. Until now, when building custom components for NiFi, you had to SSH to all of your NiFi nodes to deploy the components and make them available in your flow definitions. This added operational overhead and also caused issues when scaling up clusters. From now on, you can easily configure your NiFi clusters to automatically fetch custom components from an external location in the object store of the cloud provider where NiFi is running. All of your nodes will fetch the components after you drop them in the configured location. You can find more information in the documentation.
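For example, once a cluster is configured to pull extra components from an S3 location, dropping a custom NAR could be as simple as the sketch below. The bucket name and prefix are hypothetical; use whatever location you configured for your cluster, and the equivalent client for ADLS or GCS if you're not on AWS.

```python
# Hypothetical sketch: upload a custom NAR to the object store location your
# NiFi DataHub cluster is configured to poll for additional components.
# Bucket name, prefix, and file name below are made up for illustration.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="my-custom-processors-1.0.nar",
    Bucket="my-flow-management-bucket",
    Key="custom-nars/my-custom-processors-1.0.nar",
)
```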
09-16-2021
02:31 AM
1 Kudo
With the release of CDP 7.2.11, you now have the possibility to scale up and down both your light duty and heavy duty Flow Management clusters on all cloud providers. You can find more information in the documentation.
03-09-2018
05:31 PM
1 Kudo
Hi @Jessica David, I confirm this is a bug. I created a JIRA for it: https://issues.apache.org/jira/browse/NIFI-4955 and I will submit a fix shortly. Thanks for reporting the issue. I assume you're using the header as the schema access strategy in the CSV Reader. If you're able to use a different strategy (schema name or schema text), that should work around the problem, although you'll then need to define the schema explicitly.
03-08-2018
08:41 AM
1 Kudo
Hi @Sami Ahmad,

As stated in the processor description/documentation:

"This processor uses Hive Streaming to send flow file data to an Apache Hive table. The incoming flow file is expected to be in Avro format and the table must exist in Hive. Please see the Hive documentation for requirements on the Hive table (format, partitions, etc.). The partition values are extracted from the Avro record based on the names of the partition columns as specified in the processor. NOTE: If multiple concurrent tasks are configured for this processor, only one table can be written to at any time by a single thread. Additional tasks intending to write to the same table will wait for the current task to finish writing to the table."

You'll need to convert your data into Avro first. The best approach is to use the record processors for that (for example, ConvertRecord with a CSV or JSON reader and an Avro record set writer). Hope this helps.
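Outside of NiFi, if you just want to produce a small Avro test file to validate the Hive table setup, a hedged Python sketch with the fastavro library could look like the following. This is not part of the original answer; the field names and input file are made up, and inside NiFi itself the record processors remain the recommended approach.

```python
# Hypothetical sketch: convert a few CSV rows into an Avro file with fastavro,
# e.g. to produce a small test file for the Hive table. Field names are made up;
# in NiFi itself, prefer the record processors (ConvertRecord) for this.
import csv
from fastavro import parse_schema, writer

schema = parse_schema({
    "type": "record",
    "name": "Purchase",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "amount", "type": "double"},
    ],
})

with open("purchases.csv") as src, open("purchases.avro", "wb") as dst:
    records = [{"id": int(r["id"]), "amount": float(r["amount"])}
               for r in csv.DictReader(src)]
    writer(dst, schema, records)
```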
03-08-2018
08:25 AM
1 Kudo
That's completely valid. Let's say that HTTP has the advantage of not requiring any new configuration when installing NiFi; in some environments, opening an additional port can mean extra work with network/security teams. Another advantage of HTTP is the possibility of going through a proxy if that's required to allow communication between multiple sites. If you expect high load with S2S and can manage the extra configuration, RAW is certainly the better option. Hope this helps.
03-07-2018
09:45 AM
1 Kudo
Hi @Chad Woodhead,

This property needs to be set if and only if you are doing S2S with the RAW protocol. Any reason not to use the HTTP protocol for S2S? That way you don't have anything to set and it works out of the box with the default configuration.

If, however, you want to use RAW and you are configuring NiFi through Ambari, you don't need to use config groups; you can set the property to {{nifi_node_host}}, a placeholder that contains the FQDN of each node. Note that this property already gets a default value if you leave it blank (see the description of this property in Ambari). Unless you have a specific network configuration, the only property you would need to specify is nifi.remote.input.socket.port, which defines the port used for RAW S2S.

Hope this helps.
03-07-2018
09:37 AM
Hi @Amira khalifa, Why don't you want to define the schema of your target tables in the schema registry? You can use InferAvroSchema to help you initialize the schema, then adapt it and copy/paste it into the schema registry of your choice. Using InferAvroSchema for every single flow file you ingest can be costly in terms of performance, and it only infers the schema from a subset of the data, so the inferred schema might not be completely correct or cover all the possibilities. What you could do, even though it's not what I'd recommend as a long-term solution, is use expression language to update the Avro schema generated as an attribute of your flow files. Hope this helps.
03-07-2018
09:31 AM
Hi @Manikandan Jeyabal, This is not possible at the moment. The maximum number of threads used at any point in time can be set at the NiFi level, but there is no way to fix the size of the thread pool per Process Group. It could be an interesting feature; feel free to raise a JIRA at https://issues.apache.org/jira/projects/NIFI Hope this helps.