Member since: 04-11-2016
Posts: 468
Kudos Received: 319
Solutions: 118
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 1174 | 03-09-2018 05:31 PM |
|  | 1440 | 03-07-2018 09:45 AM |
|  | 1456 | 03-07-2018 09:31 AM |
|  | 2522 | 03-03-2018 01:37 PM |
|  | 1330 | 10-17-2017 02:15 PM |
01-12-2023
09:49 AM
Apache Flink Upgrade
Deployments can now upgrade from Flink 1.14 to 1.15.1. This update includes 40 bug fixes and a number of other enhancements. To learn more about what has been fixed, check out the release notes.

SQL Stream Builder UI
The Streaming SQL Console (UI) of SQL Stream Builder has been completely reworked with new design elements. The new design provides improved access to artifacts that are commonly used or already created as part of a project, simplifying navigation and saving the user time.

Software Development Lifecycle support with Projects
Projects for SQL Stream Builder improve upon the software development lifecycle needs of developers and analysts writing applications, allowing them to group related artifacts together and sync them to GitHub for versioning and CI/CD management. Before today, when users created SQL jobs, functions, and other artifacts, there was no effective way to migrate them to another environment (i.e., from dev to prod). Artifacts were typically migrated by copying and pasting code between environments; this lived outside of code repositories and the typical CI/CD process many companies use, took additional hands-on-keyboard time, and allowed errors to be introduced during copying and when updating environment-specific configurations. SQL Stream Builder Projects solve these issues. Users simply create a new project, give it a name, and link a GitHub repository to it as part of creation; from that point onward, any artifacts created in the project can be pushed to the GitHub repository with the click of a button. For environment-specific needs, a parameterized key-value configuration can be used to avoid editing configurations that change between deployments: jobs reference generic properties whose values are set differently in each environment (a hypothetical sketch follows at the end of this post).

Job Notifications
Job notifications help make sure you can detect failed jobs without checking the UI, which can save a lot of time. This is especially useful when you have numerous jobs running and keeping track of their state would be hard without notifications. Notifications can be sent to a single user or a group of users, either over email or by using a webhook.

Summary
In this post, we looked at some of the new features that came out in CDP Public Cloud 7.2.16. This includes Flink 1.15.1, which comes with many bug fixes, a brand new UI for SQL Stream Builder, the ability to monitor jobs for failures and send notifications, and new Software Development Lifecycle capabilities with Projects. For more details, read the latest release notes. Give Cloudera Streaming Analytics 7.2.16 for DataHub a try today and check out all the great new features!
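To make the parameterized configuration idea more concrete, here is a minimal, hypothetical sketch of a Flink SQL source table in an SSB project where the Kafka bootstrap servers come from a project-level property rather than being hard-coded, so only the property value changes between dev and prod. The table and topic names, the property key, and the ${...} placeholder syntax are illustrative assumptions, not necessarily the exact SSB syntax:

-- Hypothetical example: the placeholder below stands for a project configuration key
-- whose value differs between environments (e.g. dev vs prod Kafka brokers).
CREATE TABLE orders (
  order_id STRING,
  amount DOUBLE,
  order_time TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = '${kafka.bootstrap.servers}',
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset'
);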
12-14-2022
08:22 AM
1 Kudo
You can find the release notes and the download links in the documentation. Key features for this release:
- Rebase against NiFi 1.18, bringing the latest and greatest of Apache NiFi. It contains a ton of improvements and new features.
- Reset of the end-of-life policy: CFM 2.1.5 will be supported until August 2025 to match the CDP 7.1.7 LTS policy. This is particularly important as HDF and CFM 1.x are nearing end of life.
- Parameter Providers: we are introducing the concept of Parameter Providers, allowing users to fetch the values of parameters from external locations. In addition to a better separation of duties, this also makes CI/CD easier. With this release, we're supporting the following Parameter Providers: AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault, Database, Environment Variables, and External file.
- Registry Client to connect to a DataFlow Catalog. The registry endpoint is now an extension in NiFi, which means it is no longer limited to accessing a NiFi Registry instance. With this release we're adding an implementation allowing users to connect NiFi to their DataFlow Catalog and use it just like they would NiFi Registry. For hybrid customers, this means they can easily check out and version flow definitions in the same place for both on-prem and cloud usage. It also means that on-prem customers can access the ReadyFlows gallery, assuming they have a public cloud tenant.
- Iceberg processor (Tech Preview): we're making a PutIceberg processor available in Technical Preview, allowing users to push data into Iceberg using NiFi. This can be used in both batch and streaming (micro-batch) fashion.
- Snowflake ingest with Snowpipe (Tech Preview): until now, only JDBC could be used to push data into Snowflake with NiFi. We're now making available a set of processors leveraging Snowpipe to push data into Snowflake in a more efficient way.
- New components: we are adding a bunch of new components...
  - ConsumeTwitter
  - Processors to interact with Box, Dropbox, Google Drive, SMB
  - Processors to interact with HubSpot, Shopify, Zendesk, Workday, Airtable
  - PutBigQuery (leveraging the new API)
  - ListenBeats is now Cloudera supported
  - UpdateDatabaseTable to manage updates to a table's schema (add columns, for example)
  - AzureEventHubRecordSink & UDPEventRecordSink
  - CiscoEmblemSyslogMessageReader to make it easy to ingest logs from Cisco systems such as ASA VPNs
  - ConfluentSchemaRegistry is now Cloudera supported
  - Iceberg and Snowflake components as mentioned before
- Replay last event: with this release we add the possibility to replay the last event at the processor level (right-click on the processor, replay last event). This makes it super easy to replay the last flow file, instead of going to the provenance events, finding the last event, and clicking replay. This is something very useful when developing flows!
- And, as usual, bug fixes, security patches, performance improvements, etc.
09-16-2021
02:40 AM
1 Kudo
With the release of CDP 7.2.11, it is now super easy to deploy your custom components on your Flow Management DataHub clusters by dropping your components into a bucket of your cloud provider. Until now, when building custom components for NiFi, you had to SSH to all of your NiFi nodes to deploy your components and make them available for use in your flow definitions. This added operational overhead and also caused issues when scaling up clusters. From now on, it's easy to configure your NiFi clusters to automatically fetch custom components from an external location in the object store of the cloud provider where NiFi is running. All of your nodes will fetch the components after you drop them into the configured location. You can find more information in the documentation.
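As a rough, hypothetical illustration of what such a configuration involves, the NiFi NAR provider mechanism is driven by nifi.properties entries along these lines. The provider name, class, paths, and even the exact property keys below are assumptions from memory and may differ by CFM/NiFi version; on a DataHub cluster you would set this through Cloudera Manager as described in the documentation rather than editing nifi.properties by hand:

# Hypothetical sketch only -- check the official documentation for the exact
# property names and provider class supported by your CFM version.
nifi.nar.library.provider.cloud.implementation=org.apache.nifi.nar.hadoop.HDFSNarProvider
nifi.nar.library.provider.cloud.resources=/etc/nifi/conf/core-site.xml
nifi.nar.library.provider.cloud.storage.location=s3a://my-bucket
nifi.nar.library.provider.cloud.source.directory=/custom-nars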
09-16-2021
02:31 AM
1 Kudo
With the release of CDP 7.2.11, you now have the possibility to scale up and down both your light duty and heavy duty Flow Management clusters on all cloud providers. You can find more information in the documentation.
11-25-2018
11:17 AM
I'm not sure I understand the process you're following. Are you using the NiFi Registry? If so, when deploying a registered flow, it'll use the variables as they were defined when you committed the version to the Registry. After the import, in the target NiFi, you'd have to update the variables with the values for the target environment. Does that make sense?
11-23-2018
03:42 PM
Hey, are you sure you have deployed the correct version of your Groovy script on the NiFi node(s)? The error suggests that the Groovy script being executed is not correct, even though the one you provided looks good to me.
09-01-2018
04:46 PM
Hi @A C, Depending on the query you're executing, it might not result in an actual job running on YARN; that depends on your parameters around "fetch tasks" in Hive. Regarding supervisors, that's related to the Storm service, so it's safe to ignore. HTH
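For reference, a minimal illustration of the Hive settings I'm referring to, as they would appear in hive-site.xml (the values shown are examples; whether a given query is converted to a local fetch task instead of a YARN job depends on them):

<!-- Example values only: controls whether simple queries are served by a local
     fetch task instead of being submitted as a YARN job. -->
<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
</property>
<property>
  <name>hive.fetch.task.conversion.threshold</name>
  <value>1073741824</value>
</property>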
05-24-2018
08:28 AM
1 Kudo
Hi @Jérôme ROUSSEAU, A new version of the connector (1.5.4) should be available for download on the website very soon. Pierre
04-18-2018
09:42 AM
Hi @Chad Woodhead, Thanks for reporting the issue, I reproduced it and created NIFI-5092 to track it: https://issues.apache.org/jira/browse/NIFI-5092 Current workaround: after a NiFi restart, stop the reporting task, clear the state of the reporting task and start the reporting task. A fix will be provided very quickly and if needed you would be able to build the NAR containing the reporting task and drop it in NiFi to use the fixed version. Hope this helps, Pierre
03-16-2018
03:35 PM
Hi @Amira khalifa, You might want to have a look at ValidateRecord processor or ValidateCSV processor. Hope this helps.
03-12-2018
08:23 PM
1 Kudo
Hi @Haitam Dadsi, That's something you could easily achieve using Apache NiFi. You could have a simple workflow looking like: GenerateTableFetch -> ExecuteSQL -> PublishKafkaRecord. You will find plenty of examples for such processors. Hope this helps.
03-11-2018
09:58 AM
What you could do is to play with the combination of backpressure parameters and scheduling frequency for the first processor after the reception of the data. Or just use the ControlRate processor. When backpressure is enabled between the RPG and ControlRate, the RPG will stop pulling the data from the remote system.
03-10-2018
10:06 AM
Hi @Gireesh Kumar Gopinathan, I think you might be interested by this article: https://community.hortonworks.com/articles/109629/how-to-achieve-better-load-balancing-using-nifis-s.html Hope this helps.
03-09-2018
05:31 PM
1 Kudo
Hi @Jessica David, I confirm this is a bug. I created a JIRA for that: https://issues.apache.org/jira/browse/NIFI-4955 I will submit a fix in a minute. Thanks for reporting the issue. I assume you're using the header as the schema access strategy in the CSV Reader. If you're able to use a different strategy (schema name, or schema text), it should solve the problem even though you need to explicitly define the schema.
03-08-2018
08:55 AM
Hi @Saeed Barghi, To enable authentication/authorization on NiFi, the first requirement is to enable SSL. You will need to generate certificates for your nodes, create the appropriate keystores/truststores, and configure NiFi accordingly. Once NiFi is secured, you have multiple options for authentication: individual SSL certificates, Kerberos, LDAP. In your case you would want to issue a certificate for each of your users, which they will use in their browser to authenticate against NiFi. Then you can define the appropriate authorizations to ensure multi-tenancy. This post might give you a better understanding of what needs to be done: https://pierrevillard.com/2016/11/29/apache-nifi-1-1-0-secured-cluster-setup/ If you're installing HDF with Ambari, then a lot of this can be done automatically for you using the internal CA provided with the NiFi toolkit. Hope this helps.
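For a rough idea of what "configure NiFi accordingly" means, the TLS-related part boils down to nifi.properties entries along these lines (hostnames, paths, passwords, and the port below are placeholders; when you let Ambari and the NiFi CA handle it, these are generated for you):

# Placeholder values for illustration only
nifi.web.https.host=nifi-node-1.example.com
nifi.web.https.port=9443
nifi.security.keystore=/etc/nifi/conf/keystore.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=changeit
nifi.security.keyPasswd=changeit
nifi.security.truststore=/etc/nifi/conf/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=changeit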
03-08-2018
08:41 AM
1 Kudo
Hi @Sami Ahmad, As stated in the processor description/documentation: "This processor uses Hive Streaming to send flow file data to an Apache Hive table. The incoming flow file is expected to be in Avro format and the table must exist in Hive. Please see the Hive documentation for requirements on the Hive table (format, partitions, etc.). The partition values are extracted from the Avro record based on the names of the partition columns as specified in the processor. NOTE: If multiple concurrent tasks are configured for this processor, only one table can be written to at any time by a single thread. Additional tasks intending to write to the same table will wait for the current task to finish writing to the table." You'll need to convert your data into Avro first. The best approach is to use the record processors for that. Hope this helps.
03-08-2018
08:38 AM
Could you share the full content of your nifi.properties file (removing any sensitive values)? Did you configure anything regarding authentication/authorization?
03-08-2018
08:25 AM
1 Kudo
That's completely valid. Let's say that HTTP has the advantage of not requiring any new configuration when installing NiFi. In some environments, adding the use of an additional port can create additional work with network/security teams. Another advantage of HTTP is the possibility to use a proxy if that's required to allow communication between multiple sites. If you expect to have high load with S2S and can manage the extra conf, RAW is certainly a better option. Hope this helps.
03-07-2018
09:51 AM
1 Kudo
Hi @User 805, You could use the InferAvroSchema processor to infer the schema from your JSON data and put that schema in an attribute of the flow files; you can then configure your JSON Reader controller service to use this attribute as the schema source. Note that not defining the schema of each of your sources in a schema registry can be a source of errors: if some of your JSONs are missing some fields, you will miss columns in the generated CSV. Also, inferring the schema for every single flow file can be costly in terms of performance. What I'd recommend is to use the InferAvroSchema processor to help you generate the schema for each one of your sources, then put the schemas in the schema registry of your choice and use that. It'll be more performant and you can leverage the features of the schema registry, such as schema versioning, backward/forward compatibility, and reuse of the schemas across multiple components. Hope this helps.
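For illustration, the kind of Avro schema you would end up registering could look like the following (a hypothetical schema; the record and field names are made up for the example):

{
  "type": "record",
  "name": "customer",
  "namespace": "com.example",
  "fields": [
    { "name": "id", "type": "long" },
    { "name": "email", "type": ["null", "string"], "default": null },
    { "name": "created_at", "type": ["null", "string"], "default": null }
  ]
}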
03-07-2018
09:45 AM
1 Kudo
Hi @Chad Woodhead, This property needs to be set if and only if you are doing S2S using the RAW protocol. Any reason not to use the HTTP protocol for S2S? That way, you don't have anything to set and it'll work OOTB with the default configuration. If, however, you want to use RAW and you are configuring NiFi using Ambari, you don't need to use config groups; you can set the property to: {{nifi_node_host}} It's a placeholder that contains the FQDN of each node. Note that this property already has a default value generated if you leave it blank (see the description of this property in Ambari). Unless you have a specific network configuration, the only property you would need to specify is nifi.remote.input.socket.port to define the port that will be used for RAW S2S. Hope this helps.
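In other words, for RAW S2S the relevant nifi.properties entries would end up looking something like this (the port number is just an example, and nifi.remote.input.secure should match whether your cluster is secured):

# {{nifi_node_host}} is resolved by Ambari to each node's FQDN
nifi.remote.input.host={{nifi_node_host}}
nifi.remote.input.socket.port=10443
nifi.remote.input.secure=true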
03-07-2018
09:37 AM
Hi @Amira khalifa, Why don't you want to define the schema of your target tables in the schema registry? You can use InferAvroSchema to help you initialize the schema, then adapt it and copy/paste it into the schema registry of your choice. Using InferAvroSchema for every single flow file you ingest can be costly in terms of performance, and it only infers the schema from a subset of the data: as you can see, the resulting schema might not be completely correct or take all the possibilities into account. What you could do, even though it's not what I'd recommend as a long-term solution, is to use expression language to update the Avro schema generated as an attribute of your flow files. Hope this helps.
03-07-2018
09:31 AM
Hi @Manikandan Jeyabal, This is not possible at the moment. The number of threads used at any point in time can be set at the NiFi level, but there is no way to fix the size of the thread pool per Process Group. It could be an interesting feature; feel free to raise a JIRA at https://issues.apache.org/jira/projects/NIFI Hope this helps.
03-03-2018
01:46 PM
OK, that's weird because the correct version is displayed as current on your hosts. You could try the following: ambari-server set-current --cluster=hdplid4 --version-display-name=HDP-2.6.4.0 If that does not help, I'd try looking at the 'host component state' table in Ambari database.
03-03-2018
01:37 PM
Hi, What are the RUNNING jobs that stop making progress? In particular, are these jobs oozie-launcher jobs? You can check that by looking at the names of the jobs in the Scheduler view of the RM UI. Oozie has a non-intuitive way of launching jobs due to legacy behaviors from Hadoop 1 (note that this will be fixed with Oozie 5.x). In short, an Oozie action launches an oozie-launcher job (1 AM container + 1 mapper) that is responsible for actually launching the job you defined in your Oozie action. In the end, your Oozie action will actually require two jobs from YARN's point of view. When running multiple workflows at the same time, you could end up with a lot of oozie-launcher jobs filling up the queue capacity and preventing the actual jobs from being launched (they will remain in the ACCEPTED state). A common practice is to have a dedicated queue for the oozie-launcher jobs created by Oozie workflows. This way you prevent this kind of deadlock situation. IIRC, you can set the queue for oozie-launcher jobs using oozie.launcher.mapred.job.queue.name. Hope this helps.
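For example, a workflow action could route its launcher to a dedicated queue with a configuration block like the following (the queue name "launchers" is just an example and must exist in your scheduler configuration):

<!-- Example only: send the oozie-launcher job of this action to a dedicated queue -->
<configuration>
  <property>
    <name>oozie.launcher.mapred.job.queue.name</name>
    <value>launchers</value>
  </property>
</configuration>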
03-03-2018
10:18 AM
Hi Carlos, It sounds like a display bug. Did you try a force refresh in your browser? To be honest, I believe that the correct version will be installed if you add Druid because it'll rely on the repository files deployed when you installed the new version. Pierre
03-03-2018
10:16 AM
1 Kudo
Hi Jay, You need to provide a file such as https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/resources/TestExtractGrok/patterns You could also use the ConvertRecord processor with a GrokReader. In this case there is already a default pattern file pre-loaded with the reader. Hope this helps, Pierre
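For illustration, such a patterns file simply defines one named pattern per line, built from the standard Grok patterns; the custom pattern names below are made up for the example:

MYTIMESTAMP %{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}
MYLOGLINE %{MYTIMESTAMP:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}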
12-22-2017
03:54 PM
Or... since you're not using an LDAP, you could directly use the File Authorizer instead of the managed one: <authorizer>
<identifier>file-provider</identifier>
<class>org.apache.nifi.authorization.FileAuthorizer</class>
<property name="Authorizations File">./conf/authorizations.xml</property>
<property name="Users File">./conf/users.xml</property>
<property name="Initial Admin Identity"></property>
<property name="Legacy Authorized Users File"></property>
<property name="Node Identity 1"></property>
</authorizer> And then just reference this identifier in the nifi.properties file: nifi.security.user.authorizer=file-provider
12-22-2017
03:46 PM
Hi @Zeeshan Cornelius, To have something configurable from NiFi UI (allowing you to manage users/groups from the Users view), I believe you'd need to go through the definition of a Composite Configurable User Group provider. Your authorizers.xml file should look like: <authorizers>
<userGroupProvider>
<identifier>file-user-group-provider</identifier>
<class>org.apache.nifi.authorization.FileUserGroupProvider</class>
<property name="Users File">./conf/users.xml</property>
<property name="Legacy Authorized Users File"></property>
<property name="Initial User Identity 1">admin</property>
</userGroupProvider>
<userGroupProvider>
<identifier>composite-configurable-user-group-provider</identifier>
<class>org.apache.nifi.authorization.CompositeConfigurableUserGroupProvider</class>
<property name="Configurable User Group Provider">file-user-group-provider</property>
</userGroupProvider>
<accessPolicyProvider>
<identifier>file-access-policy-provider</identifier>
<class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
<property name="User Group Provider">composite-configurable-user-group-provider</property>
<property name="Authorizations File">./conf/authorizations.xml</property>
<property name="Initial Admin Identity">admin</property>
<property name="Legacy Authorized Users File"></property>
</accessPolicyProvider>
<authorizer>
<identifier>managed-authorizer</identifier>
<class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
<property name="Access Policy Provider">file-access-policy-provider</property>
</authorizer>
</authorizers> Let me know if this helps, Pierre.
10-30-2017
11:01 AM
In case someone faces the same issue: in my case, I solved it by ensuring that the 'atlas' user is known in Ranger.
10-17-2017
02:15 PM
1 Kudo
Hi @msumbul, The ConvertJsonToSQL processor has been developed to work with the SQL processors. The Hive processor expects a different naming convention for the attributes: instead of sql.args.N.type/value, it expects hiveql.args.N.type/value. The SQL and HiveQL syntaxes are close enough that the generated query is also valid for Hive, but the attributes need to be renamed. What you can do is use an UpdateAttribute processor to produce the expected attributes before the PutHiveQL processor (a sketch follows below). Hope this helps.
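As a hypothetical illustration, for a statement with two parameters the UpdateAttribute processor would carry dynamic properties like these, one type/value pair per parameter (this only works when you know the number of parameters in advance):

hiveql.args.1.type = ${sql.args.1.type}
hiveql.args.1.value = ${sql.args.1.value}
hiveql.args.2.type = ${sql.args.2.type}
hiveql.args.2.value = ${sql.args.2.value}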