Member since: 01-07-2019
Posts: 220
Kudos Received: 23
Solutions: 30
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 9957 | 08-19-2021 05:45 AM |
| | 2594 | 08-04-2021 05:59 AM |
| | 1313 | 07-22-2021 08:09 AM |
| | 5056 | 07-22-2021 08:01 AM |
| | 4777 | 07-22-2021 07:32 AM |
08-12-2020
02:06 PM
I am not sure if this is still relevant, but the root cause is shown as: "Cannot migrate key if no previous encryption occurred". I did not find much about this error. Did you perhaps change your encryption settings, or under which conditions did this problem occur?
08-12-2020
01:58 PM
The ListSFTP processor does not actually do anything with the file; it just builds a list of the files that exist. Typically this would then feed into a GetSFTP processor. In the GetSFTP processor you can configure whether the original should be deleted; by default this would indeed happen. https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.GetSFTP/index.html
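As a loose illustration of the same division of labor (listing never touches the files; the fetch step is where an optional delete happens), here is a minimal Python sketch using paramiko. The host, credentials, and paths are hypothetical, and this is not how NiFi implements it internally:

```python
# Illustrative only: the "list" step is read-only; deletion (if any) happens in the "get" step.
import paramiko

HOST, USER, PASSWORD = "sftp.example.com", "user", "secret"  # hypothetical credentials

transport = paramiko.Transport((HOST, 22))
transport.connect(username=USER, password=PASSWORD)
sftp = paramiko.SFTPClient.from_transport(transport)

# "ListSFTP" equivalent: only enumerate remote files, nothing is modified.
remote_files = sftp.listdir("/upload")

# "GetSFTP" equivalent: fetch each file, then optionally delete the original.
delete_original = True  # GetSFTP removes the source by default; set False to keep it
for name in remote_files:
    sftp.get(f"/upload/{name}", f"/tmp/{name}")
    if delete_original:
        sftp.remove(f"/upload/{name}")

sftp.close()
transport.close()
```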
08-12-2020
01:55 PM
I believe the root cause here is likely that NiFi has some limits in how accurately it can store numeric data types internally. If you do not want to lose precision, the best course of action is likely to indeed use a string under the hood. I know an improvement has been requested in this area to allow for greater numeric precision, but at this time I do not know its status. --- Alternatively, if I read your question the wrong way: the solution might also be to explicitly define the column type in Hive before writing, to avoid landing on string where it is not needed.
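To make the precision point concrete, here is a small standalone Python sketch (not NiFi internals) showing how a value held as a double silently loses digits, while keeping it as a string or decimal preserves it:

```python
from decimal import Decimal

original = "1234567890123456789.123456789"  # more digits than a 64-bit double can hold

as_double = float(original)     # lossy: doubles keep roughly 15-17 significant digits
as_decimal = Decimal(original)  # exact: arbitrary-precision decimal built from the string

print(as_double)    # 1.2345678901234568e+18  -> precision is gone
print(as_decimal)   # 1234567890123456789.123456789 -> preserved
```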
08-12-2020
01:43 PM
I am not aware of any bulk update capability through the UI, and at a glance I did not see this option via the API either, so the following may be a reasonable workaround (disclaimer: I have not tried this myself):
1. Export the process group in which you want to update all the queues (perhaps manually update one connection first to see what the change looks like).
2. Write a script to update all the connections in the exported template (see the sketch below).
This would still involve manual steps, but if you have a few groups with 100 queues each, it could save a lot of time. Also, don't hesitate to share whether this worked out.
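As a rough sketch of step 2, assuming the export is a NiFi template XML in which each connection carries `backPressureObjectThreshold` and `backPressureDataSizeThreshold` elements (verify the exact element names against your own export first), something like this could rewrite them in bulk:

```python
# Sketch only: adjust element names/paths after inspecting your own exported template.
import xml.etree.ElementTree as ET

tree = ET.parse("exported_process_group.xml")   # hypothetical file name
root = tree.getroot()

updated = 0
for conn in root.iter("connections"):           # one element per connection in the template
    obj_threshold = conn.find("backPressureObjectThreshold")
    size_threshold = conn.find("backPressureDataSizeThreshold")
    if obj_threshold is not None:
        obj_threshold.text = "20000"            # new object-count back pressure threshold
    if size_threshold is not None:
        size_threshold.text = "2 GB"            # new size-based back pressure threshold
    updated += 1

tree.write("exported_process_group_updated.xml")
print(f"Updated {updated} connections")
```

The updated file could then be re-imported as a template, which keeps the manual steps down to an export and an import.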
07-29-2020
02:27 PM
Thanks, I will think about refining the distinction between Kudu and Druid. Currently I would not want to include the fact that Flink has state as 'storage', but regarding Flink SQL, I may actually write another post later about the ways to interact with/access different kinds of data. (As someone also noticed, Impala is not here either, because it is not a store in itself but works with stored data.)
07-28-2020
01:34 AM
Thanks for this reply. I want to upgrade to version 3.1.4, because downloading the packages for version 3.1.5 requires a login/password that I don't have. Regarding the red flag on Oozie, I think it's due to BUG-123169, which I will try to work around with an upgrade of bigtop-tomcat.
07-27-2020
11:43 AM
The important thing to keep in mind is that NiFi is built for distributed processing. As such, there is essentially one queue per node. The position in the queue is therefore unique within that node, but it is expected that each node will have a message with position 1. Side note: it is therefore also expected that, on a three-node cluster, if you set a queue size of 10000 you will end up with a combined queue size of 30000.
07-25-2020
01:52 AM
There will be multiple form factors available in the future; for now, I will assume you have an environment that contains one Data Hub with NiFi and one Data Hub with both Impala and Kudu. (The answer still works if all are on the same Data Hub.)
Prerequisites
Data Hubs with NiFi and Impala+Kudu
Permission to access these (e.g. add a processor, create table)
Know your Workload User Name (CDP Management Console > Your name (bottom left) > Profile)
You should have set your Workload Password in the same location
Steps to write data from NiFi to Kudu in CDP Public Cloud
Unless mentioned otherwise, I have kept everything at its default settings.
In Kudu Data Hub Cluster:
Gather the FQDNs of the Kudu masters and the ports they use: go to the Data Hub > click Kudu Master > click Masters.
Combine the RPC addresses together in the following format: host1:port,host2:port,host3:port
Example:
```master1fqdn.abc:7051,master2fqdn.abc:7051,master3fqdn.abc:7051```
In HUE on Impala/Kudu Data Hub Cluster:
Run the following query to create the Kudu table (the little triangle runs it):
`CREATE TABLE default.micro (id BIGINT, name STRING, PRIMARY KEY(id)) STORED AS KUDU;`
In NiFi GUI:
Ensure that you have some data in NiFi that fits in the table. Use the `GenerateFlowFile` processor.
In Properties, configure the Custom Text to contain the following (copy carefully, or use Shift+Enter for a newline):
id, name
1,dennis
Add the PutKudu processor and configure it as follows:
- Settings
  - Automatically terminate relationships: tick both success and failure
- Scheduling
  - Run Schedule: 1 sec
- Properties
  - Kudu Masters: the combined list we created earlier
  - Table Name: impala::default.micro
  - Kerberos Principal: your Workload User Name (see prerequisites above)
  - Kerberos Password: your Workload Password
  - Record Reader: Create new service > CSV Reader
  - Kudu Operation Type: UPSERT
Right-click the NiFi canvas > Configure > click the little cogwheel of CSV Reader, set the following property, and then Apply: Treat First Line as Header: True
Click the little lightning bolt of CSV Reader > Enable
On the canvas connect your GenerateFlowFile processor to your PutKudu processor and start the flow.
You should now be able to select your table through HUE and see that a single record has been added: `select * from default.micro`
These are the minimal steps; a more extensive explanation can be found in the Cloudera documentation.
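If you prefer to verify the record outside Hue, a small Python sketch with impyla could run the same query. Treat this purely as a sketch: it assumes direct access to an Impala coordinator on the default port 21050 with LDAP (workload) credentials over TLS, and in a CDP Data Hub the exact endpoint, port, and TLS setup may differ.

```python
# Hypothetical endpoint and credentials; adjust to your Data Hub's Impala coordinator.
from impala.dbapi import connect

conn = connect(
    host="coordinator-fqdn.example.com",  # Impala coordinator FQDN (assumption)
    port=21050,                           # default Impala client port (assumption)
    use_ssl=True,
    auth_mechanism="LDAP",
    user="your_workload_user",
    password="your_workload_password",
)

cur = conn.cursor()
cur.execute("SELECT * FROM default.micro")
print(cur.fetchall())   # expect [(1, 'dennis')] after the flow has run once
```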
Potential refinements
A `DESCRIBE EXTENDED` on the table also exposes the master hostnames. For me this also worked, but I am unsure how safe it is to not explicitly define the ports.
07-23-2020
01:49 PM
1 Kudo
The Cloudera Data Platform (CDP) comes with a wide variety of tools that move data; these are the same in any cloud as well as on-premises. Though there is no formal decision tree, I will summarize the key considerations from my personal perspective. In short, it can be visualized like this:
Steps for finding the right tool to move your data:
1. Staying within Hive and SQL queries suffice? > Hive, otherwise
2. No complex operations (e.g. joins)? > NiFi, otherwise
3. Batch? > Spark, otherwise
4. Already have Kafka Streams/Spark Streaming in use? > Kafka Streams/Spark Streaming, otherwise
5. Flink
Some notes:
- If you can solve it with either NiFi or a more complex solution, use NiFi.
- Use Flink as your streaming engine unless there is a good reason not to; it is the latest generation of streaming engines. Currently I do not recommend using Flink for batch processing yet, but that will likely change soon.
- I did not include tools like Impala, HBase/Phoenix, and Druid, as their main purpose is accessing data.
- This is a basic decision tree; it should cover most situations, but do not hesitate to deviate if your situation asks for it.
Also see my related article: Choose the right place to store your data.
Full Disclosure & Disclaimer: I am an employee of Cloudera, but this is not part of the formal documentation of the Cloudera Data Platform. It is purely based on my own experience of advising people in their choice of tooling.
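Purely as an illustration of the decision steps above (not an official Cloudera rule set), the same logic can be written down as a tiny function; the parameter names are made up for this sketch:

```python
def choose_data_movement_tool(stays_in_hive_and_sql_suffices: bool,
                              needs_complex_operations: bool,
                              is_batch: bool,
                              already_uses_kafka_or_spark_streaming: bool) -> str:
    """Encodes the decision steps from the post; deviate when your situation asks for it."""
    if stays_in_hive_and_sql_suffices:
        return "Hive"
    if not needs_complex_operations:
        return "NiFi"
    if is_batch:
        return "Spark"
    if already_uses_kafka_or_spark_streaming:
        return "Kafka Streams / Spark Streaming"
    return "Flink"

# Example: a streaming job with joins and no existing streaming framework -> Flink
print(choose_data_movement_tool(False, True, False, False))
```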
06-29-2020
02:45 PM
Too cool man, great work!