Member since: 01-07-2019
Posts: 220
Kudos Received: 23
Solutions: 30
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5050 | 08-19-2021 05:45 AM |
| | 1811 | 08-04-2021 05:59 AM |
| | 879 | 07-22-2021 08:09 AM |
| | 3696 | 07-22-2021 08:01 AM |
| | 3436 | 07-22-2021 07:32 AM |
08-12-2020
02:06 PM
I am not sure if this is still relevant, but the root cause is shown as: "Cannot migrate key if no previous encryption occurred". I did not find much about this error. Did you perhaps change your encryption settings, or under which conditions did this problem occur?
08-12-2020
01:58 PM
The ListSFTP processor does not actually do anything with the files; it just builds a list of the files that exist. Typically this list would then feed into a GetSFTP processor. In the GetSFTP processor you can configure whether the original should be deleted; by default it is. https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.GetSFTP/index.html
08-12-2020
01:55 PM
I believe the root cause here is likely that NiFi has limits in how accurately it can store numeric data types internally. If you do not want to lose precision, the best course of action is likely indeed to use a string under the hood. I know an improvement has been requested in this area to allow for greater numeric precision, but at this time I do not know its status. --- Alternately, if I read your question the wrong way: the solution might also be to explicitly define the column type in Hive before writing, to avoid landing on string where it is not needed.
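To illustrate the kind of precision loss involved (this is general floating-point behavior, not NiFi-specific), here is a small Python sketch; the example value is made up:

```python
from decimal import Decimal

# A double-precision float holds roughly 15-17 significant digits, so
# longer numeric values get truncated, while strings/Decimal keep them exact.
value = "1234567890.123456789"   # 19 significant digits (hypothetical value)
print(float(value))              # approximately 1234567890.1234568 (digits lost)
print(Decimal(value))            # 1234567890.123456789 (exact)
```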
08-12-2020
01:50 PM
Unfortunately I am not able to help with the specific content of the error, but as you mention it is urgent and impacting a production run, I would highly recommend creating a support ticket with the relevant information. You may ultimately get an answer via the community, but for urgent matters logging a support ticket is really the recommended course of action. ---- It may of course be possible that there is a problem with your flow; consider rolling back to an older version.
08-12-2020
01:43 PM
I am not aware of any bulk update capability through the UI. At a glance I did not see this option via the API either, so the following may be a reasonable workaround (disclaimer: I have not tried this myself): 1. Export the process group in which you want to update all the queues (perhaps manually update one connection first to see what the change looks like). 2. Write a script to update all the connections in the template (see the sketch below). This would still involve manual steps, but if you have a few groups with 100 queues each, it could save a lot of time. Also, don't hesitate to share whether this worked out.
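As a very rough illustration of step 2, here is a minimal Python sketch that rewrites the back pressure thresholds in an exported template. The element names (backPressureObjectThreshold, backPressureDataSizeThreshold) and file names are assumptions based on typical NiFi 1.x template exports, so verify them against your own export first:

```python
import xml.etree.ElementTree as ET

# Assumed: a NiFi 1.x template export in which each connection carries
# <backPressureObjectThreshold> and <backPressureDataSizeThreshold> elements.
tree = ET.parse("my_process_group.xml")      # hypothetical exported template

for threshold in tree.iter("backPressureObjectThreshold"):
    threshold.text = "20000"                 # new object-count threshold

for size in tree.iter("backPressureDataSizeThreshold"):
    size.text = "2 GB"                       # new data-size threshold

tree.write("my_process_group_updated.xml")   # re-import this file as a template
```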
07-29-2020
02:27 PM
Thanks, I will think about refining the distinction between Kudu and Druid. Currently I would not want to include the fact that Flink has state as 'storage', but regarding Flink SQL, I may actually make another post later about ways to interact with/access different kinds of data. (As someone also noticed, Impala is not here either, because it is not a store in itself but works with stored data.)
07-28-2020
11:33 AM
3 Kudos
The Cloudera Data Platform (CDP) comes with many places to store your data, and it can be challenging to know which one to use. Though there is no formal decision tree, I hereby share the key considerations from my personal perspective. They can be visualized like this:

Explanation of each path:

a. You have large, bulky files that do not need to be queried > file and object storage. The exact kind of storage to be used will mostly be defined by your environment: in a classical cluster HDFS is available, in the public cloud each provider's object store will be leveraged, and on-premises Ozone will serve as the object store.

b. You have a table, either from large bulky files or from a set of messages > Hive for scale, or Kudu for interaction. If you want to work with a table and need to store it as such, it is clear you want to store your data as a table, even if this may force you to think about how to implement the ingest in a sensible way. Kudu is great for fast insights, while Hive tables (which in turn can be of different formats) can offer virtually unlimited scale. Note that Hive tables (registered in the Hive Metastore) can be accessed via different means, including the Hive engine and the Impala engine.

c. Your table records stream in, but you only need pre-aggregates > Druid. Druid is able to aggregate data upon ingestion.

d. You are working with messages or small files > Kafka for latency, or HBase for retention. Kafka and HBase are both great places to put 'many tiny things', for instance individual transactions. Kafka offers great throughput and latency, but despite commonly used marketing messages, it is not a database and does not scale well for historical data. If you want to serve data granularly for a longer period of time, HBase is a great fit. (For a toy encoding of these paths, see the sketch at the end of this post.)

Some notes:
- When working in the cloud, it is often desirable to work with object stores where possible to keep costs down. The good news is that CDP Public Cloud comes with cloud-native capabilities; as such, several storage solutions, such as Hive, actually store their data in cloud object stores.
- It is possible that more than one road applies to your data. For instance, a message may require very low latency in the first few days, but also need to be retained for several years. In such cases, it often makes sense to store a subset of the data in two places.
- I did not include other solutions that could store data, such as Solr or in-application state. The reason is that the primary function of these is not storage, but search and processing respectively. I also did not include Impala, as it is an engine; Hive is only on this chart to represent its storage capabilities.
- This is a basic decision tree; it should cover most situations, but do not hesitate to deviate if your situation asks for it.

Also, see my related article: Find the right tool to move your data

Full Disclosure & Disclaimer: I am an employee of Cloudera, but this is not part of the formal documentation of the Cloudera Data Platform. It is purely based on my own experience of advising people in their choice of tooling.
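As promised above, here is a toy Python encoding of paths a-d. The function name and boolean parameters are my own invention purely for illustration; real decisions need more context than five flags:

```python
def suggest_store(
    large_bulky_files: bool,
    needs_table: bool,
    fast_interaction: bool,
    pre_aggregates_only: bool,
    long_retention: bool,
) -> str:
    """Toy encoding of decision paths a-d above."""
    if large_bulky_files and not needs_table:
        # path a: large bulky files that do not need to be queried
        return "File/object storage (HDFS, cloud object store, or Ozone)"
    if needs_table:
        if pre_aggregates_only:
            # path c: streaming records where only pre-aggregates are needed
            return "Druid"
        # path b: a real table, from bulky files or messages
        return "Kudu" if fast_interaction else "Hive tables"
    # path d: messages or small files
    return "HBase" if long_retention else "Kafka"

# Example: granular messages that must be served for years
print(suggest_store(False, False, False, False, True))  # -> HBase
```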
07-27-2020
11:56 AM
2 Kudos
First of all: please consider using the latest version of the platform. For HDP 3 that is currently HDP 3.1.5, which is kept up to date with nice little things such as security patches; it also offers a path towards CDP (the next generation, given that HDP will be end of life some time next year). If there is anything holding you back from using 3.1.5, please reach out to your Cloudera contact person. That being said, judging from the info on the top left, the upgrade seems finished. I would mainly pay attention to the red flags (Oozie) to ensure it was fully successful.
07-27-2020
11:43 AM
The important thing to keep in mind is that NiFi is built for distributed processing. As such, there is essentially one queue per node. The position in the queue is therefore unique within that node, but it is expected that each node will have a message with position 1. Side note: it is therefore also expected that if you set a queue size of 10000 on a three-node cluster, you will end up with an effective queue size of 30000 (see the sketch below).
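A trivial Python sketch of the per-node arithmetic; the variable names are mine, and the threshold and node count are just the example numbers from above:

```python
# Back pressure thresholds in NiFi apply per node, so the effective
# cluster-wide queue capacity scales with the cluster size.
back_pressure_object_threshold = 10_000   # configured on the connection
node_count = 3                            # e.g. a three-node cluster
effective_capacity = back_pressure_object_threshold * node_count
print(effective_capacity)                 # 30000
```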
07-27-2020
11:41 AM
I have a NiFi cluster running. When I check the queue, the positions seem duplicated; there are for instance 3 messages with position 1 and 3 with position 2. The timestamps are similar but not necessarily the same, and the UUIDs are not duplicated. What is happening?
Labels:
- Apache NiFi