Member since: 01-07-2019
Posts: 220
Kudos Received: 23
Solutions: 30
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5050 | 08-19-2021 05:45 AM |
| | 1811 | 08-04-2021 05:59 AM |
| | 879 | 07-22-2021 08:09 AM |
| | 3696 | 07-22-2021 08:01 AM |
| | 3436 | 07-22-2021 07:32 AM |
08-12-2020
02:06 PM
I am not sure if this is still relevant, but the root cause is shown as: "Cannot migrate key if no previous encryption occurred". I did not find much about this error. Did you perhaps change your encryption settings, or under which conditions did this problem occur?
08-12-2020
01:58 PM
The ListSFTP processor does not actually do anything with the files; it just builds a list of the files that exist. Typically this list would then feed into a GetSFTP processor. In the GetSFTP processor you can configure whether the original should be deleted; by default it is. https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.GetSFTP/index.html
08-12-2020
01:55 PM
I believe the root cause here is likely that NiFi has limits in how accurately it can store numeric data types internally. If you do not want to lose precision, the best course of action is likely indeed to use a string under the hood. I know an improvement has been requested in this area to allow for greater numeric precision, but at this time I do not know its status. --- Alternately, if I read your question the wrong way: the solution might also be to explicitly define the column type in Hive before writing, to avoid landing on string where it is not needed.
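To illustrate the kind of precision loss involved (this is general floating-point behavior, not NiFi-specific), here is a small Python sketch; the example value is made up:

```python
from decimal import Decimal

# A double-precision float holds roughly 15-17 significant digits, so
# longer numeric values get truncated, while strings/Decimal keep them exact.
value = "1234567890.123456789"   # 19 significant digits (hypothetical value)
print(float(value))              # approximately 1234567890.1234568 (digits lost)
print(Decimal(value))            # 1234567890.123456789 (exact)
```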
08-12-2020
01:50 PM
Unfortunately I am not able to help with the specific content of the error, but as you mention it is urgent and impacting a production run, I would highly recommend creating a support ticket with the relevant information. You may ultimately get an answer via the community, but for urgent matters logging a support ticket is really the recommended course of action. ---- It may of course be possible that there is a problem with your flow; consider rolling back to an older version.
08-12-2020
01:43 PM
I am not aware of any bulk update capability through the UI. At a glance I did not see this option via the API either, so the following may be a reasonable workaround (disclaimer: I have not tried this myself): 1. Export the process group in which you want to update all the queues (perhaps manually update one connection first to see what the change looks like). 2. Write a script to update all the connections in the template (see the sketch below). This would still involve manual steps, but if you have a few groups with 100 queues each, it could save a lot of time. Also, don't hesitate to share whether this worked out.
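As a very rough illustration of step 2, here is a minimal Python sketch that rewrites the back pressure thresholds in an exported template. The element names (backPressureObjectThreshold, backPressureDataSizeThreshold) and file names are assumptions based on typical NiFi 1.x template exports, so verify them against your own export first:

```python
import xml.etree.ElementTree as ET

# Assumed: a NiFi 1.x template export in which each connection carries
# <backPressureObjectThreshold> and <backPressureDataSizeThreshold> elements.
tree = ET.parse("my_process_group.xml")      # hypothetical exported template

for threshold in tree.iter("backPressureObjectThreshold"):
    threshold.text = "20000"                 # new object-count threshold

for size in tree.iter("backPressureDataSizeThreshold"):
    size.text = "2 GB"                       # new data-size threshold

tree.write("my_process_group_updated.xml")   # re-import this file as a template
```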
07-29-2020
02:27 PM
Thanks, I will think about refining the distinction between Kudu and Druid. Currently I would not want to include the fact that Flink has state as 'storage', but regarding Flink SQL, I may actually make another post later about ways to interact with/access different kinds of data. (As someone also noticed, Impala is not here either, because it is not a store in itself but works with stored data.)
07-28-2020
11:33 AM
3 Kudos
The Cloudera Data Platform (CDP) comes with many places to store your data, and it can be challenging to know which one to use. Though there is no formal decision tree, I hereby share the key considerations from my personal perspective. They can be visualized like this:

Explanation of each path:

a. You have large, bulky files that do not need to be queried > file and object storage. The exact kind of storage to be used will mostly be defined by your environment: in a classical cluster HDFS is available, in the public cloud each provider's object store will be leveraged, and on-premises Ozone will serve as the object store.

b. You have a table, either from large bulky files or from a set of messages > Hive for scale, or Kudu for interaction. If you want to work with a table and need to store it as such, it is clear you want to store your data as a table, even if this may force you to think about how to implement the ingest in a sensible way. Kudu is great for fast insights, while Hive tables (which in turn can be of different formats) can offer virtually unlimited scale. Note that Hive tables (registered in the Hive Metastore) can be accessed via different means, including the Hive engine and the Impala engine.

c. Your table records stream in, but you only need pre-aggregates > Druid. Druid is able to aggregate data upon ingestion.

d. You are working with messages or small files > Kafka for latency, or HBase for retention. Kafka and HBase are both great places to put 'many tiny things', for instance individual transactions. Kafka offers great throughput and latency, but despite commonly used marketing messages, it is not a database and does not scale well for historical data. If you want to serve data granularly for a longer period of time, HBase is a great fit. (For a toy encoding of these paths, see the sketch at the end of this post.)

Some notes:
- When working in the cloud, it is often desirable to work with object stores where possible to keep costs down. The good news is that CDP Public Cloud comes with cloud-native capabilities; as such, several storage solutions, such as Hive, actually store their data in cloud object stores.
- It is possible that more than one road applies to your data. For instance, a message may require very low latency in the first few days, but also need to be retained for several years. In such cases, it often makes sense to store a subset of the data in two places.
- I did not include other solutions that could store data, such as Solr or in-application state. The reason is that the primary function of these is not storage, but search and processing respectively. I also did not include Impala, as it is an engine; Hive is only on this chart to represent its storage capabilities.
- This is a basic decision tree; it should cover most situations, but do not hesitate to deviate if your situation asks for it.

Also, see my related article: Find the right tool to move your data

Full Disclosure & Disclaimer: I am an employee of Cloudera, but this is not part of the formal documentation of the Cloudera Data Platform. It is purely based on my own experience of advising people in their choice of tooling.
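As promised above, here is a toy Python encoding of paths a-d. The function name and boolean parameters are my own invention purely for illustration; real decisions need more context than five flags:

```python
def suggest_store(
    large_bulky_files: bool,
    needs_table: bool,
    fast_interaction: bool,
    pre_aggregates_only: bool,
    long_retention: bool,
) -> str:
    """Toy encoding of decision paths a-d above."""
    if large_bulky_files and not needs_table:
        # path a: large bulky files that do not need to be queried
        return "File/object storage (HDFS, cloud object store, or Ozone)"
    if needs_table:
        if pre_aggregates_only:
            # path c: streaming records where only pre-aggregates are needed
            return "Druid"
        # path b: a real table, from bulky files or messages
        return "Kudu" if fast_interaction else "Hive tables"
    # path d: messages or small files
    return "HBase" if long_retention else "Kafka"

# Example: granular messages that must be served for years
print(suggest_store(False, False, False, False, True))  # -> HBase
```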
07-27-2020
11:56 AM
2 Kudos
First of all: please consider using the latest version of the platform. For HDP 3 that is currently HDP 3.1.5, which is kept up to date with nice little things such as security patches; it also offers a path towards CDP (the next generation, given that HDP will be end of life some time next year). If there is anything holding you back from using 3.1.5, please reach out to your Cloudera contact person. That being said, judging from the info on the top left, the upgrade seems finished. I would mainly pay attention to the red flags (Oozie) to ensure it was fully successful.
07-27-2020
11:43 AM
The important thing to keep in mind is that NiFi is built for distributed processing. As such, there is essentially one queue per node. The position in the queue is therefore unique within that node, but it is expected that each node will have a message with position 1. Side note: it is therefore also expected that if you set a queue size of 10000 on a three-node cluster, you will end up with an effective queue size of 30000 (see the sketch below).
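A trivial Python sketch of the per-node arithmetic; the variable names are mine, and the threshold and node count are just the example numbers from above:

```python
# Back pressure thresholds in NiFi apply per node, so the effective
# cluster-wide queue capacity scales with the cluster size.
back_pressure_object_threshold = 10_000   # configured on the connection
node_count = 3                            # e.g. a three-node cluster
effective_capacity = back_pressure_object_threshold * node_count
print(effective_capacity)                 # 30000
```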
07-27-2020
11:41 AM
I have a NiFi cluster running. When I check the queue, the positions seem duplicated; there are for instance 3 messages with position 1 and 3 with position 2. The timestamps are similar but not necessarily the same, and the UUIDs are not duplicated. What is happening?
Labels:
- Apache NiFi