About DennisJaheruddi

DennisJaheruddi · ‎07-22-2021

I saw a similar issue a while ago, but that one got fixed. The best way to reach the experts and check whether there is a known fix is to go to the Cloudera Support Portal; once you log in, you should be able to open a support ticket.

DennisJaheruddi · ‎07-22-2021

Your expression looks fine. I just tested it as follows:

1. GenerateFlowFile: add property Location1 with value loc1, and property Location2 with value loc2.
2. UpdateAttribute: add property Total with value ${path: append("folder1/"):append(${Location1}):append('/'):append(${Location2})}

The output I get is ./folder1/loc1/loc2, which is exactly what I expected. If you think there may be a problem with parsing the JSON in the first place, please inspect your data in the queue directly after parsing; the concatenation itself seems to work. (Unless you expected a different outcome, in which case please be very clear about what you expect and what you see.)
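For readers less familiar with the Expression Language, the chain of append calls in the test can be imitated in plain Python. This is only a rough sketch, not NiFi code: the function name build_total is made up for illustration, and it assumes the path attribute holds "./", as the test output suggests.

```python
# Plain-Python imitation of the chained append() calls; this is not NiFi code.
def build_total(path: str, location1: str, location2: str) -> str:
    """Concatenate the pieces in the same order as the append() chain."""
    return path + "folder1/" + location1 + "/" + location2


# Assumption: the flowfile's path attribute is "./", as the test output suggests.
total = build_total("./", "loc1", "loc2")
print(total)  # ./folder1/loc1/loc2
```

Each append simply adds its argument to the end of the running string, which is why the result starts with the value of path.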

DennisJaheruddi · ‎07-22-2021

There may be some corner cases where you would want to use something else, but fortunately the general answer to your question is very straightforward: anything you were considering Flume for, you now want to use NiFi for instead. Flume has been deprecated, so I would not recommend spending time and energy on developing custom content for it. Rather, see whether NiFi solves your problem out of the box (or, if needed, perhaps contribute a processor to NiFi).

DennisJaheruddi · ‎07-22-2021

Are you using the version published by Cloudera? Please confirm exactly which platform version you are on, and whether this is the on-premises variant or the public cloud one.

DennisJaheruddi · ‎02-03-2021

Apologies for reviving this old thread, but it seems people are still landing on it via search. The discussion here is outdated, especially the exclusions around gateway nodes. Please contact your Cloudera representative for the latest terms and conditions.

DennisJaheruddi · ‎02-01-2021

Nine times out of ten this message appears because you run GetHDFS on multiple nodes. Both nodes see the file, and perhaps both even try to pick it up, but clearly they cannot both delete it. In old versions of NiFi you can fix this by setting GetHDFS to run only on the primary node. However, that will of course burden the primary node more than it should. So in recent versions (and likely yours) you will find the ListHDFS and FetchHDFS processors (and similar sets for different data sources). The lightweight List processor can then run on the primary node and load balance to all nodes, which will then Fetch.

DennisJaheruddi · ‎02-01-2021

What you describe does not (yet) appear to conflict with the explanation in the linked thread. It seems that NiFi attempts to load balance when needed. Perhaps try routing your incoming messages through some heavy processors, so that the node which receives them consumes its resources fast, and see if it starts to load balance once you are hitting the limits.

DennisJaheruddi · ‎02-01-2021

I will try to nudge you in the right direction without spoiling everything:

Q1: Look into attributes. You could have a processor give an attribute to the flowfile when it is loaded in; this can later be used to route or name files.
Q2: If it is possible to use record-based processors and avoid splitting files into individual records, do it. It can be 100x more efficient.
Q3: NiFi is great for working with individual messages, not so much for working with context (e.g. is a message a duplicate). I suppose you could do some kind of lookup of new messages against existing messages, but you should avoid this where possible. Think about something like Spark/Flink, or even Python or SQL batch solutions, to detect duplicates.
Q4: I don't think you will soon run into NiFi limitations here. The question is probably more which file format can take all the updates and still perform well enough.
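To make the Q3 suggestion a bit more concrete, here is a rough sketch of what a small Python batch step for detecting duplicates could look like. The message shape and the dedupe function are hypothetical, and it assumes that "duplicate" means identical content, which may not match your definition.

```python
import hashlib
import json


def dedupe(messages):
    """Return messages with exact duplicates removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for msg in messages:
        # Serialize with sorted keys so that key order does not affect the fingerprint.
        canonical = json.dumps(msg, sort_keys=True).encode("utf-8")
        fingerprint = hashlib.sha256(canonical).hexdigest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(msg)
    return unique


# Hypothetical batch: the second message repeats the first with a different key order.
batch = [
    {"id": 1, "value": "a"},
    {"value": "a", "id": 1},
    {"id": 2, "value": "b"},
]
print(dedupe(batch))  # [{'id': 1, 'value': 'a'}, {'id': 2, 'value': 'b'}]
```

Because the whole batch is visible at once, the context that is awkward to keep inside a flow becomes a simple set lookup here.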

DennisJaheruddi · ‎01-31-2021

Unfortunately most sources are in Dutch, but for good measure I will explain the most important data points:

1. Total weekly new COVID infections come from the RIVM (the Dutch official body): https://www.rivm.nl/coronavirus-covid-19/archief-corona-updates
2. For the week of 26 Jan, an article by the RIVM mentions that over one third of the current new infections are of the British variant: https://www.rivm.nl/nieuws/Britse-variant-wint-terrein-in-Nederland
3. The same article mentions the rate was 8.6% in the period of 4-10 Jan.

These are the most important points, and they already show the trend. However, here are the additional sources:

4. In 'early December' the rate was about 1%, according to this article on the largest news site of the country: https://www.nu.nl/coronavirus/6101869/wat-weten-we-nu-van-de-britse-coronavariant-ja-die-is-echt-besmettelijker.html
5. In a national press conference on 12 Jan, the minister of health indicated that in the past period the rate was between 2% and 5%. More frequent updates confirmed that there was steady growth in this period, so between 9 and 29 December the rate likely grew from about 2% to about 5%. This could be off by one or two percentage points.
6. Though there was no clear source, several news sites recently referred to a current rate of 20%; this was likely observed between 13 and 19 Jan.

DennisJaheruddi · ‎01-31-2021

Recently, the weekly number of COVID-19 cases in The Netherlands has been steadily dropping week over week. However, underneath this lies a hidden upward trend in cases of the British COVID variant. In this article, I explain how I made my first visual in Cloudera Data Visualization with just a few clicks.

Step 1: The Data

The data is created by combining several sources that shed light on the percentage of new infections made up of this variant, as well as the total number of cases. As we are here for tech more than science, I will only highlight the official report of cases per week and an official article that contains the latest percentage, as well as an earlier percentage. Additional explanation of the numbers is given in the post below this article.

From here I could have chosen from many data sources, including a CSV file, but for reproducibility, I decided to upload it to Hive with a simple query:

create table covid_cases as
SELECT 431 as british_cases, DATE '2020-12-08' AS week_end_date
UNION SELECT 1168, DATE '2020-12-15'
UNION SELECT 2470, DATE '2020-12-22'
UNION SELECT 3369, DATE '2020-12-29'
UNION SELECT 4854, DATE '2021-01-05'
UNION SELECT 5434, DATE '2021-01-12'
UNION SELECT 7755, DATE '2021-01-19'
UNION SELECT 11760, DATE '2021-01-26'
UNION SELECT 19085, DATE '2021-02-02'

People have different opinions on the best way to mock up data, but if I only need a few rows, I always like to build this kind of query with the Excel concat function.

Step 2: The Connection

As I used Cloudera Data Visualization within a Data Warehouse, the connection to that Data Warehouse is available out of the box. As such, I only needed to select the database and table.

Step 3: The Visualization

In order to minimize the effort, I decided to stick to the default settings where possible. This has the additional benefit that it is very easy to reproduce what I have done.

1. Create a Dashboard
2. Add a visualization for the table
3. Select type: Bars
4. Y-axis: british_cases (it automatically understands that we want the sum)
5. X-axis: week_end_date (it already recognizes that it is a date)
6. Change week_end_date in X-axis type to timestamp
7. Labels: british_cases (it automatically understands that we want the sum)
8. Give your X-axis and Y-axis a nice alias, and add a title and subtitle to the chart

Now your chart should look just like the picture on top of this article.

Hopefully, this enables everyone to gain more insight into how COVID develops in The Netherlands, and of course into how to visualize data with just a few clicks.

---

Edit: This additional data source indicates the cases for the week ending on 2nd Feb: Nieuwe varianten gooien roet in het eten ("New variants throw a spanner in the works")
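The mock-up query from Step 1 of the article was built with the Excel concat function. As an alternative, a short script could generate the same kind of statement. The following is only a sketch: the function name build_ctas is made up for illustration, the rows are copied from the query in Step 1, and the column names are hardcoded to match it.

```python
from datetime import date

# (british_cases, week_end_date) rows copied from the query in Step 1.
ROWS = [
    (431, date(2020, 12, 8)),
    (1168, date(2020, 12, 15)),
    (2470, date(2020, 12, 22)),
    (3369, date(2020, 12, 29)),
    (4854, date(2021, 1, 5)),
    (5434, date(2021, 1, 12)),
    (7755, date(2021, 1, 19)),
    (11760, date(2021, 1, 26)),
    (19085, date(2021, 2, 2)),
]


def build_ctas(table, rows):
    """Build a CREATE TABLE ... AS SELECT ... UNION SELECT ... statement."""
    if not rows:
        raise ValueError("at least one row is needed")
    selects = []
    for i, (cases, week_end) in enumerate(rows):
        day = week_end.isoformat()
        if i == 0:
            # Only the first SELECT needs the column aliases.
            selects.append(
                f"SELECT {cases} AS british_cases, DATE '{day}' AS week_end_date"
            )
        else:
            selects.append(f"SELECT {cases}, DATE '{day}'")
    return f"CREATE TABLE {table} AS\n" + "\nUNION ".join(selects)


print(build_ctas("covid_cases", ROWS))
```

The printed statement can then be pasted into the query editor, just like the Excel-built version.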

Last Visited	‎12-15-2021 03:18 AM

Member Since	‎01-07-2019 03:54 AM
Posts	220
Kudos received	31

Cloudera Community

Re: How can a Flink program on a Kerberos-enabled cluster connect to a Kafka outside the cluster that has no authentication enabled

Re: Attribute validation against MSSQL database

Re: Put array with Dates on nifi flowfile

Re: NiFi templates don't include all controller se...

Re: Concatenations of Multiple Attributes in Nifi

Re: PutDatabaseRecord or JsonTreeReader changing t...

Re: Concatenations of Multiple Attributes in Nifi

Re: Apache Flume in 2021

Re: Apache Nifi 1.12.1 in Kubernetes with existing...

Re: Pricing for gateway nodes

Re: Nifi GetHDFS Warning - Could not remove from H...

Re: MiNiFi to NiFi S2S load balancing does not wor...

Re: Dataflow question and special case with duplic...

Re: Visualizing the growth of the British Covid va...

Visualizing the growth of the British COVID varian...