Member since: 01-07-2019
Posts: 220
Kudos Received: 23
Solutions: 30
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11999 | 08-19-2021 05:45 AM |
| | 3152 | 08-04-2021 05:59 AM |
| | 1543 | 07-22-2021 08:09 AM |
| | 6034 | 07-22-2021 08:01 AM |
| | 5494 | 07-22-2021 07:32 AM |
08-13-2020
04:44 AM
This message is labeled NiFi, so I assume you have NiFi available? In that case, look for the right processor for the job; something like ExecuteSQL may be a good starting point. ---- If your question is purely about how to make Python and MariaDB interact, this may not be the best place to ask it.
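If the Python-MariaDB side is what you are after, here is a minimal sketch using the `mariadb` connector package; the host, credentials, and table name are placeholders, not anything from this thread:
```python
# Minimal sketch: query MariaDB from Python with the mariadb connector.
# All connection details and the table name below are placeholders.
import mariadb

conn = mariadb.connect(
    host="localhost",
    port=3306,
    user="myuser",
    password="mypassword",
    database="mydb",
)
cur = conn.cursor()
cur.execute("SELECT id, name FROM mytable")
for row in cur:
    print(row)
conn.close()
```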
08-13-2020
04:33 AM
1 Kudo
NiFi is not really designed to work with 'context'. If you have a simple record, there are many operations you can do, but if you are working with potentially complex files, and thus complex operations, you will likely rather process them with something like Spark or Python.
08-13-2020
04:31 AM
1 Kudo
Assuming your flowfile contains multiple records, this should be achievable with the UpdateRecord processor. Note that the expression language has a UUID function, which may be helpful inside it.
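As a hedged sketch (the record path `/id` is a placeholder for whatever field you want to populate), UpdateRecord could be configured like this:
```
Replacement Value Strategy: Literal Value
/id: ${UUID()}
```
With the Literal Value strategy, the expression is evaluated for each record, so every record should receive its own UUID.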
08-12-2020
02:16 PM
I have not tested it as such, but 10 minutes is a very long time for NiFi; I could imagine that the 'memory' of a load balancer is quite short, to avoid overhead. I would just drop in many files in a short period of time and see what happens. If you set the load balancer to round robin, I would expect the files to go to both nodes. Note that you can also use GenerateFlowFile if you don't want to drop files all the time, as in the sketch below.
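A quick sketch of such a test setup (these settings are my suggestion, not from the thread):
```
GenerateFlowFile
  Run Schedule: 0 sec      (emit flowfiles continuously)
  Batch Size: 100          (many flowfiles per scheduled run)
Connection to the next processor
  Load Balance Strategy: Round robin
```
With round robin set on the connection, the flowfiles should spread across both nodes almost immediately.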
08-12-2020
01:58 PM
The ListSFTP processor does not actually do anything with the file; it just builds a list of the files that exist. Typically this would then feed into a GetSFTP processor. In the GetSFTP processor you can configure whether the original should be deleted; by default this would indeed happen. https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.GetSFTP/index.html
07-29-2020
02:27 PM
Thanks, I will think about refining the distinction between Kudu and Druid. Currently I would not want to include the fact that Flink has state as 'storage', but regarding Flink SQL, I may actually make another post later about the ways to interact with/access different kinds of data. (As someone also noticed, Impala is also not here, because it is not a store in itself but works with stored data.)
07-28-2020
11:33 AM
3 Kudos
The Cloudera Data Platform (CDP) comes with many places to store your data, and it can be challenging to know which one to use. Though there is no formal decision tree, I hereby share the key considerations from my personal perspective. They can be visualized like this:

Explanation of each path

a. Have large bulky files that do not need to be queried > File and object storage

The exact kind of storage to be used will mostly be defined by your environment: in a classical cluster HDFS is available, in the public cloud each provider's object store will be leveraged, and on-premises Ozone will serve as the object store.

b. Have a table, either from large bulky files or a set of messages > Hive for scale or Kudu for interaction

If you want to work with a table and need to store it as such, it is clear you want to store your data as a table, even if this may force you to think about how to implement the ingest in a sensible way. Kudu is great for fast insights, where Hive tables (which in turn can be of different formats) can offer unlimited scale. Note that Hive tables (registered in the Hive Metastore) can be accessed via different means, including the Hive engine and the Impala engine.

c. Do your table records stream in, but you only need pre-aggregates > Druid

Druid is able to aggregate data upon ingestion.

d. Are you working with messages or small files > Kafka for latency or HBase for retention

Kafka and HBase are both great places to put 'many tiny things', for instance individual transactions. Kafka offers great throughput and latency, but despite commonly used marketing messages, it is not a database and does not scale well for historical data. If you want to serve data granularly for a longer period of time, HBase is a great fit.

Some notes:

- When working in the cloud, it is often desirable to work with object stores where possible to keep costs down. The good news is that CDP Public Cloud comes with cloud-native capabilities; as such, several storage solutions, such as Hive, actually store the data in cloud object stores.
- It is possible that more than one road applies to your data. For instance, a message may require very low latency in the first few days, but also needs to be retained for several years. In such cases, it often makes sense to store a subset of the data in two places.
- I did not include other solutions that could store data, such as Solr, or in-application state. The reason is that the primary function of these is not storage, but search and processing respectively. I also did not include Impala, as it is an engine; Hive is only on this chart to represent its storage capabilities.
- This is a basic decision tree; it should cover most situations, but do not hesitate to deviate if your situation asks for this.

Also, see my related article: Find the right tool to move your data

Full Disclosure & Disclaimer: I am an employee of Cloudera, but this is not part of the formal documentation of the Cloudera Data Platform. It is purely based on my own experience of advising people in their choice of tooling.
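As a purely illustrative sketch (my own simplification of the paths above, not a formal Cloudera decision tree), the logic could be encoded roughly like this:
```python
# Toy sketch of the storage decision paths a-d described above.
# The yes/no inputs are simplifications of the questions in the article.
def choose_storage(bulky_files_no_queries: bool, need_table: bool,
                   fast_interaction: bool, pre_aggregates_only: bool,
                   need_long_retention: bool) -> str:
    if bulky_files_no_queries:                       # path a
        return "File/object storage (HDFS, cloud object store, or Ozone)"
    if need_table:
        if pre_aggregates_only:                      # path c
            return "Druid"
        return "Kudu" if fast_interaction else "Hive tables"  # path b
    # path d: messages or small files
    return "HBase" if need_long_retention else "Kafka"

print(choose_storage(False, True, True, False, False))  # -> Kudu
```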
07-25-2020
01:52 AM
There will be multiple form factors available in the future; for now, I will assume you have an environment that contains 1 Data Hub with NiFi, and 1 Data Hub with both Impala and Kudu. (The answer still works if all are on the same Data Hub.)
Prerequisites
Data Hubs with NiFi and Impala+Kudu
Permission to access these (e.g. add a processor, create table)
Know your Workload User Name (CDP Management Console > Your name (bottom left) > Profile)
You should have set your Workload Password in the same location
Steps to write data from NiFi to Kudu in CDP Public Cloud
Unless mentioned otherwise, I have kept everything to its default settings.
In Kudu Data Hub Cluster:
Gather the FQDNs of the Kudu masters and the ports used. Go to the Data Hub > click Kudu Master > click Masters.
Combine the RPC addresses together in the following format: host1:port,host2:port,host3:port
Example:
```master1fqdn.abc:7051,master2fqdn.abc:7051,master3fqdn.abc:7051```
In HUE on Impala/Kudu Data Hub Cluster:
Run the following query to create the Kudu table (the little triangle makes it run):
`CREATE TABLE default.micro (id BIGINT, name STRING, PRIMARY KEY(id)) STORED AS KUDU;`
In NiFi GUI:
Ensure that you have some data in NiFi that fits in the table. Use the `GenerateFlowFile` processor.
In Properties, configure the Custom Text to contain the following (copy carefully, or use Shift+Enter for the newline):
```
id,name
1,dennis
```
Select the PutKudu processor and configure it as follows:
- Settings
  - Automatically terminate relationships: tick both success and failure
- Scheduling
  - Run Schedule: 1 sec
- Properties
  - Kudu Masters: the combined list we created earlier
  - Table Name: impala::default.micro
  - Kerberos Principal: your Workload User Name (see prerequisites above)
  - Kerberos Password: your Workload Password
  - Record Reader: Create new service > CSV Reader
  - Kudu Operation Type: UPSERT
Right-click the NiFi canvas > Configure > the little cogwheel of CSV Reader, set the following property and then apply: Treat First Line as Header: True
Click the little lightning bolt of CSV Reader > Enable
On the canvas connect your GenerateFlowFile processor to your PutKudu processor and start the flow.
You should now be able to query your table through HUE and see that a single record has been added: `select * from default.micro`
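If the flow has run at least once, the result should look roughly like this (an illustration based on the sample record above, not captured output):
```
+----+--------+
| id | name   |
+----+--------+
| 1  | dennis |
+----+--------+
```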
These are the minimal steps; a more extensive explanation can be found in the Cloudera Documentation.
Potential refinements
A `DESCRIBE EXTENDED` also exposes the hostnames. For me this also worked, but I am unsure how safe it is to not explicitly define the ports.
07-23-2020
01:49 PM
1 Kudo
The Cloudera Data Platform (CDP) comes with a wide variety of tools that move data; these are the same in any cloud as well as on-premises. Though there is no formal decision tree, I will summarize the key considerations from my personal perspective. In short, it can be visualized like this:

Steps for finding the right tool to move your data

1. Staying within Hive and SQL queries suffice? > Hive, otherwise
2. No complex operations (e.g. joins)? > NiFi, otherwise
3. Batch? > Spark, otherwise
4. Already have Kafka Streams/Spark Streaming in use? > Kafka Streams/Spark Streaming, otherwise
5. Flink

Some notes:

- If you can use NiFi or a more complex solution, use NiFi.
- Use Flink as your streaming engine unless there is a good reason not to; it is the latest generation of streaming engines. Currently, I do not recommend using Flink for batch processing yet, but that will likely soon change.
- I did not include tools like Impala, HBase/Phoenix, and Druid, as their main purpose is accessing data.
- This is a basic decision tree; it should cover most situations, but do not hesitate to deviate if your situation asks for this.

Also see my related article: Choose the right place to store your data

Full Disclosure & Disclaimer: I am an employee of Cloudera, but this is not part of the formal documentation of the Cloudera Data Platform. It is purely based on my own experience of advising people in their choice of tooling.
06-29-2020
01:10 PM
3 Kudos
Yesterday evening, in a moment of inspiration, I wondered if one could build a game in NiFi (a tool for moving data in real time). Of course, building a game is quite unnecessary, but it is far from pointless, as it can really show how flexible a tool can be with a bit of creativity. A few years ago I played 'World's first game of chess… against PySpark', but given that NiFi is meant for simpler logic, I decided to start with something different.
Building Tic Tac Toe in NiFi
My goals were to build something that allowed people across the globe to play in real time, by using a board (no code). Fortunately, the graphical user interface of NiFi was made with real-time interaction of multiple users in mind, so I had a good foundation. Furthermore, I wanted NiFi to determine the winner automatically. For this, I leveraged its capabilities in applying logic/business rules. Of course, NiFi wants to apply the logic to something, so to ensure that the winner would be determined in a timely manner, I introduced a heartbeat that checks 10x per second whether someone made the winning move.
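As a rough sketch of how such a heartbeat could be wired up (my guess at a setup; the actual template may differ):
```
GenerateFlowFile ("heartbeat")
  Run Schedule: 100 ms    (fires 10x per second)
  -> feeds the logic that checks whether a winning move was made
```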
Let's get started
How did it go?
I was hoping I could say it only took 30 minutes, but admittedly I had to think a bit about how to make the evaluation work and represent things in a visually appealing way. So ultimately it took me about 2 hours, which is still not bad for creating an interactive multiplayer game in a tool that is actually meant for moving data. I am quite happy with how easy it turns out to be to play the game. However, please note that I only built the logic for multiplayer Tic Tac Toe. NiFi actually has enough capabilities that it would not be hard to make it decide its own moves, at various levels of playing strength, for a simple game like Tic Tac Toe.
(Screenshots: 'Make your move' and 'Automatically evaluate the winner')
What will come next?!
First of all, I will think about more cool stuff to do with tools for moving data, perhaps with NiFi or more complex engines like Flink. I hope I have shown that NiFi is indeed very flexible and enables great speed of development.
Secondly, I invite you to think of something to do with NiFi or other open-source tooling. Good luck, and do share the results!
Well played!
----
NiFi Template (right-click to save): Tic_Tac_Toe.xml