Member since
01-07-2019
220
Posts
23
Kudos Received
30
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4917 | 08-19-2021 05:45 AM |
| | 1790 | 08-04-2021 05:59 AM |
| | 865 | 07-22-2021 08:09 AM |
| | 3621 | 07-22-2021 08:01 AM |
| | 3323 | 07-22-2021 07:32 AM |
01-31-2021
01:07 PM
1 Kudo
Recently, the weekly number of COVID-19 cases in the Netherlands has been dropping steadily week over week. Underneath this decline, however, lies a hidden upward trend: cases of the British COVID-19 variant.
In this article, I explain how I made my first visual in Cloudera Data Visualization with just a few clicks.
Step 1: The Data
The data was created by combining several sources that shed light on the percentage of new infections made up of this variant, as well as on the total number of cases. Since we are here for tech more than for science, I will only highlight the official report of cases per week and an official article that contains the latest percentage as well as an earlier one. Additional explanation of the numbers is given in the post below this article.
From here I could have chosen from many data sources, including a CSV file, but for reproducibility, I decided to upload it to Hive with a simple query:
CREATE TABLE covid_cases AS
SELECT 431 AS british_cases, DATE '2020-12-08' AS week_end_date
UNION SELECT 1168, DATE '2020-12-15'
UNION SELECT 2470, DATE '2020-12-22'
UNION SELECT 3369, DATE '2020-12-29'
UNION SELECT 4854, DATE '2021-01-05'
UNION SELECT 5434, DATE '2021-01-12'
UNION SELECT 7755, DATE '2021-01-19'
UNION SELECT 11760, DATE '2021-01-26'
UNION SELECT 19085, DATE '2021-02-02'
People have different opinions on the best way to mock up data, but when I only need a few rows, I like to build this kind of query with the Excel CONCAT function.
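As an alternative to Excel, the same UNION SELECT query can be generated with a short script. A minimal sketch in Python, using the first few (cases, date) pairs from above (the helper and variable names are my own, not part of the original post):

```python
# Mocked-up rows: (british_cases, week_end_date), as in the article.
rows = [
    (431, "2020-12-08"),
    (1168, "2020-12-15"),
    (2470, "2020-12-22"),
]

# The first row carries the column aliases; later rows are UNION SELECTs.
lines = [
    "create table covid_cases as",
    f"SELECT {rows[0][0]} as british_cases, DATE '{rows[0][1]}' AS week_end_date",
]
for cases, week in rows[1:]:
    lines.append(f"UNION SELECT {cases}, DATE '{week}'")

query = "\n".join(lines)
print(query)
```

Pasting the printed statement into a Hive editor should create the same table as the query in the article.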
Step 2: The Connection
As I used Cloudera Data Visualization within a Data Warehouse, the connection to that Data Warehouse is available out of the box. As such, I only needed to select the database and table.
Step 3: The Visualization
To minimize the effort, I decided to stick to the default settings where possible. This has the additional benefit that it is very easy to reproduce what I have done.
1. Create a Dashboard
2. Add a visualization for the table
3. Select type: Bars
4. Y-axis: british_cases (it automatically understands that we want the sum)
5. X-axis: week_end_date (it automatically recognizes that it is a date)
6. Change the X-axis type of week_end_date to timestamp
7. Labels: british_cases (again, the sum is applied automatically)
8. Give your X-axis and Y-axis a nice alias, and add a title and subtitle to the chart
Now your chart should look just like the picture on top of this article.
Hopefully, this enables everyone to gain more insight into how COVID develops in the Netherlands and, of course, into how to visualize data with just a few clicks.
---
Edit:
This additional data source indicates the cases for the week ending on 2 February:
Nieuwe varianten gooien roet in het eten (Dutch for "New variants throw a spanner in the works")
11-01-2020
03:06 AM
The ExecuteStreamCommand solution should work, as confirmed on this external website. To quote the most relevant part: "ExecuteStreamCommand solved my problem. Create a small script and execute it with parameters from NiFi:"

#!/bin/bash
HOST=$1
USER=$2
PASSWORD=$3
ftp -inv $HOST <<EOF
user $USER $PASSWORD
cd /sources
delete $4
bye
EOF
08-13-2020
04:44 AM
This question is labeled NiFi, so I assume you have NiFi available. In that case, look for the right processor for the job; something like ExecuteSQL may be a good starting point. If your question is purely about how to make Python and MariaDB interact, this may not be the best place to ask it.
08-13-2020
04:33 AM
1 Kudo
NiFi is not really designed to work with 'context'. If you have simple records, there are many operations you can apply, but if you are working with potentially complex files, and thus complex operations, you will likely prefer to process them with something like Spark or Python.
08-13-2020
04:31 AM
1 Kudo
Assuming your flowfile contains multiple records, this should probably be achievable with the UpdateRecord processor. Note that the expression language has a UUID function which may be helpful to use inside this.
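Leaving the NiFi configuration aside, the intended effect (a unique identifier added to every record in the payload) can be illustrated outside NiFi. A minimal Python sketch; the record structure here is hypothetical, not taken from the question:

```python
import json
import uuid

# A flowfile payload with multiple JSON records (hypothetical structure).
records = [{"name": "alpha"}, {"name": "beta"}, {"name": "gamma"}]

# Mimic what UpdateRecord with a UUID() expression would do:
# attach a unique identifier field to every record.
for record in records:
    record["id"] = str(uuid.uuid4())

payload = json.dumps(records)
```

In NiFi itself, the same result comes from UpdateRecord adding a field whose value is generated per record, rather than from a script.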
08-13-2020
04:23 AM
Only a partial answer, but in general I do not think REGEX_REPLACE cuts large strings. It will be hard to figure this out in more detail unless you can share a reproducible example. Here is what I tested just now:
1. Created a table that contains a string of 60000+ characters (lorem ipsum)
2. Created a new table by selecting the regex replace of that string (I replaced every 'a' with 'b')
3. Counted the length of the field in the new table
As said, it may well be that you are using a very specific string or regex that together create this problem; it would be interesting to see if this could be reduced to a minimal example. Also keep in mind that although regex dialects are very similar, there are many ways a regex can be parsed; perhaps the test you did is simply slightly different from the implementation in Hive.
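The sanity check described above can also be reproduced outside Hive. A quick Python equivalent (replace every 'a' with 'b' in a 60000+ character string and compare lengths); this only demonstrates the general behavior of a regex replace, not Hive's implementation:

```python
import re

# Build a string of 60000+ characters, as in the Hive test.
text = "lorem ipsum dolor sit amet " * 2500  # 67500 characters

# Replace every 'a' with 'b', analogous to Hive's regexp_replace(col, 'a', 'b').
replaced = re.sub("a", "b", text)

# A plain character-for-character replacement should not cut the string.
assert len(replaced) == len(text)
```

If Hive returns a shorter string for the same operation, that points at something specific in the data or the regex rather than at regex replacement in general.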
08-13-2020
01:14 AM
After checking the PutHDFS processor, I did not find the destination timestamp to be a configuration option. However, the configuration options of hadoop fs -put do show a timestamp-related option, which suggests there may be a way to achieve what you are looking for. My recommendation would be to log a Jira on the Apache NiFi project, and reach out to your account team to see if they can raise the priority.
08-13-2020
01:06 AM
Please clarify which Cloudera or Hortonworks platform you are using; it is a bit hard to suggest a next step without this context. If none of these platforms are involved, this may not be the best place to ask the question. Sidenote: if you have an urgent functional question, the generally recommended approach is to contact your account team.
08-12-2020
02:16 PM
I have not tested it as such, but 10 minutes is a very long time for NiFi; I could imagine that the 'memory' of a load balancer is kept quite short to avoid overhead. I would just drop in many files in a short period of time and see what happens. If you set the load balancer to round robin, I would expect the files to go to both nodes. Note that you can also use GenerateFlowFile if you don't want to drop files all the time.
08-12-2020
02:13 PM
Though I don't know too much about SharePoint, it seems it has an API that allows HTTP GET requests. Look carefully at the hidden section on this page: https://docs.microsoft.com/en-us/sharepoint/dev/sp-add-ins/get-to-know-the-sharepoint-rest-service?tabs=http Based on the above, you could use either the GetHTTP processor or the InvokeHTTP processor with the GET method. If this is the wrong way to access SharePoint, my general advice:
1. Figure out how SharePoint allows any program (whether it is NiFi, Python, ...) to extract data.
2. From there it should be comparatively easy to figure out which processor you need.
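For reference, the kind of GET request InvokeHTTP would issue can be sketched with Python's standard library. The URL below is a placeholder shaped like the SharePoint REST examples in the linked docs, not a verified endpoint, and the request is only built here, not sent:

```python
import urllib.request

# Hypothetical SharePoint REST endpoint for listing items (placeholder URL).
url = "https://example.sharepoint.com/_api/web/lists/getbytitle('Documents')/items"

# Build the request without sending it; InvokeHTTP configured with the GET
# method performs the equivalent call and can add headers such as Accept.
req = urllib.request.Request(
    url,
    headers={"Accept": "application/json;odata=verbose"},
    method="GET",
)
```

In NiFi, the URL would go into InvokeHTTP's Remote URL property and the Accept header into a dynamic property, with authentication configured as SharePoint requires.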