Member since
01-27-2023
229
Posts
74
Kudos Received
45
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1775 | 02-23-2024 01:14 AM |
| | 2312 | 01-26-2024 01:31 AM |
| | 1441 | 11-22-2023 12:28 AM |
| | 3598 | 11-22-2023 12:10 AM |
| | 3683 | 11-06-2023 12:44 AM |
09-25-2023
02:08 AM
1 Kudo
@BerniHacker, I have no experience with AWS, but I am using GCP, so it should be mostly the same thing. Regarding GenerateTableFetch: you do not need to use Column for Value Partitioning at the same time as Maximum-value Columns. I suggest using only Maximum-value Columns, as you will get the same result but a little faster.

Now, regarding your problem: I encountered something similar to what you are describing, and it was related to connectivity and quotas set on the cloud environment. I set GenerateTableFetch to DEBUG and ran it once to see what gets written in the Bulletin Board. At the same time, I opened the NiFi logs and used tail on nifi-app.log to look for anything out of the ordinary. Once the quotas were increased on the GCP side, I was able to extract around 1,000,000,000 rows without any further issues. I am also extracting data from local Oracle and MySQL instances, totaling more than 5B rows between 09:00 and 10:00 AM, using a combination of GenerateTableFetch and ExecuteSQLRecord, and I have never encountered a problem with GenerateTableFetch due to the size of the table.

What you could also try is to execute the SELECT statement in your IDE and see how long it takes to return results. Note that NiFi might be a little slower, depending on where it is deployed, as it can lose time going through proxies and so on.
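As a quick way to apply the log-checking step above, a filter on nifi-app.log helps; a minimal sketch, assuming the log file is reachable from your working directory (in a default install it usually lives under the NiFi logs directory):

```shell
# Assumption: nifi-app.log has been copied to (or is readable from) the
# current directory; adjust the path to your install, e.g. /opt/nifi/logs/.
# Show the most recent GenerateTableFetch entries, case-insensitively.
grep -i 'generatetablefetch' nifi-app.log | tail -n 50
```

While the processor runs, `tail -f` on the same file gives you the live view the post describes.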
09-20-2023
07:24 AM
1 Kudo
@need_help, try replacing MergeContent with MergeRecord. I assume that each error log is generated in a single flow file. Using MergeRecord you can achieve something similar, but you will need to create two Controller Services: one for CSV reading and one for CSV writing, both of them using Inherit Record Schema. Next, you can group as many records as you would like and send them to your PutEmail processor. This is how I have used it so far, and it works pretty well for my use case.
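A rough sketch of that setup (the reader/writer names and the record counts are assumptions for illustration, not values from the original post):

```
MergeRecord
  Record Reader             -> CSVReader           (Schema Access Strategy: Inherit Record Schema)
  Record Writer             -> CSVRecordSetWriter  (Schema Access Strategy: Inherit Record Schema)
  Minimum Number of Records : 1
  Maximum Number of Records : 100    # how many error logs to group per email
```

The merged flow file then goes to PutEmail as one attachment instead of one email per error.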
09-11-2023
07:35 AM
@JohnSilver, first of all, I recommend setting your processor to DEBUG, as it might provide much more information than what you are seeing right now. In addition, have a look in nifi-app.log, as you might find something there as well. Next, I do not know how Kudu is configured on your side, but every project I was involved in required authentication: the Kerberos properties, which seem to be blank on your side. Even though you might have a basic install of Impala/Kudu, as far as I know it still requires some sort of authentication.
09-11-2023
07:30 AM
1 Kudo
@BillyG, set the property "Use Avro Logical Types" to true if you would like to use anything other than STRING :).
08-29-2023
02:01 AM
@MukaAddA, I see that you are linking two SUCCESS queues to your PutFile. PutFile will take the items from both queues in what I assume is a random order. Try removing one of the success queues and see what happens. Besides that, check how you configured PutFile to handle files with the same name, especially if you use the filename further in your processing. In addition, set your processor to DEBUG and see what it displays; maybe you will get a hint from there regarding your problem.
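The same-name behavior mentioned above is controlled by a PutFile property; a sketch (the directory value is a placeholder, not from the original post):

```
PutFile
  Directory                    : /path/to/output        # placeholder
  Conflict Resolution Strategy : fail | replace | ignore
```

With "fail", a second file with the same name is routed to the failure relationship, which makes duplicate-name problems visible instead of silently overwriting.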
08-28-2023
08:05 AM
@JohnnyRocks, as @steven-matison said, you should avoid chaining so many ReplaceText processors. I am not quite sure I understood your flow exactly, but something tells me that before reaching ReplaceText, something is not properly configured in your NiFi flow.

First of all, when using the classic Java date format, MM will always render as a two-digit month, meaning that months 1 to 9 automatically get a leading zero. "dd" does the same, but for days. As I see in your post, your CSV reader is configured to read the data as MM/dd/yy, which should be fine, but somehow something is missing here: how do you reach the format dd/MM/yyyy?

What I would personally try is to convert all those date values into the same format. So instead of all those ReplaceText processors, I would insert an UpdateRecord processor, where I would define my RecordReader and my RecordWriter with the desired schemas (make sure that your column is of type int with logical type date). Next, in that processor, I would change the Replacement Value Strategy to "Record Path Value", click + and add a new property. I would call it "/Launch_Date" (pay attention to the leading slash) and assign it the value format( /Launch_Date, "dd/MM/yyyy", "Europe/Bucharest" ) (or any other time zone you require; if you need your data in UTC, just remove the comma and the time zone).
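Restating that configuration as a property sketch (the reader/writer service names are assumptions; the dynamic property is taken from the description above):

```
UpdateRecord
  Record Reader              -> CSVReader           # Launch_Date read as int / logical type date
  Record Writer              -> CSVRecordSetWriter
  Replacement Value Strategy : Record Path Value

  # dynamic property (note the leading slash in the property name):
  /Launch_Date : format( /Launch_Date, "dd/MM/yyyy", "Europe/Bucharest" )
```

One processor then normalizes every date value, replacing the whole chain of ReplaceText processors.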
08-28-2023
06:05 AM
Well, I am not near a PC to test right now, but my initial thought is that the problem is related to how your raw data arrives in your flow. As I can see, you have both an INT value and a FLOAT value, not a constant data type: "stake": 0 versus "stake": 0.5.

Now, you set your Schema Access Strategy to Inherit Record Schema. This is correct in most cases, but not in yours, because your data is not stable. If two files go into your MergeRecord and one has the value 0 and the other has the value 0.5, you will have two different schemas, meaning that the files cannot be merged properly. If the first file comes in as an INT, your second flow file (and all the others coming right after) will automatically be treated as INT, no matter their values.

To avoid this, you will have to define the schema manually: change both your RecordReader and your RecordWriter from Inherit Record Schema to "Use Schema Text Property" and write your schema in the new field (which appears upon the switch). Make sure that in your schema that field is defined with a data type which accepts fractional values and not just an int.
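A minimal hand-written Avro schema along those lines (the record name is an assumption; only the stake field comes from the snippet above) declares the field as a double so both 0 and 0.5 fit:

```json
{
  "type": "record",
  "name": "BetRecord",
  "fields": [
    { "name": "stake", "type": ["null", "double"], "default": null }
  ]
}
```

Pasting this into "Use Schema Text Property" on both the reader and the writer gives every flow file the same schema, so MergeRecord can bin them together.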
08-28-2023
01:57 AM
Well, first of all, what does the data look like before entering MergeRecord? Secondly, how did you configure the Reader and the Writer? You pasted the configuration for MergeRecord, but that has nothing to do with how the data gets transformed.
08-28-2023
12:51 AM
@dulanga, as far as I can tell from your previous post, you have around 3.8 GB of RAM available on your NiFi node, but you are assigning much more to your JVM. So, you have:

```
               total   used   free   shared  buff/cache  available
Mem:           3.8Gi   1.5Gi  2.1Gi  145Mi   269Mi       2.1Gi
Swap:          511Mi   511Mi  0B
```

But you are assigning much more to your JVM:

```
# JVM memory settings
java.arg.2=-Xms4096m
java.arg.3=-Xmx8192m
```

Try correcting your configuration and assign the correct values for your JVM in the bootstrap.conf file. Here are some best practices: https://community.cloudera.com/t5/Community-Articles/HDF-CFM-NIFI-Best-practices-for-setting-up-a-high/ta-p/244999
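With roughly 3.8 GiB of total RAM and other processes on the node, a heap well below that total is more realistic; a sketch of corrected bootstrap.conf values (the exact sizes are assumptions, tune them for your workload and leave headroom for the OS):

```
# JVM memory settings (example: 1 GiB initial, 2 GiB maximum)
java.arg.2=-Xms1024m
java.arg.3=-Xmx2048m
```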
08-28-2023
12:46 AM
@Dim, I do not think that MergeRecord is doing this; rather, it is the schema you have defined in both your RecordReader and your RecordWriter. I am working, for example, with streaming data in Parquet and Avro format, using MergeRecord three times during the flow, and each fractional value remains fractional, because I set the RecordWriter to a schema which accepts fractional data. I suggest taking a look at the schemas defined in your Controller Services and starting your debugging from there 🙂 Besides that, your problem might originate somewhere else entirely: check your flow from start to end and see whether you are working with the correct scale and precision and whether the data types are correct.