Member since
01-27-2023
229
Posts
73
Kudos Received
45
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1334 | 02-23-2024 01:14 AM |
| | 1705 | 01-26-2024 01:31 AM |
| | 1124 | 11-22-2023 12:28 AM |
| | 2775 | 11-22-2023 12:10 AM |
| | 2859 | 11-06-2023 12:44 AM |
08-28-2023
12:46 AM
@Dim, I do not think that MergeRecord is performing this action; it is rather the schema you have defined in your RecordReader and your RecordWriter. For example, I work with streaming data in Parquet and Avro format, using MergeRecord three times in the flow, and every fractional value stays fractional because I set the RecordWriter to a schema that accepts fractional data. I suggest you take a look at the schemas defined in your Controller Services and start debugging from there 🙂 Besides that, your problem might originate somewhere else entirely: check your flow from start to end and verify that you are working with the correct scale and precision and that the data types are correct.
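As an illustration, a RecordWriter schema that preserves fractional values can declare the field as an Avro decimal logical type (the record and field names below are made up for the example, not taken from your flow):

```json
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": {"type": "bytes", "logicalType": "decimal", "precision": 10, "scale": 2}}
  ]
}
```

If the writer schema instead declares such a field as `"int"` or `"long"`, the fractional part is lost at write time regardless of what MergeRecord does.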
08-22-2023
08:46 AM
@Anderosn, let me see if I understood correctly: you have a processor that sends a flowfile to a success queue, and you would like this flowfile to be processed in parallel by two different processors at the same time, right? If so, you just need to link the success relationship of your first processor to both of the processors performing the transformation on the JSON, and you will achieve what you have described. Otherwise, please be a little more explicit about what you are trying to achieve 🙂
08-21-2023
07:53 AM
@sahil0915, as @MattWho already pointed out, what you are trying to achieve is maybe not the best use case for NiFi. Nevertheless, if you still want to pursue this idea, prepare some resources, because you might need them. What you can try is divided between two separate flows (PS: I did not test whether everything works as expected, but this is what I would do):
- First flow: check whether the number of rows in dc1 equals the number of rows in dc2 and dc3, by chaining several processors. First things first, use an ExecuteSQLRecord to execute "select count(*) from your_table" on dc1. Link its success relationship to an ExtractText processor in which you define a property named "dc_1_count" with the value "(.*)"; this saves the count into an attribute. Next, the success relationship goes into another ExecuteSQLRecord, which executes the same "select count(*) from your_table" but on dc2. From success, use another ExtractText to save the value into an attribute named dc_2_count. From there, go into a third ExecuteSQLRecord that executes the same select for dc3, and finally extract the value into dc_3_count with another ExtractText.
- Next, create a RouteOnAttribute processor and define some routing rules: if dc_1_count > dc_2_count, go into a specific queue; if dc_1_count > dc_3_count, go into another specific queue; and if dc_1_count equals both dc_2_count and dc_3_count, go into success and end the flow.
- Second flow: handle the queues for the case where the counts differ. If the count is not the same, execute an SQL statement on dc2 that extracts all the rows from your table. Then split this file into several flowfiles, each containing one row per record. Send these new flowfiles into a LookupRecord processor that is connected to the dc1 database and looks up the values present in each record. If the value is there, you can discard that flowfile; otherwise, you can use a PutEmail (or another processor) to get alerted.

ATTENTION: doing this will require lots of resources (RAM, CPU, heap memory), and the OS configuration should allow you to work with millions of open files. Otherwise, I do not recommend trying this. For testing purposes, you can start with a table of 10-20 rows and continue with bigger tables.
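The RouteOnAttribute comparisons can be written as NiFi Expression Language dynamic properties, for instance (the property names on the left are illustrative):

```
dc2_mismatch    ${dc_1_count:gt(${dc_2_count})}
dc3_mismatch    ${dc_1_count:gt(${dc_3_count})}
counts_match    ${dc_1_count:equals(${dc_2_count}):and(${dc_1_count:equals(${dc_3_count})})}
```

Each property becomes an outgoing relationship on the processor, so the mismatch queues and the "all equal" path can be wired separately.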
08-21-2023
02:24 AM
1 Kudo
@dulanga, please tell us your NiFi version, your hardware configuration (OS, RAM and CPU), how many flows you already have on your canvas, your OS limits (open files, running processes, etc.), your Java version, and how you configured bootstrap.conf (mostly the JVM settings and, in case you are using an older Java version, args 7-8-9).
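For reference, the JVM memory settings in question live in conf/bootstrap.conf and look like this (the 2g/4g values below are placeholders for this example, not recommendations):

```
# JVM heap settings passed to the NiFi process
java.arg.2=-Xms2g
java.arg.3=-Xmx4g
```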
08-21-2023
02:12 AM
@galt, in this case, the only solution I can propose is to create a script that implements the exact logic you are looking for. Afterwards, you can execute your script in an ExecuteStreamCommand processor and check the output. I suggested ExecuteStreamCommand because you can write the script in any language you desire and then execute it directly from NiFi. The only requirement is to have everything installed on those machines.
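A minimal skeleton for such a script, assuming ExecuteStreamCommand pipes the flowfile content to stdin and captures stdout as the new content (the transform itself is a placeholder: as an example it drops empty lines, and you would replace it with your own logic):

```python
import sys

def transform(text: str) -> str:
    # Placeholder logic: keep only non-empty lines of the flowfile content.
    return "\n".join(line for line in text.splitlines() if line.strip())

if __name__ == "__main__":
    # ExecuteStreamCommand sends the incoming flowfile content to stdin
    # and uses whatever the script writes to stdout as the outgoing content.
    sys.stdout.write(transform(sys.stdin.read()))
```

In the processor, you would point Command Path at the interpreter and Command Arguments at the script file.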
08-17-2023
06:31 AM
2 Kudos
@edim2525, you can create multiple users which will have admin rights. To do that, have a look here:
- https://community.cloudera.com/t5/Support-Questions/No-show-Users-and-Policies-in-Global-Menu/td-p/339127
- https://community.cloudera.com/t5/Support-Questions/How-to-set-passwords-for-multiple-users-in-Apache-Nifi/td-p/367110
- https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#user_authentication
08-17-2023
06:03 AM
@sinRudra, I do not think that your problem is related to the name itself and/or the characters. Your error message clearly states that you do not have the required permissions to retrieve the file. If there had been an issue with the name, you would have received an error message saying that the file cannot be found in that location. Make sure that all the configurations are done properly so that you are able to access that FTP location.
08-17-2023
03:00 AM
1 Kudo
@galt, you could try setting your processors to DEBUG logging in the hope of getting more detail in the logs. Nevertheless, when it comes to network connectivity, I highly recommend involving your network team and asking them to identify what is causing the instability in the connection between the source and the target. They have the proper tools and logs to monitor the network flow between the systems; what they have to do is monitor the traffic coming from your NiFi node towards your remote server.
08-11-2023
05:47 AM
2 Kudos
I am not aware of any direct connectivity between Tika and NiFi. Off the top of my head, the only solution I can think of is to create a brand new NiFi processor and integrate the parsing logic from Tika directly within NiFi. The code can be written in Java and then integrated directly into NiFi (have a look here maybe: https://medium.com/hashmapinc/creating-custom-processors-and-controllers-in-apache-nifi-e14148740ea ). Another option, if you are not working on something too complex, might be to implement this logic in a script and execute it in NiFi with ExecuteScript (see some great tutorials here: https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-3/ta-p/249148 ).
08-09-2023
11:14 PM
@Anderosn, try reducing the Batch Size from 100 to 10. Even though the hint says it is the preferred number of FlowFiles to be put into the database, I do not know exactly whether this is a hard limit or a soft limit.