About cotopaul

sinRudra · 08-18-2023

For information, a Jira ticket has been created: https://issues.apache.org/jira/browse/NIFI-11967

MattWho · 08-11-2023

@Madhav_VD Apache NiFi contains no native processors that use Apache Tika other than IdentifyMimeType (and that processor does not do any extraction), but you can find others in the Apache community who have created custom processors that use Apache Tika. Adding a custom NAR to Apache NiFi is as easy as placing it in the auto-load directory: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#autoloading-processors

While I have no experience with any of these custom NARs, you can give them a try to see if they meet your needs. If not, they may provide a stepping stone for creating your own custom variant:
https://github.com/tspannhw/nifi-extracttext-processor/releases/tag/html
https://community.cloudera.com/t5/Community-Articles/ExtractText-NiFi-Custom-Processor-Powered-by-Apache-Tika/ta-p/249392
https://community.cloudera.com/t5/Community-Articles/Creating-HTML-from-PDF-Excel-and-Word-Documents-using-Apache/ta-p/247968
https://github.com/tspannhw/nifi-extracttext-processor

If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.

Thank you,
Matt

MattWho · 08-10-2023

@Anderosn Between your SplitJson and PutSQL processors, are you rebalancing the FlowFiles across multiple nodes in a NiFi cluster? Are you routing any of the split JSON messages down a different dataflow path that does not lead to this PutSQL processor?

I ask because the SplitJson processor writes fragment attributes (fragment.identifier, fragment.index, and fragment.count) to each new FlowFile it creates (each split). The fragment.identifier and fragment.count values are used by the PutSQL processor when "Support Fragmented Transactions" is set to "true" (the default). This means that if not all of the split JSON FlowFiles reach this PutSQL and sit on the same node of the NiFi cluster, the FlowFiles belonging to that fragment.identifier will not be processed and will remain on the inbound connection to PutSQL.

I'd start by listing the queue on that connection and checking these attributes to verify that fragment.count is "10", that fragment.identifier has the same value on all 10, and that fragment.index shows numbers 1 to 10 across those 10 FlowFiles.

If processing all fragments in the same transaction is not a requirement for your dataflow, try changing "Support Fragmented Transactions" to "false" and see whether these 10 FlowFiles get executed successfully by your PutSQL processor.

If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.

Thank you,
Matt

cotopaul · 08-07-2023

Yes, you can try it that way, or like this: format(/datetime, "yyyy/MM/dd/HH", "Asia/Jakarta"). Note that you will need to switch the Replacement Value Strategy from Literal Value to Record Path Value.
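For intuition about what that RecordPath expression produces, here is a rough Python analogue. It assumes /datetime holds epoch milliseconds and models Asia/Jakarta as a fixed UTC+7 offset (Western Indonesia Time has no daylight saving); the function name is hypothetical, and NiFi itself evaluates the expression in Java, not like this.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Jakarta is modelled as a fixed UTC+7 offset (WIB has no
# daylight saving), so the sketch does not depend on an installed tz database.
JAKARTA = timezone(timedelta(hours=7), "WIB")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_like_recordpath(epoch_millis: int) -> str:
    """Rough analogue of format(/datetime, "yyyy/MM/dd/HH", "Asia/Jakarta")."""
    instant = EPOCH + timedelta(milliseconds=epoch_millis)
    # Java's yyyy/MM/dd/HH pattern corresponds to strftime's %Y/%m/%d/%H.
    return instant.astimezone(JAKARTA).strftime("%Y/%m/%d/%H")


if __name__ == "__main__":
    # 2023-08-07 20:00 UTC is already 03:00 the next day in Jakarta.
    print(format_like_recordpath(1691438400000))  # 2023/08/08/03
```

The example shows why the time zone argument matters for a date-partitioned path: the same instant lands in a different day directory depending on the zone.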

TimothySpann · 08-06-2023

You need to look at CDC tools such as Cloudera Kafka Connect with Debezium or Cloudera SQL Stream Builder with Debezium.
Kafka Connect: https://docs.cloudera.com/runtime/7.2.17/kafka-connect/topics/kafka-connect-connector-debezium-oracle.html
SSB: https://docs.cloudera.com/csa/1.10.0/how-to-ssb/topics/csa-ssb-cdc-connectors.html
Or you can use Oracle GoldenGate with Kafka. I have some examples in this article: https://medium.com/cloudera-inc/cdc-not-cat-data-capture-e43713879c03

MihaiMaranduca · 08-01-2023

Good morning @cotopaul @SAMSAL, first of all, thank you very much for your time.

OK, so I see that I misunderstood how QueryDatabaseTable is "triggered". @SAMSAL, I knew about those configurations, but as I said, I thought this processor only ran the query when a new record was inserted in the database, using the maximum column value as its reference. The PostgreSQL log makes sense now: the queries are executed every second because my Run Schedule is at the default (0).

I use MiNiFi to load sales from a supermarket and then send them via an API (along with other transformations), so I need this to happen in real time, as soon as the customer has paid. I guess I have to set the Run Schedule to every few seconds.

@cotopaul, I will take a look at my DBCPConnectionPool; this info is helpful. Thank you all.
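The behaviour described above (the query runs on every scheduled trigger, not on insert) can be sketched with Python's built-in sqlite3. This is a simplified model of max-value-column polling of the kind QueryDatabaseTable performs with its Maximum-value Columns property; the `sales` table, its columns, and the dict standing in for processor state are all hypothetical.

```python
import sqlite3


def poll_new_rows(conn: sqlite3.Connection, state: dict) -> list:
    """Simulate one scheduled run of a max-value-column poll on a hypothetical
    `sales` table whose `id` column only ever increases."""
    last_max = state.get("max_id")
    if last_max is None:
        # First run with no stored state: fetch everything.
        rows = conn.execute("SELECT id, amount FROM sales ORDER BY id").fetchall()
    else:
        # Later runs: the query is issued every time, whether or not rows changed.
        rows = conn.execute(
            "SELECT id, amount FROM sales WHERE id > ? ORDER BY id", (last_max,)
        ).fetchall()
    if rows:
        state["max_id"] = rows[-1][0]  # remember the largest id seen so far
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 3.0)])
    state = {}
    print(poll_new_rows(conn, state))  # [(1, 9.5), (2, 3.0)]
    print(poll_new_rows(conn, state))  # [] (nothing inserted between runs)
    conn.execute("INSERT INTO sales VALUES (3, 12.25)")
    print(poll_new_rows(conn, state))  # [(3, 12.25)]
```

In this model, latency between a sale and its pickup is bounded by the interval between runs, which is the trade-off behind choosing a Run Schedule of a few seconds.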

SAMSAL · 08-01-2023

Hi @MWM, what you are describing is a classic data enrichment pattern that can be implemented with the ForkEnrichment and JoinEnrichment processors. For more information, please refer to: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.20.0/org.apache.nifi.processors.standard.JoinEnrichment/additionalDetails.html

Based on your scenario, the "SQL" strategy of JoinEnrichment will work best for you, since you can select the main fields (system, name, surname, phone, mail) from the original FlowFile data and select profession and department from the enrichment result:

SELECT o.system, o.name, o.surname, o.phone, o.mail, e.profession, e.department FROM original o LEFT OUTER JOIN enrichment e ON o.name = e.name

Since you are splitting the CSV and enriching per record, you can simply join by name. If you have an API that returns a collection of user information, then you don't have to split: you can do the enrichment on multiple records from the CSV against the records returned in the API's JSON output. However, be aware that with a large data set this strategy "... can lead to heap exhaustion and cause stability problems or OutOfMemoryErrors to occur". Please review the link above to see how this can be mitigated.

If you find this helpful, please accept the solution. Thanks
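To try the suggested query outside NiFi, here is a sketch that runs it against in-memory SQLite tables named `original` and `enrichment`, the names the query above uses. SQLite is only a stand-in for NiFi's own SQL engine, and the column types and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical schemas: `original` holds the split CSV record, `enrichment`
# holds what the lookup (e.g. an API call) returned for that person.
ENRICH_SQL = """
    SELECT o.system, o.name, o.surname, o.phone, o.mail, e.profession, e.department
    FROM original o
    LEFT OUTER JOIN enrichment e ON o.name = e.name
"""


def enrich(original_rows, enrichment_rows):
    """Join original records with enrichment results, keeping unmatched originals."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE original (system TEXT, name TEXT, surname TEXT, phone TEXT, mail TEXT)"
        )
        conn.execute("CREATE TABLE enrichment (name TEXT, profession TEXT, department TEXT)")
        conn.executemany("INSERT INTO original VALUES (?, ?, ?, ?, ?)", original_rows)
        conn.executemany("INSERT INTO enrichment VALUES (?, ?, ?)", enrichment_rows)
        return conn.execute(ENRICH_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    people = [
        ("CRM", "Ana", "Pop", "555-0100", "ana@example.com"),
        ("CRM", "Dan", "Ionescu", "555-0101", "dan@example.com"),
    ]
    lookups = [("Ana", "Engineer", "IT")]
    for row in enrich(people, lookups):
        print(row)  # Dan has no match, so his profession/department are None
```

The LEFT OUTER JOIN keeps every original record even when the lookup returned nothing, which is why an unmatched person still comes through with empty enrichment fields rather than being dropped.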

MattWho · 07-31-2023

@Kiranq What version of Java is your NiFi using?

Sharing the keytool output for the keystore and truststore configured in your nifi-registry.properties file would help. Sharing the keystore and truststore configured in your NiFi Registry client would help as well:
./keytool -v -list -keystore <keystore or truststore>

You can also use openssl to see what the server (NiFi Registry) sends to the client (NiFi) in the initial TLS exchange:
./openssl s_client -connect <nifi-registry hostname>:<NiFi-Registry port> -showcerts

Matt

MattWho · 07-27-2023

@Luwi I agree that this is likely not some kind of quota but rather a connection interruption. Since the default timeout in the SMTP client is infinite, the PutEmail processor never times out the bad connection, so it never gets to execute a new thread that establishes a new connection.

The Jira I mentioned in my previous response (https://issues.apache.org/jira/browse/NIFI-9758) adds the ability to set SMTP timeouts, which resolves this issue. You'll need to upgrade to Apache NiFi 1.17 or newer to get this processor improvement.

If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.

Thank you,
Matt
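PutEmail runs on Java's mail client, so the following is only an analogy using Python's standard smtplib. It shows why a finite timeout matters: against a hypothetical stalled server that accepts the TCP connection but never sends its SMTP greeting, a client with `timeout` set gives up and can be retried, whereas a client left at the default (no timeout) would wait indefinitely. The helper name and the local stalled server are illustrative.

```python
import smtplib
import socket


def smtp_session_outcome(host: str, port: int, timeout_s: float) -> str:
    """Open an SMTP session with a finite timeout and report what happened."""
    try:
        # The timeout applies to the TCP connect and to every later socket
        # read/write, including waiting for the server's 220 greeting.
        with smtplib.SMTP(host, port, timeout=timeout_s) as client:
            client.noop()
            return "connected"
    except (socket.timeout, smtplib.SMTPServerDisconnected):
        # smtplib surfaces a read timeout during the greeting as
        # SMTPServerDisconnected; a connect timeout arrives as socket.timeout.
        return "timed out"


if __name__ == "__main__":
    # A local listener that completes the TCP handshake but never speaks SMTP.
    stalled = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    stalled.bind(("127.0.0.1", 0))
    stalled.listen(1)
    port = stalled.getsockname()[1]
    print(smtp_session_outcome("127.0.0.1", port, timeout_s=1.0))  # timed out
    stalled.close()
```

Without the `timeout` argument, the same call would block on the greeting for as long as the stalled connection stays open, which mirrors the frozen PutEmail behaviour described above.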

cotopaul · 07-27-2023

@Sivaluxan If I am not mistaken, you need to download the JDBC driver for the database you are using in Azure Cosmos DB. I am no expert in Azure, but as far as I know, it offers NoSQL, MongoDB, Cassandra, Gremlin, and PostgreSQL. You first need to identify the database you are connecting to, download the JDBC driver for that specific database, and proceed from there.

Last Visited	03-14-2024 06:37 AM

Member Since	01-27-2023 08:25 AM
Posts	229
Kudos received	73

Cloudera Community

Re: About mergecontent question

Re: how can get the content of Json record and val...

Re: DBCP Connection Pool can't connect to "Progres...

Re: terminate kafka connection if publish kafka pr...

Re: Not able to delete an inifinite loop built wit...

Re: ListFTP not taking into account french special...

Re: How to use Apache Tika in NIFI to extract met...

Re: PutSQL - Not enough FlowFiles for transaction...

Re: JsonTreeReader changing time zones in NIFI

Re: NIFI ORACLE and SQL CDC

Re: Nifi QueryDatabaseTable

Re: How can I work with response in NIFI

Re: Access Nifi Registry bucket hosted on a server...

Re: PutEmail frozen

Re: How to load data from Azure CosmosDB using Apa...