Member since: 07-29-2020
Posts: 574
Kudos Received: 320
Solutions: 175
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 247 | 12-20-2024 05:49 AM
 | 282 | 12-19-2024 08:33 PM
 | 293 | 12-19-2024 06:48 AM
 | 244 | 12-17-2024 12:56 PM
 | 236 | 12-16-2024 04:38 AM
06-21-2024
10:57 AM
Hi, I know this is a long shot but I'm going to ask anyway, and I'm hoping someone can help because I have been struggling with this for days. I'm trying to create a custom reporting task, which I managed to do after spending lots of time figuring out the correct template, dependencies, and conventions, since there isn't much available out there for this kind of customization. I managed to deploy and use it, and it's working as expected, except that I would like to run it on the primary node only. I know that by convention a reporting task should not depend on a node, but I'm just curious if there is a way in the code to make it work that way. @bbende , @MattWho , @stevenmatison
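For context, something like the sketch below is what I'm after. The nifi-api @PrimaryNodeOnly annotation (org.apache.nifi.annotation.behavior.PrimaryNodeOnly) is documented for processors; whether the framework honors it on a reporting task is exactly what I can't confirm, so treat this as a hypothetical to verify against your NiFi release:

```java
import org.apache.nifi.annotation.behavior.PrimaryNodeOnly;
import org.apache.nifi.reporting.AbstractReportingTask;
import org.apache.nifi.reporting.ReportingContext;

// Hypothetical: @PrimaryNodeOnly is part of nifi-api and is documented for
// processors; whether the framework honors it on a ReportingTask may depend
// on the NiFi version, so this needs to be verified against your release.
@PrimaryNodeOnly
public class MyCustomReportingTask extends AbstractReportingTask {

    @Override
    public void onTrigger(final ReportingContext context) {
        // Actual reporting logic would go here; the hope is that the framework
        // schedules this onTrigger on the elected primary node only.
        getLogger().info("Reporting task triggered");
    }
}
```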
Labels:
- Apache NiFi
06-18-2024
01:47 PM
That's a good idea; however, low latency is a user requirement. Currently, processing each file from source to destination takes around one minute. If I add a two-minute delay, the users will not be happy.
06-14-2024
07:36 AM
1 Kudo
@SAMSAL Looking at the output provided, you appear to be running Apache NiFi on Windows. It appears this issue was raised two days ago against M3 in the Apache Jira here: https://issues.apache.org/jira/browse/NIFI-13394 It is currently unresolved. You can certainly create an Apache Jira account and add comments to that Jira with your detailed findings. Please help our community thrive. If any of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "Accept as Solution" on one or more of them. Thank you, Matt
06-14-2024
07:26 AM
@SAMSAL This is some great detail. I believe you are hitting this bug, which has been fixed for the next 2.0.0 milestone release (M4): https://issues.apache.org/jira/browse/NIFI-13329 There will eventually be a 2.0.0 RC release, which will be the first official Release Candidate for the new 2.x versions, following all these development milestone releases. You can create an Apache Jira account, which would give you the ability to raise new issues you find directly in the Apache NiFi project. This is the best way to bring your findings to the developer community. Please help our community thrive. If any of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "Accept as Solution" on one or more of them. Thank you, Matt
06-14-2024
06:54 AM
2 Kudos
@helk You can use a single certificate to secure all your nodes, but I would not recommend doing so for security reasons. You risk compromising all your hosts if any one of them is compromised. Additionally, NiFi nodes act as clients and not just servers, which means all your hosts would identify themselves as the same client (based on the DN). Tracking client-initiated actions back to a specific node would be more challenging, and, if auditing is needed, very difficult.

The SAN is meant to be used differently. Let's assume you host an endpoint searchengine.com which is backed by 100 servers that handle client requests. When a client tries to access searchengine.com, the request may get routed to any one of those 100 servers. The certificate issued to each of those 100 servers is unique to that server; however, every single one of them carries searchengine.com as an additional SAN entry alongside its unique hostname. This allows hostname verification to still succeed, since all 100 servers are also known as searchengine.com.

Your specific issue, based on the output shared above, is caused by the fact that your single certificate does not have "nifi01" in its list of Subject Alternative Names (SAN). It appears you only added nifi02 and nifi03 as SAN entries. The current hostname verification specs no longer use the DN for hostname verification; only the SAN entries are used. So all names (hostnames, common names, IPs) that may be used when connecting to a host must be included in the SAN list.

NiFi cluster keystore requirements:
1. The keystore can contain only ONE PrivateKeyEntry.
2. The PrivateKey cannot use wildcards in the DN.
3. The PrivateKey must contain both clientAuth and serverAuth Extended Key Usage (EKU).
4. The PrivateKey must contain at least one SAN entry matching the hostname of the server on which the keystore will be used.

The NiFi truststore must contain the complete trust chain for your cluster nodes' PrivateKeys. One truststore is typically copied to and used on all nodes.

Please help our community thrive. If any of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "Accept as Solution" on one or more of them. Thank you, Matt
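If it helps while fixing the certificate, here is a minimal Java sketch (keystore path, password, and class name are all made-up placeholders) that prints the SAN entries and EKU OIDs of the PrivateKeyEntry in a PKCS12 keystore, so you can confirm requirements 1, 3, and 4 before starting NiFi:

```java
import java.io.FileInputStream;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.util.Collections;
import java.util.List;

public class SanChecker {
    public static void main(String[] args) throws Exception {
        // Hypothetical path and password -- point these at your node's keystore.
        String path = "keystore.p12";
        char[] password = "changeit".toCharArray();

        KeyStore ks = KeyStore.getInstance("PKCS12");
        try (FileInputStream in = new FileInputStream(path)) {
            ks.load(in, password);
        }

        for (String alias : Collections.list(ks.aliases())) {
            if (!ks.isKeyEntry(alias)) {
                continue; // requirement 1: there should be exactly ONE PrivateKeyEntry
            }
            X509Certificate cert = (X509Certificate) ks.getCertificate(alias);
            System.out.println("PrivateKeyEntry: " + alias);
            System.out.println("  Subject DN: " + cert.getSubjectX500Principal());

            // Requirement 4: the node's own hostname must appear here.
            // Each SAN entry is a [type, value] pair; type 2 = DNS name, 7 = IP.
            if (cert.getSubjectAlternativeNames() != null) {
                for (List<?> san : cert.getSubjectAlternativeNames()) {
                    System.out.println("  SAN: type=" + san.get(0) + " value=" + san.get(1));
                }
            }

            // Requirement 3: expect both 1.3.6.1.5.5.7.3.1 (serverAuth)
            // and 1.3.6.1.5.5.7.3.2 (clientAuth) in this list.
            System.out.println("  EKU OIDs: " + cert.getExtendedKeyUsage());
        }
    }
}
```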
06-14-2024
02:12 AM
1 Kudo
Hi @Thar11027 I stand corrected. Well, let's be more specific, and you can't get more specific than looking at the code itself on GitHub :). It turns out PutDatabaseRecord uses a DatabaseAdapter, an interface that gets implemented for each Database Engine Type and passed through the DB service associated with the processor (DBCPConnectionPool). Those adapters are responsible for generating the SQL for each statement type (insert, update, delete, ...). For MySQL there is an adapter called MySQLDatabaseAdapter, and if you look at its getUpsertStatement method you will find that it builds the following syntax:

```java
StringBuilder statementStringBuilder = new StringBuilder("INSERT INTO ")
        .append(table)
        .append("(").append(columns).append(")")
        .append(" VALUES ")
        .append("(").append(parameterizedInsertValues).append(")")
        .append(" ON DUPLICATE KEY UPDATE ")
        .append(parameterizedUpdateValues);
return statementStringBuilder.toString();
```

Notice the use of the "ON DUPLICATE KEY UPDATE" syntax. If you look up what that means in MySQL (https://blog.devart.com/mysql-upsert.html), you will find that yes, it checks whether the record's key already exists, and if it does, it performs an update instead; however, that check only fires on a PRIMARY KEY or UNIQUE index. In your case it works for the Transaction table because, as you mentioned, transaction_id is the primary key and you are probably passing that column as part of the record data. For the other table, however, the id is set to auto-increment, you are probably not passing it as part of the record, and you are instead relying on the non-primary-key column id_from_core. I'm not sure if it's possible to change your table so that this column becomes the primary key (or carries a unique index); otherwise you will find yourself having to do a lookup to check whether the record exists, maybe fetch the id, and then do your upsert with that id, and I'm not sure how that plays with auto-increment being set. Another option, which I tend to use to avoid adding more processors/controller services, is to create a stored procedure that defers all the insert-or-update checking to SQL, then use a PutSQL processor to execute the stored procedure, passing all the columns to it. This can be cumbersome if you have many columns, which seems to be your case; to avoid passing each column individually, you can pass the record as a JSON string and parse out the column values in MySQL. Hope that helps.
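To make the "ON DUPLICATE KEY UPDATE" behavior concrete, here is a minimal JDBC sketch; the connection details, the table name transactions, and the columns transaction_id/amount are hypothetical stand-ins for your schema. Run it twice with the same transaction_id and the second run updates rather than inserts, but only because transaction_id is the PRIMARY KEY; a non-unique column like id_from_core would never trigger the update path:

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class UpsertDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details -- adjust to your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/testdb", "user", "password")) {

            // Same statement shape the MySQL adapter generates:
            // the duplicate-key check fires only on a PRIMARY KEY or UNIQUE index.
            String sql = "INSERT INTO transactions (transaction_id, amount) "
                       + "VALUES (?, ?) "
                       + "ON DUPLICATE KEY UPDATE amount = VALUES(amount)";

            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, 42L);                      // primary key value
                ps.setBigDecimal(2, new BigDecimal("99.95"));
                ps.executeUpdate();                       // insert first time, update after
            }
        }
    }
}
```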
06-13-2024
07:25 AM
1 Kudo
@SAMSAL Thank you for the kind words. Likewise, the community thrives through members like yourself. Thank you for all your amazing contributions.
06-12-2024
06:47 AM
Hi, Just a word of advice so you have better luck getting your posts noticed and someone providing a possible resolution: if you can shorten your JSON input next time and isolate it to the problem, that would be more helpful. You don't have to post the 99% that works along with the 1% that doesn't, as long as trimming it doesn't affect the overall structure. For example, if you have 50 fields, posting 1-2 fields is enough, and if you have an array of 50 elements, 1-2 elements should be enough.

Getting to your problem: if you know that you will only ever have two amounts, as you specified, then you can intercept them in the first shift spec and assign the proper field names (Amount1, Amount2) as follows:

[
{
"operation": "shift",
"spec": {
"*": {
"CustomFields": {
"*": {
"ViewName": {
"Amount": {
"@(2,Value)": "[&5].Amount1"
},
"*": {
"@(2,Value)": "[&5].&1"
}
}
}
},
"PortfolioSharing": {
"*": {
"@(0,PortfolioId)": "[&3].sharing_PortfolioId",
"@(0,CustomerId)": "[&3].sharing_CustomerId",
"@(0,PersonalId)": "[&3].sharing_PersonalId",
"@(0,ReasonId)": "[&3].sharing_ReasonId",
"@(0,ReasonProgId)": "[&3].sharing_ReasonProgId",
"@(0,ReasonName)": "[&3].sharing_ReasonName",
"@(0,Comment)": "[&3].sharing_Comment",
"@(0,TypeId)": "[&3].sharing_TypeId",
"@(0,TypeProgId)": "[&3].sharing_TypeProgId",
"@(0,TypeName)": "[&3].sharing_TypeName"
}
},
"Amount": "[&1].Amount2",
"*": "[&1].&"
}
}
},
{
"operation": "modify-default-beta",
"spec": {
"*": {
// trx_customer
"ResidentNonresident": "@(1,Resident/Non-resident)",
"NationalityCountryofIncorporation": "@(1,Nationality/CountryofIncorporation)",
"PermanantTownCity": "@(1,PermanantTown/City)",
"SubsidiaryAssociateofanotherorganization": "@(1,Subsidiary/Associateofanotherorganization)",
"Howdidyougettoknowaboutus": "@(1,Howdidyougettoknowaboutus?)",
// trx_portfolio
"UseBankAccountFromCustomer": "@(1,UseBankAccountFromCustomer?)"
}
}
},
{
"operation": "remove",
"spec": {
"*": {
// trx_customer
"Resident/Non-resident": "",
"Nationality/CountryofIncorporation": "",
"PermanantTown/City": "",
"Subsidiary/Associateofanotherorganization": "",
"NatureOf_Business": "",
"Howdidyougettoknowaboutus?": "",
//trx_portfolio
"UseBankAccountFromCustomer?": ""
}
}
}
]

Hope that solves your problem. If you found this helpful, please accept the solution. Thanks
06-05-2024
12:31 AM
1 Kudo
Hi SAMSAL, I sent you a message. Can you check it, please?
06-02-2024
04:49 AM
2 Kudos
Hi @mohdriyaz , It seems GenerateTableFetch, by design, penalizes the FlowFile and pushes it back to the upstream queue when the error is not considered a SQL query error; per the documentation, only query execution errors go to the failure relationship. It doesn't seem like there is anything you can do to change this behavior. The only mitigation I can think of is to add another processor as a guard that checks the connection status before the flow reaches GenerateTableFetch. For example, you can use a PutSQL processor, which per its documentation appears able to capture connectivity errors through the retry and possibly the failure relationships. This is not going to solve the problem 100%, but it will minimize it and help you capture the error. You can configure how many times to retry and for how long in the Relationships tab once you select retry. Hope that helps. If it does, please accept the solution. S
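For illustration, the kind of check such a guard performs boils down to validating the JDBC connection before the fetch runs. Here is a minimal, hypothetical Java sketch (connection details made up) using the standard Connection.isValid() ping:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class ConnectionGuard {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details -- the same kind of check a guard
        // processor performs before letting flow reach GenerateTableFetch.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/sourcedb", "user", "password")) {
            // isValid() sends a lightweight ping; the argument is a timeout in seconds.
            if (conn.isValid(5)) {
                System.out.println("Connection OK - safe to run the fetch");
            } else {
                System.out.println("Connection stale - route to retry/failure instead");
            }
        }
    }
}
```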