Member since: 07-29-2020
Posts: 574
Kudos Received: 323
Solutions: 176
My Accepted Solutions
| Views | Posted |
|---|---|
| 2155 | 12-20-2024 05:49 AM |
| 2451 | 12-19-2024 08:33 PM |
| 2201 | 12-19-2024 06:48 AM |
| 1463 | 12-17-2024 12:56 PM |
| 2101 | 12-16-2024 04:38 AM |
12-01-2023
12:31 PM
2 Kudos
@ChuckE, Regarding the state being persistent: you can actually clear it by right-clicking the processor, selecting "View state", and then clicking the "Clear state" link. This resets the state to the initial value.

Regarding your second question about initializing more than one variable: you can define as many stateful variables as you need in one processor, but they all share a single initial value. If you need different initial values for different stateful variables, you have to create a separate UpdateAttribute processor for each group of variables that shares a common initial value.

Another option, which I have not tried, lets you use a single processor: use the Advanced option to define rules that set different initial values based on a common condition. You have to be careful, though, about how the first values get set on the first flowfile. For example, say you have two stateful variables Attr1 & Attr2, where the first flowfile should have Attr1 = 0 & Attr2 = 1, and both are incremented afterward. Set the Initial Value to Empty String, since some value is required when using stateful variables. Under Advanced, define two rules: one to initialize Attr1 to 0 and another to initialize Attr2 to 1 when each is still the empty string (see the sketch below for both rules).

Make sure to set the FlowFile Policy (top right) to "use original"; otherwise, with "clone", flowfiles will be duplicated for each matched rule. When the same attribute is set in both Advanced mode and basic mode, the Advanced rules take precedence whenever they match, so the increment won't run the first time: the first flowfile gets the initial values because the rules are satisfied, and on the second flowfile the rules no longer match, so the increment happens.

Depending on what you want to see on the first flowfile, you can adjust the initial values accordingly. I'm not sure this will work for all scenarios, but you can try; otherwise use separate processors as I described above. Also, if anyone thinks this goes against best practices or might cause problems, please advise. If you find this helpful please accept the solution. Thanks
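To make this concrete, here is a rough sketch of how that configuration could look. The getStateValue() function is the expression UpdateAttribute exposes when state is enabled; the rule names and layout below are my own reconstruction, not a screenshot from the actual flow:

UpdateAttribute (Store State = "Store state locally", Stateful Variables Initial Value = empty string)

Basic tab (dynamic properties, the increments):
Attr1 = ${getStateValue('Attr1'):plus(1)}
Attr2 = ${getStateValue('Attr2'):plus(1)}

Advanced tab (FlowFile Policy = "use original"):
Rule "init Attr1" - Condition: ${getStateValue('Attr1'):equals('')} - Action: set Attr1 = 0
Rule "init Attr2" - Condition: ${getStateValue('Attr2'):equals('')} - Action: set Attr2 = 1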
12-01-2023
07:30 AM
1 Kudo
Hi @Fayza , I assume you set the credentials in the Request Username & Request Password properties. Can you set "Response Generation Required" to true? This makes sure the processor captures all kinds of responses regardless of the HTTP status. Another thing: do you have any special header attributes that need to be part of the request? For example Content-Type, which corresponds to the Request Content-Type property, or any custom header?
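For reference, this is roughly the InvokeHTTP configuration I have in mind; every value below is a placeholder, and the last line shows how custom headers are attached (InvokeHTTP sends dynamic properties as request headers):

HTTP Method: POST
HTTP URL: https://example.com/api
Request Username: <your username>
Request Password: <your password>
Response Generation Required: true
Request Content-Type: application/json
X-Custom-Header (dynamic property): <value>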
12-01-2023
05:37 AM
3 Kudos
@ChuckE, This worked for me. In the UpdateAttribute configuration, notice the "Stateful Variables Initial Value" is set to 1. The flowfile on the UpdateAttribute success relationship comes out with the stateful attribute set, and if I run the flow again, the new flowfile carries the incremented value. It's strange that it did not work for you when you set the initial value to 1; I don't see anything else wrong. If it still doesn't work, can you try upgrading NiFi, just to rule out a bug in 1.19.1? I use 1.20 for testing, but you can upgrade to the latest. If you find this helpful please accept the solution. Thanks
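For anyone trying to reproduce this, a minimal sketch of the setup I'm describing; the attribute name "counter" is just my example, not taken from the original flow:

UpdateAttribute:
Store State: Store state locally
Stateful Variables Initial Value: 1
counter (dynamic property): ${getStateValue('counter'):plus(1)}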
11-30-2023
03:21 PM
@yan439, I'm not sure I understand. I thought you already had the schema defined in the registry with the correct column names and data types. Can you elaborate on how the Avro schema came about, and whether it is the same one you are using in the registry?
11-30-2023
07:40 AM
Thanks @MattWho , As far as the Managed-Authorizer goes, I usually configure my access using the LDAP provider, but without granting my AD account any access first, I won't be able to log in to NiFi. So I use the Single-User-Provider with the auto-generated username and password to grant myself access in NiFi, and only then switch to the ldap-provider and log in. Not sure if this is the right way to do it; let me know what you think. Thanks
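For context, the switch I'm describing amounts to flipping one property in nifi.properties once my AD account has the right policies; the provider identifiers below assume the default names from login-identity-providers.xml:

# bootstrap phase, auto-generated credentials
nifi.security.user.login.identity.provider=single-user-provider
# after granting my AD account access, switch to LDAP
nifi.security.user.login.identity.provider=ldap-provider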
11-30-2023
07:32 AM
Hi @scoutjohn , Your spec can be written as follows: [
{
"operation": "shift",
"spec": {
"*": "&",
"serviceOrderItem": {
"*": {
"*": "serviceOrderItem.[&1].&",
"service": {
"*": "serviceOrderItem.[&2].service.&",
"supportingService": {
"$": "serviceOrderItem.[&3].service.serviceCharacteristic[#].name",
"@": "serviceOrderItem.[&3].service.serviceCharacteristic[#].value"
}
}
}
}
}
}] I did not add the "modifyPath" part because I did not see anything related to this object in the provided JSON input. Notice how I used # for serviceCharacteristic[#].name & serviceCharacteristic[#].value, which tells the spec to group everything into one object under the serviceCharacteristic array. If you find this helpful please accept the solution. Thanks
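To illustrate what the $ / @ / # lines do, here is a hypothetical, heavily trimmed input and the output the spec is intended to produce for it; the field names and values are made up, not taken from your post:

Input:
{
  "serviceOrderItem": [
    {
      "id": "item-1",
      "service": {
        "state": "active",
        "supportingService": "gold"
      }
    }
  ]
}

Output:
{
  "serviceOrderItem": [
    {
      "id": "item-1",
      "service": {
        "state": "active",
        "serviceCharacteristic": [
          {
            "name": "supportingService",
            "value": "gold"
          }
        ]
      }
    }
  ]
}

The $ grabs the matched key name ("supportingService") into name, the @ grabs its value into value, and the # keeps the pair together in a single entry of the array.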
11-29-2023
06:16 AM
1 Kudo
Since you said that my proposed solution worked for the first part, can you accept the solution and then open a new ticket for the latest question, since it is a little different from your first one? Also, regarding the requirement in your latest post, I'm having a hard time understanding what you are trying to do, and I have the following questions that I hope you can answer, or clarify, if you decide to open a new ticket:

1- What do you mean by a max queue size of 250? How do you set that? Is it a batch process where each batch processes a total of 250, or should you never have more than 250 flowfiles at a given time?

2- You say you want them to retire in 24 hours. Is this for all 250 flowfiles? If so, how is that going to work when you also say you want to retire one flowfile every hour for 24 hours? This is confusing to me.

3- Are you saying that after you retire all the files in the queue (completing the 24 hours) you want to log a message? Do you mean only if all 250 flowfiles fail and retire? Then how is this going to work when some files succeed and others fail?

It would also help if you post a screenshot of the complete flow and highlight what you want to do in each step, and which part of the flow you are having a problem with, detailing clearly how the problem relates to the target processor (PublishKafka in your case) and what the expectation is. Thanks
11-28-2023
03:13 PM
Hi, I have managed to download the latest NiFi 2.0.0 M1 and I'm trying to run it on my Windows 10 machine. Doing some preliminary testing, I ran into the following issues:

1- The system requirements page (https://nifi.apache.org/project-documentation.html ) indicates that at minimum I need Java 17, but when I try to start NiFi using run.bat I get the following error:

Error: LinkageError occurred while loading main class org.apache.nifi.bootstrap.RunNiFi
java.lang.UnsupportedClassVersionError: org/apache/nifi/bootstrap/RunNiFi has been compiled by a more recent version of the Java Runtime (class file version 65.0), this version of the Java Runtime only recognizes class file versions up to 61.0

It turns out it needs Java 21. Not sure if the documentation has not been updated or if I'm missing something.

2- After upgrading to Java 21, I'm able to start NiFi using the default configuration. The log file doesn't show any errors and the default username and password are generated, but when I try to browse to https://127.0.0.1:8443/nifi I get an error in the browser. Not sure if this is something local to my machine, but after some internet searching I replaced 127.0.0.1 with localhost in the URL and it worked: I now get the login screen.

3- This is not related to 2.0, but I want to mention it in case someone else runs into the same issue. By default, the generated user doesn't have access to the security settings for Users & Policies. To enable this you need to set:

nifi.security.user.authorizer=managed-authorizer

and add the generated username to authorizers.xml, as mentioned here: https://community.cloudera.com/t5/Support-Questions/No-show-Users-and-Policies-in-Global-Menu/td-p/339127

4- The ExecuteScript processor doesn't have the Python (Jython) script engine. It could be deprecated, but that is not mentioned on the deprecated components page (https://cwiki.apache.org/confluence/display/NIFI/Deprecated+Components+and+Features ). It only talks about removing support for Ruby and ECMAScript, not Python. If it is deprecated, what is the alternative? Is it the Python API?

5- A minor glitch I noticed when browsing NiFi with Chrome: for some reason the "Import from Registry" icon is not showing. It shows up in Edge, and it shows up if I open Chrome in private mode. Not sure if it's a caching issue or what.

Please advise. Thanks
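For reference, the authorizers.xml entries I ended up with look roughly like the trimmed sketch below. "generated-username-here" is a placeholder for the auto-generated username, and the exact file shipped with 2.0.0 M1 may differ from this 1.x-style layout:

<userGroupProvider>
    <identifier>file-user-group-provider</identifier>
    <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
    <property name="Users File">./conf/users.xml</property>
    <property name="Initial User Identity 1">generated-username-here</property>
</userGroupProvider>
<accessPolicyProvider>
    <identifier>file-access-policy-provider</identifier>
    <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
    <property name="User Group Provider">file-user-group-provider</property>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Initial Admin Identity">generated-username-here</property>
</accessPolicyProvider>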
Labels:
- Apache NiFi
11-28-2023
09:17 AM
2 Kudos
Hi @Rohit1997jio , I'm not sure this can be done using FlowFile Expiration. However, if you are using NiFi 1.16 or higher, you can take advantage of another approach: the "retry" option on the target processor's failure relationship.

The idea is to use the settings "Number of Retry Attempts", "Retry Back Off Policy" & "Retry Maximum Back Off Period" to configure how often and for how long the flowfile is retried before it gets pushed to the failure relationship queue, where you can then log the needed message. On every failed retry, the flowfile is pushed back to the upstream queue and waits the designated time before it is tried again. The challenge is setting those values so that the flowfile is only kept for a certain period of time (1 hour in your case), especially since the file waits in the queue before being tried again, depending on whether you set the policy to Penalize or Yield. That wait is actually a good thing, because you want some delay before the flowfile is retried to avoid a lot of overhead.

For example, if you want the file to expire in an hour, retrying it 60 times with a 1-minute wait before each retry, you can set the values as follows:

Number of Retry Attempts: 60
Retry Back Off Policy: Penalize (set the Penalty Duration under the Settings tab to 1 min)
Retry Maximum Back Off Period: 1 min (this ensures the wait time in the queue doesn't exceed the initial penalty duration, because otherwise the penalty duration is doubled on every subsequent retry; I'm not sure why)

In this case the flowfile will be retried 60 times upon failure, each time being pushed back to the upstream queue and waiting at most 1 min before the next retry, which makes the total time the flowfile is retried = 60 * 1 min = 60 mins = 1 hour. Depending on how often you want to retry and how long you want to wait between retries, you can adjust those numbers accordingly. Once all the retries are done, the flowfile is moved to the failure relationship, where you can log the final message. If that helps please accept the solution. Thanks
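As a rule of thumb (my own arithmetic, not from the NiFi docs): with the Retry Maximum Back Off Period capped at the Penalty Duration, the total retention time works out to roughly

total retention ≈ Number of Retry Attempts × per-retry wait

so 60 × 1 min = 60 min here; for, say, a 2-hour window with 5-minute waits you would use 24 attempts with a 5 min penalty duration and a 5 min maximum back off period.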
11-27-2023
03:08 PM
The issue you are having is that the ParquetReader fails on the invalid column names containing the illegal character "-". I don't know of a way to address this in NiFi itself; you probably have to fix it before you consume the file through NiFi. You can use a pandas DataFrame in Python to remove illegal characters from the column names, for example:

import pandas as pd

# read the parquet file that has the problematic column names
df = pd.read_parquet('source.parquet', engine='fastparquet')
# replace hyphen with underscore in column names
df.columns = df.columns.str.replace("-", "_")
# write the cleaned copy back out
df.to_parquet("target.parquet", engine='fastparquet')

It's possible to do this through NiFi as well, using ExecuteStreamCommand: https://community.cloudera.com/t5/Support-Questions/Can-anyone-provide-an-example-of-a-python-script-executed/td-p/192487

The steps would be like this:
1- Fetch the parquet file from S3
2- Save it to a staging area with a known filename using PutFile
3- Run ExecuteStreamCommand and pass the filename and path to the Python script; the script renames the columns as shown above and saves the final copy to a target folder (see the sketch below)
4- Use FetchFile to get the final parquet file from the target folder using the same filename
5- ConvertRecord ...

If that helps please accept the solution. Thanks
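For step 3, a minimal sketch of what that Python script could look like, assuming ExecuteStreamCommand passes the staged file path and the target folder as two command arguments; the script name and argument layout are my own example, not a fixed convention:

import os
import sys

import pandas as pd

# Usage: python rename_columns.py <staged parquet path> <target folder>
source_path = sys.argv[1]   # e.g. the PutFile location plus ${filename}
target_dir = sys.argv[2]    # the folder FetchFile will read from

# read the staged parquet, sanitize the column names, write the final copy
df = pd.read_parquet(source_path, engine='fastparquet')
df.columns = df.columns.str.replace("-", "_")
df.to_parquet(os.path.join(target_dir, os.path.basename(source_path)), engine='fastparquet')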