Member since: 07-29-2020
Posts: 350
Kudos Received: 109
Solutions: 105
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 50 | 12-06-2023 12:31 PM |
 | 54 | 12-05-2023 03:49 AM |
 | 82 | 12-03-2023 04:54 PM |
 | 98 | 12-03-2023 12:02 PM |
 | 43 | 12-02-2023 06:49 AM |
11-15-2023
11:06 AM
So if the goal is that at any given time the number of concurrent uploads is never more than 20, then ignore the last part of my response about "Batch Processing at the Group Level". All you need is to fetch the records from the DB as I explained, and on the processor that does the upload set Concurrent Tasks to 20, with the considerations listed above.

Also, since you have so many records, I would consider using GenerateTableFetch before ExecuteSQL. It generates SQL statements that partition the data into pages, provided you have a numeric column, such as an ID, that is sequential and evenly distributed (a rough example of the generated statements is sketched at the end of this reply). For more info see: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.12.1/org.apache.nifi.processors.standard.GenerateTableFetch/

This way you can set up a schedule on GenerateTableFetch to generate the different partitioned queries, so the fetched data is processed downstream without causing backpressure or out-of-memory exceptions.

You can accept the solution once all your questions are answered and the issue is resolved.
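As a rough illustration of what GenerateTableFetch produces (the table name, column, and page size are made up, and the exact SQL syntax depends on the Database Type adapter you select), each outgoing flowfile would contain one paged statement along the lines of:

SELECT * FROM my_table WHERE id <= 2000000 ORDER BY id LIMIT 10000 OFFSET 0
SELECT * FROM my_table WHERE id <= 2000000 ORDER BY id LIMIT 10000 OFFSET 10000
SELECT * FROM my_table WHERE id <= 2000000 ORDER BY id LIMIT 10000 OFFSET 20000

Each of these then feeds ExecuteSQL, which fetches one page at a time instead of the whole table.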
11-14-2023
11:55 AM
I'm not sure what you are going to use for inserting the data into Elasticsearch/Solr. It's possible you want to run concurrent curl API commands with the ExecuteStreamCommand processor. Alternatively, you can use the InvokeHttp processor to construct the API call; it comes with a lot of options and parameters that make the API call much easier (there is a rough sketch of such a batch API call further down in this reply). NiFi also comes with out-of-the-box processors to interact with Elasticsearch & Solr, such as PutElasticsearchHttp, PutElasticsearchJson, PutSolrRecord, PutSolrContentStream, etc. These can be much easier to set up than using the API directly.

If you are looking to do multiple uploads concurrently, regardless of which processor you use, you can configure that on the processor itself: right-click the processor, select Configure, select the Scheduling tab and set the desired value in the "Concurrent Tasks" box, for example 20. Keep in mind the following when you set this value:

1- Single node vs. cluster: if you are using a single node, you can set Concurrent Tasks as high as needed. But if you have a cluster and you are doing some load balancing before uploading to the target system, you are already getting some concurrency, since records may be uploaded at the same time by different nodes, and you have to take that into consideration. For example, with a 5-node cluster and 20 records per batch, each node will take 4 records, and in that case Concurrent Tasks should be set to 4.

2- Make sure the "Maximum Timer Driven Thread Count" is set to >= 20 to allow NiFi to process 20 tasks concurrently; by default this value is set to 10. To get to this value, click the three-bar icon at the top right, select Controller Settings, and the value is under the first tab, "General". Recommendation on how to set this value: https://community.cloudera.com/t5/Support-Questions/How-to-configure-NiFi-to-maximize-the-usage-of-system-cores/td-p/189581

When it comes to batching the data from the source database, you can use a processor like ExecuteSQL or ExecuteSQLRecord to fetch the data. In those processors' configuration you need to set up the DB Connection Pool service to create the connection; for more information refer to: https://community.cloudera.com/t5/Support-Questions/how-to-configure-and-connect-mysql-with-nifi-and-perform/m-p/109478 https://www.youtube.com/watch?v=fuEosO24fgI

Also in the configuration you can specify 20 max records per flowfile in "Max Rows Per Flow File", so that you don't get all 2 million rows in one file, which could take a lot of time and memory to process and can result in an error depending on your heap size. ExecuteSQL gives you the result in Avro format; you can use the ConvertAvroToJSON processor if you want to convert to JSON, or use the ExecuteSQLRecord processor and set the Record Writer to the desired target format.

If you want to ensure that you are processing 20 records at a time and prevent any records from being processed before the older batch is complete, you can use Batch Processing at the Process Group level as described here: https://www.youtube.com/watch?v=kvJx8vQnCNE&t=296s The idea is to put the processing & uploading of the 20 records in a process group where you configure the "Process Group FlowFile Concurrency" as described in the video above. If your ExecuteSQL fetches 20 rows per flowfile, then you allow one flowfile at a time to enter the group. Inside the group you split the records and upload them concurrently.
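To make the API option a bit more concrete: if the target is Elasticsearch and you go the curl/InvokeHttp route, a whole batch of records can be sent in a single call to the _bulk endpoint. This is just a sketch; the index name and fields are made up, and the body is newline-delimited JSON (Content-Type: application/x-ndjson) with an action line followed by each document:

POST /my_index/_bulk
{ "index": { "_id": "1" } }
{ "name": "record 1", "city": "New York" }
{ "index": { "_id": "2" } }
{ "name": "record 2", "city": "Boston" }

...and so on, up to 20 action/document pairs per request in your case. If you use one of the PutElasticsearch* processors instead, you don't have to build this body yourself.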
Hopefully that will at least get you going. If you have any questions please let me know. If you find this helpful please accept the solution. Thanks
11-14-2023
09:45 AM
Hi @Sipping1n0s , Can you provide more information on the other system? How do you batch insert 20 records? Is this an API call or some database? Thanks
11-12-2023
07:06 AM
Hi @skoch244 , You are getting this error because JoinEnrichment is supposed to be used in conjunction with the ForkEnrichment processor, as the documentation of those processors specifies: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.20.0/org.apache.nifi.processors.standard.JoinEnrichment/additionalDetails.html The error details indicate that it can't find the 'original' and 'enrichment' flowfiles. Those are set up and created by the ForkEnrichment processor through its relationships: enrichment & original. I'm not sure what your actual flow looks like or what the actual sources are for each dataset, but just using GenerateRecord for both datasets won't work, because that processor can't take an upstream relationship, which doesn't fit the pattern of using the Fork/JoinEnrichment processors. The link above has an example that will help you understand how those processors work (there is also a small sketch at the end of this reply). If you need more help with your actual flow, please provide more information on the actual scenario you have. If you find this helpful please accept the solution. Thanks
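To make the pattern a bit more concrete, here is a rough sketch of how the Fork/JoinEnrichment pair is usually wired (the lookup processor in the middle is just a placeholder for whatever produces your enrichment data):

Source -> ForkEnrichment
ForkEnrichment (original)   -----------------------------> JoinEnrichment
ForkEnrichment (enrichment) -> InvokeHttp / lookup step --> JoinEnrichment
JoinEnrichment (joined) -> downstream

Assuming the Wrapper join strategy, each joined record would then look roughly like {"original": { ...your record... }, "enrichment": { ...the looked-up data... }}. Please verify the details against the documentation linked above, since the exact output depends on the join strategy you pick.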
11-11-2023
06:40 AM
There is no magic solution for those scenarios, and no one solution out of NiFi fits all of them that I can think of. You have to understand the nature of the input before you start consuming it, and you have to build the solution catered to that input. Sometimes, if you are lucky, you can combine multiple scenarios into one flow, but that still depends on the complexity of the input.

Even though in your first scenario the second option I proposed seemed simple enough and did the job, your second example is more complex, and I don't think the out-of-the-box GrokReader will be able to handle that complexity. Therefore the first option of using the ExtractText processor will work better, because you can customize your regex as needed. For example, based on the text you provided:

JohnCena32 Male New York USA813668

I can use the following regex:

[A-Z][a-z]+[A-Z][a-z]+\d+\s(?:Male|Female|M|F)\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s[A-Za-z]+\d+

In the ExtractText processor I would define a dynamic property for each attribute (city, age, firstname, etc.) and surround the segment of the pattern that corresponds to the value with parentheses so it is extracted as a matching group. For example:

Age: [A-Z][a-z]+[A-Z][a-z]+(\d+)\s(?:Male|Female|M|F)\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s[A-Za-z]+\d+
FirstName: ([A-Z][a-z]+)[A-Z][a-z]+\d+\s(?:Male|Female|M|F)\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s[A-Za-z]+\d+
Gender: [A-Z][a-z]+[A-Z][a-z]+\d+\s((?:Male|Female|M|F))\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s[A-Za-z]+\d+
Country: [A-Z][a-z]+[A-Z][a-z]+\d+\s(?:Male|Female|M|F)\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s([A-Za-z]+)\d+

And so on. This should give you the attributes you need (a sample of the end result is sketched at the end of this reply). Then you can use the AttributesToJSON processor to get the JSON output, and finally, if you want to convert the data to the proper types, you can use either JoltTransformJSON or QueryRecord with a cast, as shown above.

One final note: if you know how to use external libraries in Python, Groovy, or any of the scripting languages supported by the ExecuteScript processor, you can write custom code to create the flowfiles/attributes that will help you downstream to generate the final output.

If that helps please accept the solution. Thanks
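To illustrate where the ExtractText -> AttributesToJSON route ends up for the sample text above: assuming you also define LastName and City properties following the same pattern (those two are my assumption, not shown above), the extracted attributes converted to JSON would look roughly like:

{
  "FirstName" : "John",
  "LastName" : "Cena",
  "Age" : "32",
  "Gender" : "Male",
  "City" : "New York",
  "Country" : "USA"
}

Note that everything is still a string at this point, which is why the QueryRecord/Jolt cast step mentioned above is needed if you want real numeric types.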
11-10-2023
09:10 AM
Hi @Chai09 , There are different options to do this:

1- You can do ExtractText -> AttributesToJSON. In ExtractText you define a property for each attribute (Name, Age, ...) with the regex that captures its value. In AttributesToJSON you list the attributes you want to convert into JSON in the Attributes List property and set the Destination to "flowfile-content" to get the JSON you need. To learn how to use ExtractText please refer to: https://community.cloudera.com/t5/Support-Questions/How-to-ExtractText-from-flow-file-using-Nifi-Processor/m-p/190826

However, you might find that the Age and Country values come out as strings rather than integers. If that is OK with you, you don't need to do anything else. If you need them to be integers, you have to use another processor such as QueryRecord, where you can cast the string into an integer, or JoltTransformJSON, where you can do the conversion using the following Jolt spec:

[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "Age": "=toInteger(@(1,Age))",
      "Country": "=toInteger(@(1,Country))"
    }
  }
]
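For example, if the JSON coming into JoltTransformJSON looks like the first snippet below (values made up), the spec above produces the second, with Age and Country converted to integers:

Input:
{
  "Name" : "John",
  "Age" : "32",
  "Gender" : "Male",
  "City" : "New York",
  "Country" : "12345"
}

Output:
{
  "Name" : "John",
  "Age" : 32,
  "Gender" : "Male",
  "City" : "New York",
  "Country" : 12345
}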
2- The easiest way I found, if you don't like to use regex, is QueryRecord with the Record Reader set to GrokReader and the Record Writer set to JsonRecordSetWriter. The JsonRecord dynamic property defines the relationship that will produce the JSON output you are looking for, and it has the following value:

select Name
, CAST(AgeN as SMALLINT) as Age
, Gender
, City
, CAST(CountryN as SMALLINT) as Country
from FLOWFILE

The GrokReader service is used to help you read unstructured data such as log files. Its Grok Expressions property is set to the following:

%{WORD:Name} %{NUMBER:AgeN} %{WORD:Gender} %{DATA:City} %{NUMBER:CountryN}

The Grok expression uses predefined regexes for the given types: WORD, NUMBER, DATA, etc. For more info: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services-nar/1.23.2/org.apache.nifi.grok.GrokReader/additionalDetails.html

The JsonRecordSetWriter service writes the result out as JSON. This will produce the JSON you are looking for with the correct data types (a sample input and output is sketched at the end of this reply). Notice how I still needed to cast Age and Country to an integer type despite them being defined as NUMBER in the Grok expression; that is because JsonRecordSetWriter will otherwise convert everything to string unless you provide an Avro schema.

If that helps please accept the solution. Thanks
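To make option 2 concrete, a quick sketch (the input line is made up but follows the format discussed earlier in the thread):

Input flowfile content:
John 32 Male New York 12345

Output of the JsonRecord relationship:
[ {
  "Name" : "John",
  "Age" : 32,
  "Gender" : "Male",
  "City" : "New York",
  "Country" : 12345
} ]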
11-09-2023
07:09 AM
Hi @PradNiFi1236 , It seems that in your spec, when you do the "modify-overwrite-beta", it creates the "AltriDatiGestionali" object on each array element regardless of the Natura value, and there does not seem to be an easy way to remove the unwanted "AltriDatiGestionali" without over-complicating the spec. However, if you can change the order in which you do things in the spec, you might be able to avoid this problem altogether. So instead of doing: shift-1, shift-2, modify-overwrite-beta, remove, cardinality, you can change the order as follows: shift-1, modify-overwrite-beta, remove, shift-2, cardinality. Basically, after you isolate the "AltriDatiGestionali" object in the first shift, you do the modification there, then you remove the unwanted fields, and finally you shift to the element where Natura = N3.5 and finish with the cardinality. I don't think the recursivelySquashNulls modify at the end is doing anything, so you can remove it. Here is what the spec looks like:

[
{
"operation": "shift",
"spec": {
"FatturaElettronicaBody": {
"DatiBeniServizi": {
"DettaglioLinee": {
"*": {
"AltriDatiGestionali": "FatturaElettronicaBody.DatiBeniServizi.AltriDatiGestionali",
"*": "FatturaElettronicaBody.DatiBeniServizi.DettaglioLinee[&1].&"
}
}
}
}
}
},
{
"operation": "modify-overwrite-beta",
"spec": {
"FatturaElettronicaBody": {
"DatiBeniServizi": {
"AltriDatiGestionali": {
"splitRiferimentoTesto": "=split('-',@(1,RiferimentoTesto))",
"stringsize": "=size(@(1,splitRiferimentoTesto[0]))",
"RiferimentoTesto": "=substring(@(1,splitRiferimentoTesto[0]),26,@(1,stringsize))",
"stringsize1": "=size(@(1,splitRiferimentoTesto[1]))",
"RiferimentoD": "=substring(@(1,splitRiferimentoTesto[1]),11,@(1,stringsize1))",
"splitRiferimentoData": "=split('/',@(1,RiferimentoD))",
"RiferimentoData": "=concat(@(1,splitRiferimentoData[2]),'-',@(1,splitRiferimentoData[1]),'-',@(1,splitRiferimentoData[0]))"
}
}
}
}
}
,
{
"operation": "remove",
"spec": {
"FatturaElettronicaBody": {
"DatiBeniServizi": {
"AltriDatiGestionali": {
"splitRiferimentoTesto": "",
"splitRiferimentoData": "",
"stringsize": "",
"stringsize1": "",
"RiferimentoD": ""
}
}
}
}
}
,
{
"operation": "shift",
"spec": {
"FatturaElettronicaBody": {
"DatiBeniServizi": {
"DettaglioLinee": {
"*": {
"NumeroLinea": "FatturaElettronicaBody.DatiBeniServizi.DettaglioLinee[&1].NumeroLinea",
"PrezzoTotale": "FatturaElettronicaBody.DatiBeniServizi.DettaglioLinee[&1].PrezzoTotale",
"Descrizione": "FatturaElettronicaBody.DatiBeniServizi.DettaglioLinee[&1].Descrizione",
"Natura": {
"N3.5": {
"$": "FatturaElettronicaBody.DatiBeniServizi.DettaglioLinee[&3].Natura",
"@(2,NumeroLinea)": "FatturaElettronicaBody.DatiBeniServizi.DettaglioLinee[&3].NumeroLinea",
"@(2,PrezzoTotale)": "FatturaElettronicaBody.DatiBeniServizi.DettaglioLinee[&3].PrezzoTotale",
"@(2,Descrizione)": "FatturaElettronicaBody.DatiBeniServizi.DettaglioLinee[&3].Descrizione",
"@(4,AltriDatiGestionali)": "FatturaElettronicaBody.DatiBeniServizi.DettaglioLinee[&3].AltriDatiGestionali"
},
"*": {
"$": "FatturaElettronicaBody.DatiBeniServizi.DettaglioLinee[&3].Natura",
"@(2,NumeroLinea)": "FatturaElettronicaBody.DatiBeniServizi.DettaglioLinee[&3].NumeroLinea",
"@(2,PrezzoTotale)": "FatturaElettronicaBody.DatiBeniServizi.DettaglioLinee[&3].PrezzoTotale",
"@(2,Descrizione)": "FatturaElettronicaBody.DatiBeniServizi.DettaglioLinee[&3].Descrizione"
}
}
}
}
}
}
}
}
,
{
"operation": "cardinality",
"spec": {
"FatturaElettronicaBody": {
"DatiBeniServizi": {
"DettaglioLinee": {
"*": {
"NumeroLinea": "ONE",
"Descrizione": "ONE",
"Quantita": "ONE",
"PrezzoUnitario": "ONE",
"ScontoMaggiorazione": "ONE",
"AltriDatiGestionali": "ONE",
"PrezzoTotale": "ONE",
"AliquotaIVA": "ONE"
}
}
}
}
}
}
]

If that helps please accept the solution. Thanks
11-09-2023
05:27 AM
Are you sure you are getting the correct input to the JoltTransformJSON processor? If you are able to test the spec using GenerateFlowFile and it works there, then something upstream might be changing or dropping the input. I would check the EvaluateJsonPath Destination property and make sure it is not set to flowfile-content; it should be set to flowfile-attribute (a small sketch of the difference is at the end of this reply). If you think everything is set correctly, can you include screenshots of all the processors' configurations in your reply? Thanks
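To illustrate the difference mentioned above (the content and property name here are made up): suppose the flowfile content is {"customer": {"id": 42}} and EvaluateJsonPath has a dynamic property custId = $.customer.id.

Destination = flowfile-attribute  ->  content stays {"customer": {"id": 42}}, new attribute custId = 42
Destination = flowfile-content    ->  content is replaced by 42, and the original JSON is gone

In the second case the downstream JoltTransformJSON no longer receives the structure the spec expects, which typically shows up as a null or failed transform.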
11-08-2023
01:14 PM
Hi @CE , Where exactly are you getting the null? Are you getting it after the Jolt processor in the success relationship queue? I ran the same spec against a simple GenerateFlowFile with the input you provided set in the Custom Text property, then used the Jolt processor with the provided spec, and I got the expected output. Can you try that and see if it works? Can you also provide a processor configuration screenshot? Thanks
11-02-2023
11:27 AM
Can you send me the spec, as it seems to be different from what I provided? Also, if you can simplify the input/spec and keep it isolated to the problem, that will save me some time.