Member since
07-29-2020
574
Posts
320
Kudos Received
175
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 239 | 12-20-2024 05:49 AM |
 | 270 | 12-19-2024 08:33 PM |
 | 285 | 12-19-2024 06:48 AM |
 | 236 | 12-17-2024 12:56 PM |
 | 222 | 12-16-2024 04:38 AM |
11-30-2024
11:42 AM
1 Kudo
Hi @Vikas-Nifi,

I think you can avoid a lot of overhead, such as writing the data to the DB just to do the transformation and assign the fixed width (unless you need to store the data in the DB). You can use processors like QueryRecord and UpdateRecord to transform the data in bulk rather than one record and one field at a time. In QueryRecord you can use SQL-like functions, based on the Apache Calcite SQL syntax, to transform data or derive new columns just as if you were writing a MySQL query. In UpdateRecord you can use NiFi RecordPath to traverse fields and apply functions in bulk as well. There is also a FreeFormTextRecordSetWriter service that you can use to create a custom output format.

For example, in the following dataflow I'm using a ConvertRecord processor with a CSVReader and a FreeFormTextRecordSetWriter to produce the desired output:

The GenerateFlowFile processor is used to create the input CSV in the flowfile:

The ConvertRecord is configured as follows:

For the CSVReader you can use the default configuration. The FreeFormTextRecordSetWriter is configured as follows:

In the Text property you can use the column/field names as listed in the input and provided to the reader. You can also use NiFi Expression Language to format and transform the written data, as follows:

${DATE:replace('-',''):append(${CARD_TYPE}):padRight(28,' ')}${CUST_NAME:padRight(20,' ')}${PAYMENT_AMOUNT:padRight(10,' ')}${PAYMENT_TYPE:padRight(10,' ')}

This will produce the following output:

20241129Visa Test1 0.01 Credit Card
20241129Master Test2 10.0 Credit Card
20241129American Express Test3 500.0 Credit Card

I know this is not 100% what you need, but it should give you an idea of what you need to do to get the desired output. Hope that helps, and if it does, please accept the solution. Let me know if you have any other questions. Thanks
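To make the fixed-width logic of the expression above easier to follow, here is a minimal Python sketch that mimics it outside NiFi. It is only an illustration, not NiFi code: the function names `pad_right` and `format_record` are hypothetical, and the sample record (including the `2024-11-29` date format) is an assumed input, since the original CSV is not shown in the post.

```python
def pad_right(value: str, width: int, pad: str = " ") -> str:
    """Pad value on the right up to width; longer values are returned unchanged."""
    return value.ljust(width, pad)


def format_record(rec: dict) -> str:
    """Build one fixed-width line using the same widths as the expression (28, 20, 10, 10)."""
    # DATE with dashes removed, followed directly by CARD_TYPE, padded together to 28
    head = rec["DATE"].replace("-", "") + rec["CARD_TYPE"]
    return (
        pad_right(head, 28)
        + pad_right(rec["CUST_NAME"], 20)
        + pad_right(rec["PAYMENT_AMOUNT"], 10)
        + pad_right(rec["PAYMENT_TYPE"], 10)
    )


# Assumed sample record; field names match the columns used in the expression
sample = {
    "DATE": "2024-11-29",
    "CARD_TYPE": "Visa",
    "CUST_NAME": "Test1",
    "PAYMENT_AMOUNT": "0.01",
    "PAYMENT_TYPE": "Credit Card",
}

if __name__ == "__main__":
    print(repr(format_record(sample)))
```

Note that "Credit Card" is 11 characters, so a width of 10 leaves it unpadded rather than truncating it; if the last column must be strictly fixed width, the width has to be at least as long as the longest value.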
11-28-2024
06:22 AM
Hi,

It seems like you are running out of heap memory when adding new attributes through the EvaluateJsonPath processor. Attributes are stored in the heap, so you should avoid storing large data in flowfile attributes when you are going to have many flowfiles, or you will run into this kind of issue.

Can you please elaborate on what you are trying to accomplish after converting Avro to JSON? What you are doing doesn't make sense to me, because you are merging towards the end, which means you might not even get the attribute you are extracting, depending on how you set the Attribute Strategy in the MergeRecord processor.
11-28-2024
06:07 AM
Hi,

First, if the data you have posted contains real personal info, I would recommend removing it and using dummy data instead. It is a violation of the community guidelines to post personal information (see point 7 of the community guidelines).

Regarding the error: you are getting it because of the property setting Quote Character = " in the CSVReader service. This setting means that when a value contains one of the reserved CSV characters, such as the comma (,) used as the column separator or the newline (\n) used to separate records, and you don't or can't use the escape character (\), you can surround the whole column value with double quotes at both ends. This means no further characters of the same column may follow the closing quote. For more info please refer to: https://csv-loader.com/csv-guide/why-quotation-marks-are-used-in-csv

Since the line you have listed has characters after the closing ", you are getting the illegal character error.

To resolve, you have two options:

1- Use ReplaceText to replace every double quote " character with \" to escape it. However, this might not be very efficient if you have a large CSV file.

2- The more efficient option is to replace the Quote Character in the CSVReader with something other than ". However, you have to make sure that your data is not going to contain the new character in any of the CSV values. Possible options: $,%,^

If this helps please accept the solution. Thanks
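The effect of the quote character can be reproduced outside NiFi. The sketch below uses Python's standard `csv` module in strict mode as a stand-in for the CSVReader; it is an analogy only, since NiFi uses its own CSV parsers, and the sample line and the helper names `parse` and `fails_with_double_quote` are made up for illustration.

```python
import csv
import io

# Made-up sample line: the second column has characters after the closing quote
LINE = '1,"abc"def,2\n'


def parse(text: str, quotechar: str) -> list:
    """Parse CSV text strictly with the given quote character."""
    return list(csv.reader(io.StringIO(text), quotechar=quotechar, strict=True))


def fails_with_double_quote(text: str) -> bool:
    """Return True if strict parsing with the double quote as quote character raises an error."""
    try:
        parse(text, '"')
    except csv.Error:
        return True
    return False


if __name__ == "__main__":
    # With " as the quote character, the trailing characters are rejected
    print(fails_with_double_quote(LINE))
    # With ^ as the quote character, " is treated as ordinary data
    print(parse(LINE, "^"))
```

This corresponds to option 2: once the quote character is something that never occurs in the data, the double quotes are read as part of the value.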
11-27-2024
11:12 AM
1 Kudo
Sure. If you come up with a solution different from what I suggested, please do post it so it can help others who might run into a similar situation. Good luck
11-27-2024
07:18 AM
Hi,

It is still not clear to me what exactly is happening and where. The error message refers to a field called ecoTaxValues, which doesn't seem to exist in the provided input. You also mentioned that you are using ConsumeKafka and getting an error there through the reader/writer, but the ConsumeKafka processor doesn't take any reader/writer service. ConsumeKafkaRecord does. Is that what you are using?

Please be as specific as you can when describing the problem. If you can't share the information for security reasons, then I would recommend trying to reproduce the issue with sample data and a sample dataflow to make it easier to isolate the error. Also, please share a screenshot or an accurate description of the dataflow from the point where the input originates, along with the processor configurations and any services that are being used.
11-26-2024
02:03 PM
2 Kudos
It seems that the Parquet reader/writer services try to use an Avro schema, possibly to make sense of the data when passing it along to the target processors (like PutDatabaseRecord), since Parquet is a binary format. The problem is that Avro has limitations on how fields can be named. This is actually reported as a bug in Jira, but it doesn't seem to have been resolved. According to the ticket, Avro field names may only start with the characters [A-Za-z_].

Given this, it seems you have to think of a workaround, since NiFi doesn't provide a solution out of the box. You can check my answer to this post as an option. Basically, you can use Python to read the Parquet content and convert it to another format (CSV, for example), then pass the CSV to PutDatabaseRecord. This should work, as I have tested it. Since you seem to be using NiFi 2.0, you can develop a Python extension processor for this instead of the ExecuteStreamCommand mentioned in the post.

Hope that helps. If it does, please accept the solution. Thanks
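To find out in advance which column names would hit the Avro restriction, a small check like the one below can help. It is a sketch under stated assumptions: the pattern follows the naming rule in the Avro specification (start with [A-Za-z_], continue with [A-Za-z0-9_]), while `is_valid_avro_name` and `sanitize_avro_name` are hypothetical helpers, and the renaming strategy (replace invalid characters with an underscore, prefix an underscore if needed) is just one possible choice, not something NiFi does for you.

```python
import re

# Avro name rule: first character [A-Za-z_], remaining characters [A-Za-z0-9_]
_AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name: str) -> bool:
    """Return True if the whole name satisfies the Avro naming rule."""
    return _AVRO_NAME.fullmatch(name) is not None


def sanitize_avro_name(name: str) -> str:
    """Hypothetical renaming: swap invalid characters for '_' and fix the first character."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if not cleaned or not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


if __name__ == "__main__":
    # Made-up column names for illustration
    for column in ["customer_id", "1st_payment", "order-date"]:
        print(column, is_valid_avro_name(column), sanitize_avro_name(column))
```

If the columns are renamed this way before the data reaches PutDatabaseRecord, the target table columns would need to match the new names.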
11-26-2024
11:37 AM
1 Kudo
Can you provide more information on your dataflow? Let's say you are using GenerateFlowFile to create the JSON Kafka output: what happens next? How are you enriching the data, and in which processor are you using the JSON reader/writer service that is causing the error? I need to see the full picture here, because when I put the same JSON you provided in a GenerateFlowFile processor and passed it to QueryRecord with the same JSON reader/writer service configuration, it seemed to work!
11-26-2024
10:57 AM
1 Kudo
Hi @PradNiFi1236, how are you adding the new fields? Your JSON appears to be invalid as provided.
11-26-2024
10:52 AM
1 Kudo
Hi, can you provide more explanation or a screenshot of your dataflow and the configuration set on each processor and controller service? It would also be helpful if you could provide sample data that can be converted to Parquet and reproduces the error. Thanks
11-25-2024
09:21 AM
Hi,

I don't see a toNumber function in the RecordPath syntax, so I'm not sure how you came up with this. Next time it would be helpful if you provided the following information:

1- Input format.
2- Screenshot of the processor configuration causing the error.

As for your problem, the easiest way I can think of, and a more efficient one than splitting records, is using the QueryRecord processor. Let's assume you have the following CSV input:

id,date_time
1234,2024-11-24 19:43:17
5678,2024-11-24 01:10:10

You can pass the input to the QueryRecord processor with the following config:

The query is added as a dynamic property, which exposes a new relationship named after the property that you can use to get the desired output. The query syntax is the following:

select id,TIMESTAMPADD(HOUR, -3,date_time) as date_time from flowfile

The trick for this to work is configuring the CSV reader and writer services so they know how to parse and write datetime fields.

For the CSVReader, make sure to set the following:

CSVRecordSetWriter:

Output through the Result relationship:

id,date_time
1234,2024-11-24 16:43:17
5678,2024-11-23 22:10:10

Hope that helps. If it does, please accept the solution. Thanks
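As a cross-check of the expected result, the same three-hour shift can be sketched in plain Python. This is only an illustration of what the query computes, not NiFi code; the function names `shift_hours` and `shift_csv` are hypothetical, and the timestamp pattern is assumed to be yyyy-MM-dd HH:mm:ss as in the sample input.

```python
import csv
import io
from datetime import datetime, timedelta

# Assumed timestamp layout, matching the sample input above
FMT = "%Y-%m-%d %H:%M:%S"


def shift_hours(value: str, hours: int) -> str:
    """Parse a timestamp, add the given number of hours, and format it back."""
    return (datetime.strptime(value, FMT) + timedelta(hours=hours)).strftime(FMT)


def shift_csv(text: str, hours: int = -3) -> str:
    """Apply the shift to the date_time column of every record in the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["date_time"] = shift_hours(row["date_time"], hours)
        writer.writerow(row)
    return out.getvalue()


SAMPLE = "id,date_time\n1234,2024-11-24 19:43:17\n5678,2024-11-24 01:10:10\n"

if __name__ == "__main__":
    print(shift_csv(SAMPLE))
```

The second record shows why the value has to be parsed as a timestamp rather than handled as text: subtracting three hours from 01:10:10 rolls the date back to the previous day.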