Created 07-19-2024 06:11 AM
We need to convert CSV file content into a custom format. The input and expected output formats are shown below; please help us achieve this using NiFi processors.
Input file content:
username, first name, middle name, last name
test_user, test_FN, test_MN, test_LN
test_user2, test2_FN, test2_MN, test2_LN
Expected output format:
username = test_user
first name = test_FN
middle name = test_MN
last name = test_LN
username = test_user2
first name = test2_FN
middle name = test2_MN
last name = test2_LN
Created 07-19-2024 12:57 PM
@NagendraKumar
Oftentimes there is more than one way to solve a use case.
Here is one possible solution:
NiFi Components used:
SplitRecord: used to split your multi-row CSV content into individual records.
This processor uses a CSVReader and a CSVRecordSetWriter.
The "splits" relationship is then routed to a ReplaceText processor (used to reformat each individual record):
"Search Value", based on four fields per line (header line and data line):
^(.*?),(.*?),(.*?),(.*?)[\r\n]+(.*?),(.*?),(.*?),(.*?)[\r\n]+
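To see what this Search Value captures, here is a minimal Python sketch that applies the same pattern to one split FlowFile. It assumes the CSVRecordSetWriter emits the header line plus one data line, without spaces after the commas, each line ending in a newline. The Replacement Value from the original post was a screenshot that is not reproduced here, so `REPLACE` is a hypothetical stand-in that pairs header group N with value group N+4. Note that NiFi's ReplaceText writes back-references as `$1`, whereas Python's `re.sub` uses `\1`.

```python
import re

# Search Value copied from the ReplaceText configuration above.
SEARCH = r"^(.*?),(.*?),(.*?),(.*?)[\r\n]+(.*?),(.*?),(.*?),(.*?)[\r\n]+"

# Hypothetical Replacement Value: header group N = value group N+4.
# (In NiFi this would use $1, $5, ... instead of \1, \5, ...)
REPLACE = r"\1 = \5\n\2 = \6\n\3 = \7\n\4 = \8"


def reformat_split(content: str) -> str:
    """Apply the regex to one split FlowFile (header line + one data line)."""
    return re.sub(SEARCH, REPLACE, content, count=1)


# Assumed content of one split produced by SplitRecord.
split = "username,first name,middle name,last name\ntest_user,test_FN,test_MN,test_LN\n"
print(reformat_split(split))
```

Because the trailing `[\r\n]+` is consumed by the match, each reformatted split ends without a newline, which is why the line-return Demarcator in MergeContent is needed to separate records.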
"Replacement Value":
The "success" relationship is then routed to a MergeContent processor (used to recombine the individual records into a single FlowFile):
Note: the Demarcator is set to a line return so that each record starts on a new line in the merged content.
The assembled dataflow looks like this:
The above is a working solution based on your shared example, and it works no matter how many rows the source CSV file contains.
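The split, reformat, and merge steps can be sketched in Python to show why the flow handles any number of rows: each data row becomes its own split carrying the header, and the merge joins the reformatted splits with the newline Demarcator. This sketch assumes SplitRecord's "Records Per Split" is 1 and that the splits are merged in their original order. To keep the focus on the split/merge mechanics, the per-split step zips header names with values instead of running the ReplaceText regex; `split_records`, `reformat_split`, and `merge` are hypothetical helper names.

```python
import csv
import io


def split_records(csv_text: str) -> list[str]:
    """Mimic SplitRecord with one record per split: every split keeps the header."""
    rows = list(csv.reader(io.StringIO(csv_text), skipinitialspace=True))
    header, body = rows[0], rows[1:]
    splits = []
    for row in body:
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerow(row)
        splits.append(out.getvalue())
    return splits


def reformat_split(split: str) -> str:
    """Stand-in for ReplaceText: pair each header name with its value."""
    header, row = list(csv.reader(io.StringIO(split)))
    return "\n".join(f"{name} = {value}" for name, value in zip(header, row))


def merge(parts: list[str], demarcator: str = "\n") -> str:
    """Mimic MergeContent with a line-return Demarcator between FlowFiles."""
    return demarcator.join(parts)


source = (
    "username, first name, middle name, last name\n"
    "test_user, test_FN, test_MN, test_LN\n"
    "test_user2, test2_FN, test2_MN, test2_LN\n"
)
print(merge([reformat_split(s) for s in split_records(source)]))
```

Adding more data rows to `source` only adds more splits; nothing in the flow depends on the row count.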
Other possibilities:
I feel this use case could also be accomplished using the ScriptedTransformRecord processor, but I am not sure how to write the script needed here correctly. Perhaps others in the community have suggestions.
Thank you,
Matt
Created 07-20-2024 05:51 PM
Hi,
Another option is to use the FreeFormTextRecordSetWriter for this. Unfortunately, its documentation is sparse, but you can find examples like this one online.
All you need is a ConvertRecord processor to get the desired result. Here is an example:
- GenerateFlowFile: simulates generating the CSV input.
- ConvertRecord: reads the CSV input with a CSVReader and writes it with a FreeFormTextRecordSetWriter.
- CSVReader service configuration: the default configuration works.
- FreeFormTextRecordSetWriter:
The Text property used in the above service to produce the desired output:
username = ${username}
first name = ${"first name"}
middle name = ${"middle name"}
last name = ${"last name"}
Output:
username = test_user
first name = test_FN
middle name = test_MN
last name = test_LN
username = test_user2
first name = test2_FN
middle name = test2_MN
last name = test2_LN
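The Text property acts as a per-record template: each `${...}` is resolved against the current record's fields, and the names containing spaces are written quoted, as in `${"first name"}`. The Python sketch below emulates that substitution for this example only. It is not NiFi's implementation, it handles only plain field lookups (no Expression Language functions), and it assumes the Text value ends with a newline so consecutive records do not run together.

```python
import csv
import io
import re

# Template from the FreeFormTextRecordSetWriter "Text" property above.
# The trailing newline is an assumption so records don't run together.
TEMPLATE = (
    "username = ${username}\n"
    'first name = ${"first name"}\n'
    'middle name = ${"middle name"}\n'
    'last name = ${"last name"}\n'
)

# Matches ${field} or ${"field with spaces"}; a simplified stand-in for
# NiFi Expression Language, which supports much more than field lookups.
PLACEHOLDER = re.compile(r'\$\{(?:"([^"]*)"|([^}"]*))\}')


def render(record: dict[str, str]) -> str:
    """Substitute one record's field values into the template."""
    def lookup(match: re.Match) -> str:
        quoted, bare = match.group(1), match.group(2)
        return record[quoted if quoted is not None else bare]
    return PLACEHOLDER.sub(lookup, TEMPLATE)


def convert(csv_text: str) -> str:
    """Rough analogue of ConvertRecord: CSV in, one rendered template per record out."""
    # skipinitialspace drops the blank after each comma in header and data rows.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return "".join(render(row) for row in reader)


SAMPLE = (
    "username, first name, middle name, last name\n"
    "test_user, test_FN, test_MN, test_LN\n"
    "test_user2, test2_FN, test2_MN, test2_LN\n"
)
print(convert(SAMPLE), end="")
```

Unlike the split/merge flow, this is a single pass over the records, which is why one ConvertRecord is enough.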
Hope that helps.
Created 07-21-2024 11:21 PM
Thanks a lot, @MattWho, for the detailed explanation. This solution should work, but we have a huge volume of records, so splitting and merging might be costly. I will consider these processors for my other requirements, though. Thanks!
Created 07-21-2024 11:18 PM
Thanks a lot, @SAMSAL. This solution worked.
Created 07-24-2024 06:03 AM
Hi @SAMSAL, good day! We can generate the data in the required format using the FreeFormTextRecordSetWriter. As the next step, we need to convert this data into Parquet and store it in HDFS. We use the ConvertRecord processor to produce the Parquet format, but there is no FreeFormTextRecordSetReader, and if we use the CSV reader or another reader, the output gets misaligned. Please help us convert the data from the FreeFormTextRecordSetWriter into Parquet format and store it in the HDFS location.
I appreciate any help you can provide.