Created 11-14-2016 07:07 AM
I am new to Apache nifi and trying to evaluate tool for given business scenarios.
I ingested csv files using ListFile and FetchFile Processor, now want to perform below transformations on below data:
1. To drop a column from this ingested data.(I can specify column name for the action)
2. To delete a row from the ingested data using specific condition.
3. To perform transformations on null values present in the data
Thanks in advance!!
Regards,
Garima.
Created 11-14-2016 07:05 PM
The best solution seems to be ... after FetchFile:
A bit challenging on the regex skills, but completely within the realm of regex operations.
See the HCC article for a full working example: https://community.hortonworks.com/articles/66861/nifi-etl-removing-columns-filtering-rows-changing.h...
Created 11-14-2016 07:05 PM
The best solution seems to be ... after FetchFile:
A bit challenging on the regex skills, but completely within the realm of regex operations.
See the HCC article for a full working example: https://community.hortonworks.com/articles/66861/nifi-etl-removing-columns-filtering-rows-changing.h...
Created 11-16-2016 05:33 AM
Thanks @Greg Keys for your quick and crisp replies for all the queries.
- I am able to delete the row using ExtractText Processor but the issue i am facing is getting a single line file ie. If i am having 10 number of rows initially then the transformed output file is having all the rows merged in a single row and thus 10 rows changed to 1 row, with all the data.
- Unfortunately, i am not able to drop the column using Replace Text, as I am not able to get full column values.
For example, if my column header is "uuid" which i need to drop, i put this into regex as : (uuid) and replace text with " ", it is only changing the header name in spite of blanking the full column.
- For null values replacement, i am using ReplaceText processor, properties: regex - (^\s+$) for checking empty values in the excel and then replacing the same using replaceText - "hi" but this is also not working.
- I can merge any number of files using MergeText, but it is appending all the data one after the other, is it feasible to join the data or merge two files on the basis of one common header. for eg. I have uuid column in one file with 5 different cols and the same column in other file with 6 more different col, can i join here both the files and get 12 cols dataset in output file(having one common comparison column). You can treat this case as an equijoin condition.
Many Thanks in advance!
Regards,
Garima.
Created 11-16-2016 08:37 PM
@Garima Verma See the HCC article posted to the main answer. This will resolve your issues. Regarding merging files, I suggest posing this as a separate question in HCC ... it is distinct from your original question, and you will get more eyes on it 🙂
Created 11-17-2016 03:20 AM
Thanks. I will compare both the flows and will try again today.. 🙂
I started a new thread for merging of data :
Great help n support !!
-Garima.
Created 11-17-2016 11:52 AM
Every flow is perfectly working.. thanks a lot..!
Can we check blank space also and replace it with some text value?
i used Search Value as " " but it didnt work.. 😞
Could you please check dis one too once..
Many thanks!
-Garima.
Created 11-14-2016 07:38 PM
For deleting a column or performing transformations on Null, I would use the UpdateAttribute processor.
If you want to delete a row I'd use the RouteOnAttribute. (I would then route this to HDFS to log that I 'deleted' it.)
I hope this helps.
Created on 11-16-2016 05:21 AM - edited 08-18-2019 03:24 AM
Thanks @Matt Andruff... I tried to drop column using UpdateAttribute but unfortunately still unsuccessful 😞
I think i am missing some config in the processors. Please find attached for details :
and my config for unique attribute is as below wherein "uuid" is my column name i want to drop from ingested data.
Please let me know the required modifications here
Thanks in advance!
Regards,
Garima.