- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
How to delete a row/drop a column?
- Labels:
-
Apache NiFi
Created ‎11-14-2016 07:07 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am new to Apache nifi and trying to evaluate tool for given business scenarios.
I ingested csv files using ListFile and FetchFile Processor, now want to perform below transformations on below data:
1. To drop a column from this ingested data.(I can specify column name for the action)
2. To delete a row from the ingested data using specific condition.
3. To perform transformations on null values present in the data
Thanks in advance!!
Regards,
Garima.
Created ‎11-14-2016 07:05 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The best solution seems to be ... after FetchFile:
- SplitText processor (into single lines), then
- ExtractText processor (delete row) to regex match rows you want to discard, and connect to next processor with unmatched
- ReplaceText processor (drop column) to regex find a column and replace empty value
- Search Value e.g.: (.*,){2} to match the third column
- Replacement Value: ''
- need special attention to delim for first and last columns because there will be only trailing or leading
- ReplaceText processor (transform nulls) to regex find a column and replace empty value
- Search Value regex find all nulls
- Replacement Value: what you want to transfer to
- From here you can merge lines if you need to batch or keep text split as single lines (records) if you want to stream.
A bit challenging on the regex skills, but completely within the realm of regex operations.
See the HCC article for a full working example: https://community.hortonworks.com/articles/66861/nifi-etl-removing-columns-filtering-rows-changing.h...
Created ‎11-14-2016 07:05 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The best solution seems to be ... after FetchFile:
- SplitText processor (into single lines), then
- ExtractText processor (delete row) to regex match rows you want to discard, and connect to next processor with unmatched
- ReplaceText processor (drop column) to regex find a column and replace empty value
- Search Value e.g.: (.*,){2} to match the third column
- Replacement Value: ''
- need special attention to delim for first and last columns because there will be only trailing or leading
- ReplaceText processor (transform nulls) to regex find a column and replace empty value
- Search Value regex find all nulls
- Replacement Value: what you want to transfer to
- From here you can merge lines if you need to batch or keep text split as single lines (records) if you want to stream.
A bit challenging on the regex skills, but completely within the realm of regex operations.
See the HCC article for a full working example: https://community.hortonworks.com/articles/66861/nifi-etl-removing-columns-filtering-rows-changing.h...
Created ‎11-16-2016 05:33 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks @Greg Keys for your quick and crisp replies for all the queries.
- I am able to delete the row using ExtractText Processor but the issue i am facing is getting a single line file ie. If i am having 10 number of rows initially then the transformed output file is having all the rows merged in a single row and thus 10 rows changed to 1 row, with all the data.
- Unfortunately, i am not able to drop the column using Replace Text, as I am not able to get full column values.
For example, if my column header is "uuid" which i need to drop, i put this into regex as : (uuid) and replace text with " ", it is only changing the header name in spite of blanking the full column.
- For null values replacement, i am using ReplaceText processor, properties: regex - (^\s+$) for checking empty values in the excel and then replacing the same using replaceText - "hi" but this is also not working.
- I can merge any number of files using MergeText, but it is appending all the data one after the other, is it feasible to join the data or merge two files on the basis of one common header. for eg. I have uuid column in one file with 5 different cols and the same column in other file with 6 more different col, can i join here both the files and get 12 cols dataset in output file(having one common comparison column). You can treat this case as an equijoin condition.
Many Thanks in advance!
Regards,
Garima.
Created ‎11-16-2016 08:37 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Garima Verma See the HCC article posted to the main answer. This will resolve your issues. Regarding merging files, I suggest posing this as a separate question in HCC ... it is distinct from your original question, and you will get more eyes on it 🙂
Created ‎11-17-2016 03:20 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks. I will compare both the flows and will try again today.. 🙂
I started a new thread for merging of data :
Great help n support !!
-Garima.
Created ‎11-17-2016 11:52 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Every flow is perfectly working.. thanks a lot..!
Can we check blank space also and replace it with some text value?
i used Search Value as " " but it didnt work.. 😞
Could you please check dis one too once..
Many thanks!
-Garima.
Created ‎11-14-2016 07:38 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
For deleting a column or performing transformations on Null, I would use the UpdateAttribute processor.
If you want to delete a row I'd use the RouteOnAttribute. (I would then route this to HDFS to log that I 'deleted' it.)
I hope this helps.
Created on ‎11-16-2016 05:21 AM - edited ‎08-18-2019 03:24 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks @Matt Andruff... I tried to drop column using UpdateAttribute but unfortunately still unsuccessful 😞
I think i am missing some config in the processors. Please find attached for details :
and my config for unique attribute is as below wherein "uuid" is my column name i want to drop from ingested data.
Please let me know the required modifications here
Thanks in advance!
Regards,
Garima.
