Hi, I have a space-separated values file, and I want to select only some columns from this FlowFile and then put them in HDFS.
Labels: Apache Hadoop, Apache NiFi
Created ‎11-22-2017 03:02 PM
I have a file of this form as input:
*******************************************************************************************************
2017-11-22 16:57:01.770651 IP 192.168.1.5.443 > 10.0.0.11.46250: Flags [P.], seq 1:47, ack 46, win 180, options [nop,nop,TS val 3232053199 ecr 2738373364], length 46
************************************************************************************************************************************
I want to select only the date, time, source, and destination, and then write them back to a CSV file.
I use this script:
reader = csv.reader(open(inputStream,"rb"),delimiter=' ')
for row in reader:
    outputStream.write(str([row[0],row[1],row[3],row[5]]))
    outputStream.write(str('\n'))
But the output looks like this:
['2017-11-18', '02:09:40.860818', '192.222.1.179.30106', '62.240.110.198.53:']
I want to remove the brackets and the quotes.
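For reference, the brackets and quotes appear because str() is applied to a Python list, which produces the list's printed representation rather than a CSV row. A minimal sketch of one way around this, in plain Python outside NiFi, is to join the selected fields with commas. The function name select_columns and the in-memory streams are illustrative assumptions, not part of the original script.

```python
import io

def select_columns(input_stream, output_stream, indexes=(0, 1, 3, 5)):
    """Write the chosen space-separated fields of each line as one CSV row."""
    for line in input_stream:
        fields = line.split(" ")
        if len(fields) <= max(indexes):
            continue  # skip lines too short to hold every requested column
        # Joining the strings avoids the brackets and quotes of str(list).
        output_stream.write(",".join(fields[i] for i in indexes) + "\n")

# Hypothetical in-memory input standing in for the FlowFile content.
sample = ("2017-11-22 16:57:01.770651 IP 192.168.1.5.443 > "
          "10.0.0.11.46250: Flags [P.], seq 1:47, ack 46, win 180\n")
out = io.StringIO()
select_columns(io.StringIO(sample), out)
result = out.getvalue()
print(result, end="")
```

A plain join is enough here because none of the selected columns can contain a comma; the csv module's writer would be the safer choice if they could.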
Created on ‎11-22-2017 03:58 PM - edited ‎08-17-2019 10:34 PM
You could use the ReplaceText processor instead of your script to accomplish what you are trying to do:
The above ReplaceText processor will create 4 capture groups for the desired columns from your input FlowFiles.
It will also work against incoming FlowFiles that contain multiple entries (one per line).
Thank you,
Matt
If you find this answer addresses your question/issue, please take a moment to click "Accept" beneath the answer.
Created ‎11-23-2017 12:10 PM
Thanks for your response, but I get only the first 3 values; I can't get the fourth value.
Created ‎11-23-2017 09:48 AM
Thank you for your response, but the output drops the last column: it only gets the date, time, and source, and not the destination.
2017-11-23 11:45:25.044084 192.222.1.179.1214
Created on ‎11-23-2017 12:58 PM - edited ‎08-17-2019 10:34 PM
I think you are missing a space in the Search Value property.
Use one of the regexes below in the Search Value property:
^(.*?) (.*?) IP (.*?) > (.*?) .*$
(or)
([^\s]+)\s([^\s]+)\sIP\s(.*)\s>\s([^\s]+).*
Either of the above regexes will work.
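As a quick check outside NiFi, both patterns can be tried against the sample line from the question. This is a sketch in Python, whose re syntax agrees with Java regex for these two patterns. The replacement \1,\2,\3,\4 is an assumption standing in for a ReplaceText Replacement Value along the lines of $1,$2,$3,$4, which this thread does not spell out, and the helper name to_csv is hypothetical.

```python
import re

# Sample input line taken from the question.
LINE = ("2017-11-22 16:57:01.770651 IP 192.168.1.5.443 > 10.0.0.11.46250: "
        "Flags [P.], seq 1:47, ack 46, win 180, options "
        "[nop,nop,TS val 3232053199 ecr 2738373364], length 46")

# The two Search Value patterns suggested above.
PATTERNS = [
    r"^(.*?) (.*?) IP (.*?) > (.*?) .*$",
    r"([^\s]+)\s([^\s]+)\sIP\s(.*)\s>\s([^\s]+).*",
]

def to_csv(pattern, text):
    # re.MULTILINE lets ^ and $ match at every line, so multi-line input works.
    # The comma-separated replacement is an assumption, not from the thread.
    return re.sub(pattern, r"\1,\2,\3,\4", text, flags=re.MULTILINE)

results = [to_csv(p, LINE) for p in PATTERNS]
for row in results:
    print(row)
```

Both patterns keep the trailing colon on the destination (10.0.0.11.46250:), just as the original script's output does, so a further step would be needed if that colon is unwanted.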
Config:-
