
Hi, I have a space-separated values file, and I want to select only some columns from this flow file and then put them in HDFS


My input file has this form:

*******************************************************************************************************

2017-11-22 16:57:01.770651 IP 192.168.1.5.443 > 10.0.0.11.46250: Flags [P.], seq 1:47, ack 46, win 180, options [nop,nop,TS val 3232053199 ecr 2738373364], length 46

************************************************************************************************************************************


I want to select only the date, time, source, and destination, and then write them back to a CSV file.

I use this script:

reader = csv.reader(open(inputStream,"rb"),delimiter=' ')
for row in reader:
    outputStream.write(str([row[0],row[1],row[3],row[5]]))
    outputStream.write(str('\n'))

but the output looks like this:

['2017-11-18', '02:09:40.860818', '192.222.1.179.30106', '62.240.110.198.53:']

I want to remove the brackets and the quotes.

1 ACCEPTED SOLUTION

Super Mentor

@Mohamed Hossam

You could use the ReplaceText processor instead of your script to accomplish what you are trying to do:

42719-screen-shot-2017-11-22-at-105111-am.png


The above ReplaceText processor will create 4 capture groups for the desired columns from your input FlowFiles.
It will also work against incoming FlowFiles that contain multiple entries (one per line).
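
If you would rather keep a script, note that the brackets and quotes appear because str() is called on a Python list, which writes the list's representation rather than its values. Below is a minimal standalone sketch (Python 3, outside NiFi) that writes the four fields as a CSV row instead. The names SAMPLE and select_columns are hypothetical, the sample line is shortened from the one in the question, and wiring this into an ExecuteScript stream callback is not shown.

```python
import csv
import io

# Shortened copy of the sample line from the question (illustrative only).
SAMPLE = (
    "2017-11-22 16:57:01.770651 IP 192.168.1.5.443 > 10.0.0.11.46250: "
    "Flags [P.], seq 1:47, ack 46, win 180, length 46\n"
)


def select_columns(text):
    """Return date,time,source,destination for each space-separated line as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text), delimiter=" "):
        if len(row) < 6:
            continue  # skip blank or short lines
        # writerow() emits the values joined by commas, with no brackets or quotes
        writer.writerow([row[0], row[1], row[3], row[5]])
    return out.getvalue()


print(select_columns(SAMPLE), end="")
```

The destination keeps its trailing colon here, exactly as it appears in the input line.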

Thank you,

Matt

If you find this answer addresses your question/issue, please take a moment to click "Accept" beneath the answer.


4 REPLIES


@Matt Clarke

Thanks for your response, but I get only the first 3 values; I can't get the fourth value.


@Matt Clarke

Thank you for your response, but the output drops the last column. It only gets the date, time, and src, but not the dst:

2017-11-23 11:45:25.044084  192.222.1.179.1214 

Master Guru

@Mohamed Hossam

I think you are missing a space in the Search Value property.

Use the regex below in the Search Value property:

^(.*?) (.*?) IP (.*?) > (.*?) .*$

(or)

([^\s]+)\s([^\s]+)\sIP\s(.*)\s>\s([^\s]+).*

Either of the above regexes will work.
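
To check the expressions outside NiFi, here is a small sketch using Python's re module against the sample line from the question. It assumes the replacement joins the four capture groups with commas (written `\1,\2,\3,\4` in Python; this stands in for the Replacement Value in the screenshot, which is an assumption here). The names LINE, PATTERNS, and to_csv are hypothetical, and Python's regex engine is used in place of the Java one, which should behave the same for these two patterns.

```python
import re

# Sample line from the question.
LINE = (
    "2017-11-22 16:57:01.770651 IP 192.168.1.5.443 > 10.0.0.11.46250: "
    "Flags [P.], seq 1:47, ack 46, win 180, "
    "options [nop,nop,TS val 3232053199 ecr 2738373364], length 46"
)

# The two search expressions suggested above.
PATTERNS = [
    r"^(.*?) (.*?) IP (.*?) > (.*?) .*$",
    r"([^\s]+)\s([^\s]+)\sIP\s(.*)\s>\s([^\s]+).*",
]


def to_csv(line, pattern):
    """Replace the whole line with its four capture groups, comma-separated."""
    return re.sub(pattern, r"\1,\2,\3,\4", line)


for pattern in PATTERNS:
    print(to_csv(LINE, pattern))
```

Both patterns capture the destination as the fourth group, including the trailing colon that tcpdump prints after it.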

Config:

42735-replace.png