
Using regular expressions with Flume

Hello,

 

Please consider the following case:
I am using Flume to upload a comma-delimited syslog file into HDFS.
An Impala external table has been created in order to read this syslog file from Hadoop.

Once in a while, one of the column values contains one or more commas.
A row that causes a problem can be identified by the following structure:      signatures (1st_string , 2nd_string , 3rd_string....)

For example

1, a , signatures (IP Fragmentation, DNS Amplification)                                 <<<--- PROBLEM
2, b , signatures (IP Fragmentation, NTP Amplification, DNS Amplification)              <<<--- PROBLEM
3, c , signatures (IP Fragmentation, IP Private, NTP Amplification, DNS Amplification)  <<<--- PROBLEM
4, d , signatures (Total Traffic)                                                       <---- NO ISSUE HERE



I can't change the syslog structure, so I would like to REPLACE each comma inside the () with ;
For example:
2, b , signatures (IP Fragmentation, NTP Amplification, DNS Amplification)

should become:
2, b , signatures (IP Fragmentation; NTP Amplification; DNS Amplification)

The following regular expression:

 

signatures [(].*?[)]   

will find the content that I need to replace. I have followed:
https://flume.apache.org/FlumeUserGuide.html#search-and-replace-interceptor


a1.sources.avroSrc.interceptors = search-replace
a1.sources.avroSrc.interceptors.search-replace.type = search_replace
a1.sources.avroSrc.interceptors.search-replace.searchPattern = signatures [(].*?[)]
a1.sources.avroSrc.interceptors.search-replace.replaceString =
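
As a rough sketch of one possible direction (not tested inside Flume itself): the Flume user guide describes this interceptor as following the rules of Java's Matcher.replaceAll(), so a candidate pattern can be tried in plain Java first. The pattern below is an assumption, not something taken from the Flume documentation. Instead of matching the whole signatures (...) group, it matches each comma that is followed by a closing parenthesis with no other parenthesis in between, so only the commas inside () are rewritten. The class and method names are hypothetical, and the sketch assumes parentheses are never nested and that no other parenthesized group on the line contains commas that must be kept.

```java
import java.util.regex.Pattern;

public class SignatureCommaSketch {

    // Assumed pattern: a comma, followed (lookahead) by any run of
    // non-parenthesis characters and then a closing parenthesis.
    // Commas outside () fail the lookahead because "(" is reached before ")".
    static final Pattern COMMA_INSIDE_PARENS = Pattern.compile(",(?=[^()]*[)])");

    // Replaces every comma inside a parenthesized group with a semicolon.
    static String replaceCommas(String line) {
        return COMMA_INSIDE_PARENS.matcher(line).replaceAll(";");
    }

    public static void main(String[] args) {
        String[] samples = {
            "1, a , signatures (IP Fragmentation, DNS Amplification)",
            "2, b , signatures (IP Fragmentation, NTP Amplification, DNS Amplification)",
            "4, d , signatures (Total Traffic)"
        };
        for (String sample : samples) {
            // The field-separating commas before "signatures" are left untouched.
            System.out.println(replaceCommas(sample));
        }
    }
}
```

If the interceptor behaves the same way as this standalone test, the corresponding settings would presumably be searchPattern = ,(?=[^()]*[)]) and replaceString = ; but that should be verified against real events before relying on it.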


Could you please advise on the replaceString setting?

 

Regards
