
Using regular expression with flume


Hello,

 

Please consider the following case:
I am using Flume to upload a comma-delimited syslog file into HDFS.
An Impala external table has been created to read this syslog file from Hadoop.

Once in a while, one of the column values contains one or more commas.
A row that causes a problem can be identified by the following structure:      signatures (1st_string, 2nd_string, 3rd_string, ...)

For example:

1, a , signatures (IP Fragmentation, DNS Amplification)                                 <<<--- PROBLEM
2, b , signatures (IP Fragmentation, NTP Amplification, DNS Amplification)              <<<--- PROBLEM
3, c , signatures (IP Fragmentation, IP Private, NTP Amplification, DNS Amplification)  <<<--- PROBLEM
4, d , signatures (Total Traffic)                                                       <---- NO ISSUE HERE
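
To make the problem concrete, here is a small Java sketch that counts the fields a naive split on commas produces for two of the rows above. The class and method names are made up for illustration, and the plain split is only an assumption about how a comma-delimited table would see these rows, not a description of what Impala does internally.

```java
public class ColumnCountDemo {
    // Hypothetical helper: counts the fields produced by a naive split
    // on commas, standing in for a comma-delimited table definition.
    static int fieldCount(String row) {
        return row.split(",").length;
    }

    public static void main(String[] args) {
        String[] rows = {
            "1, a , signatures (IP Fragmentation, DNS Amplification)",
            "4, d , signatures (Total Traffic)"
        };
        for (String row : rows) {
            // Rows with commas inside the parentheses yield extra fields.
            System.out.println(fieldCount(row) + " fields: " + row);
        }
    }
}
```

Row 1 splits into four fields while row 4 splits into three, so the last column of row 1 is cut off at the first comma inside the parentheses.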



I can't change the syslog structure, so I would like to REPLACE the commas inside () with ;
For example:
2, b , signatures (IP Fragmentation, NTP Amplification, DNS Amplification)

should become:
2, b , signatures (IP Fragmentation; NTP Amplification; DNS Amplification)

The following regular expression:

signatures [(].*?[)]

will find the content that I need to replace. I have followed:
https://flume.apache.org/FlumeUserGuide.html#search-and-replace-interceptor


a1.sources.avroSrc.interceptors = search-replace
a1.sources.avroSrc.interceptors.search-replace.type = search_replace
a1.sources.avroSrc.interceptors.search-replace.searchPattern = signatures [(].*?[)]
a1.sources.avroSrc.interceptors.search-replace.replaceString =


Could you please advise on the replaceString setting?
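
One direction that might work, as an untested sketch: the Flume user guide says this interceptor follows the rules of Java's Matcher.replaceAll(), so a fixed replaceString cannot rewrite only the commas inside a match of the whole "signatures (...)" block. An alternative is to match each comma individually with a lookahead, i.e. searchPattern = ,(?=[^()]*[)]) and replaceString = ; (both values are my assumption, not something from the guide). The Java class below, with hypothetical names, only checks the regular expression itself on the sample rows. It has not been run inside a Flume agent.

```java
import java.util.regex.Pattern;

public class SignatureCommaFix {
    // Matches a comma that is followed by a closing parenthesis with no
    // other parenthesis in between, i.e. a comma inside "(...)".
    // [)] is used instead of \) to avoid backslash escaping in a
    // properties file.
    static final Pattern COMMA_IN_PARENS = Pattern.compile(",(?=[^()]*[)])");

    // Same call the interceptor is documented to follow: replaceAll.
    static String fix(String line) {
        return COMMA_IN_PARENS.matcher(line).replaceAll(";");
    }

    public static void main(String[] args) {
        System.out.println(fix("2, b , signatures (IP Fragmentation, NTP Amplification, DNS Amplification)"));
        System.out.println(fix("4, d , signatures (Total Traffic)"));
    }
}
```

For row 2 this produces the desired line with ; inside the parentheses, and row 4 is left unchanged. Note the limits of the sketch: it rewrites commas inside any parentheses on the line, not only those after "signatures", and it assumes the parentheses are not nested.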

 

Regards