07-17-2017 08:36 AM
Please consider the following case :
I am using flume to upload a syslog file (COMMA delimited ) into the hdfs.
An impala external table has been created , in order to read this syslog file from hadoop
Once for a while, one of the column VALUE contain a one or more COMMA.
A row that cause a problem can be identified by the following structure: signatures (1st_string , 2nd_string , 3rd_string....)
1, a , signatures (IP Fragmentation, DNS Amplification) <<<--- PROBLEM 2, b , signatures (IP Fragmentation, NTP Amplification, DNS Amplification) <<<--- PROBLEM 3, c , signatures (IP Fragmentation, IP Private, NTP Amplification, DNS Amplification) <<<--- PROBLEM 4, d , signatures (Total Traffic) <---- NO ISSUE HERE
I cant change the syslog structure , so I would like to the comma sign in () to ;
For example :
2, b , signatures (IP Fragmentation, NTP Amplification, DNS Amplification)
2, b , signatures (IP Fragmentation; NTP Amplification; DNS Amplification)
The following Regular expression:
will find the content that i need to replace.I have followed:
a1.sources.avroSrc.interceptors = search-replace a1.sources.avroSrc.interceptors.search-replace.type = search_replace a1.sources.avroSrc.interceptors.search-replace.searchPattern = signatures [(].*?[)] a1.sources.avroSrc.interceptors.search-replace.replaceString =
Could you please advise with replaceString setting ?