Matching an input field with multiple regular expressions

Hi...

 

Is there a compact way to define several regular expressions for the same field and try them in turn until one matches? The "grok" command requires a different field name for each expression (e.g., "message1", "message2" and "message3" for 3 different regular expressions), so I cannot list all 3 under "message" and have that field tried against each one until one matches or all have been tried.

 

Of course, one can use tryRules with a different grok command per expression, e.g.:

 

          {
            tryRules {
              rules: [
                {
                  commands: [
                    {
                      grok {
                        dictionaryResources: [...]
                        expressions: {
                          message: """<expression1>"""
                        }
                      }
                    }
                  ]
                }
                {
                  commands: [
                    {
                      grok {
                        dictionaryResources: [conf/etl/grok-dictionaries/patterns]
                        expressions: {
                          message: """<expression2>"""
                        }
                      }
                    }
                  ]
                }

                ...
                {
                  commands: [
                    {
                      grok {
                        dictionaryResources: [...]
                        expressions: {
                          message: """<expressionN>"""
                        }
                      }
                    }
                  ]
                }
                {
                  commands: [
                    # No expression matched
                    { dropRecord {} }
                  ]
                }
              ]
            }
          }

 

The above works, but it takes around 5 times more lines than necessary, because every expression repeats the same rule, commands and grok wrapper.

 

Any ideas for a more compact notation using only existing morphline commands?
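
To be concrete about the behavior I am after, here is a minimal sketch in Python (not morphline configuration). The function name `first_match` and the two sample patterns are made up for illustration, and I am assuming each expression has to match the whole field. Each expression is tried on the same field in order, the first match wins, and the record would be dropped if nothing matches:

```python
import re


def first_match(message, expressions):
    """Try each expression on the same field, in order.

    Returns (index, named groups) for the first expression that matches
    the whole message, or None if no expression matches.
    """
    for index, expression in enumerate(expressions):
        match = re.fullmatch(expression, message)
        if match:
            return index, match.groupdict()
    return None  # no expression matched: the caller would drop the record


# Hypothetical patterns, standing in for <expression1> ... <expressionN>
expressions = [
    r"(?P<level>[A-Z]+): (?P<text>.*)",
    r"(?P<code>\d+) (?P<text>.*)",
]

result = first_match("404 not found", expressions)
```

What I would like is a morphline notation that is about as short as the `expressions` list above, instead of one full rule per expression.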

 

Thanks.

 

 
