
What is a good way to apply regex rules to incoming logs in Spark Streaming?


I have around 1,000 to 2,000 regex rules to match against 50,000 log lines coming into a Spark Streaming pipeline. So far I have run a small test with 500 rules on a 3-node cluster (each node has 8 GB of RAM and 2 cores), and it took around 30 minutes to process just those 500 rules. How can I reduce the time to seconds, since this is a streaming job?
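For scale: if every rule is tested against every line, 500 rules and 50,000 lines come to 25,000,000 regex evaluations per batch. Below is a minimal sketch of one commonly suggested starting point, under the assumption that part of the cost comes from recompiling patterns per line or running a separate pass per rule (the post does not say which). It compiles each pattern once and scans each line against the compiled list in a single pass. The rule names, patterns, sample lines, and the helpers `compile_rules` and `match_partition` are all hypothetical; the code is plain Python so it can be run without a cluster.

```python
import re

# Hypothetical rule set: rule name -> regex source (not from the original post).
RULES = {
    "failed_login": r"Failed password for (\w+)",
    "disk_full": r"No space left on device",
    "http_5xx": r'" 5\d\d ',
}


def compile_rules(rules):
    """Compile every pattern once so the cost is not paid per log line."""
    return [(name, re.compile(pattern)) for name, pattern in rules.items()]


def match_partition(lines, compiled):
    """Yield (line, [matching rule names]) for each line that hits a rule."""
    for line in lines:
        hits = [name for name, rx in compiled if rx.search(line)]
        if hits:
            yield line, hits


compiled = compile_rules(RULES)

# Hypothetical sample input standing in for one partition of a micro-batch.
sample = [
    "sshd[411]: Failed password for root from 10.0.0.5",
    'GET /index.html HTTP/1.1" 503 120',
    "cron[88]: job finished",
]
results = list(match_partition(sample, compiled))
```

In a Spark job, a function shaped like `match_partition` could be passed to `mapPartitions`, with the rules compiled once per partition instead of once per record. Whether that alone brings the runtime down to seconds depends on the actual patterns and on how the rules are applied today, which the post does not show.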
