Support Questions

RouteOnContent is slow in processing

Expert Contributor

1. I am getting a stream of messages separated by the delimiter ';'.

2. I am splitting those messages on ';'.

3. The split messages are then sent to RouteOnContent and routed based on part of their text (e.g. text containing "one" is sent to the 1st PutHDFS processor, text containing "two" to the 2nd PutHDFS processor, etc.).

4. The routed messages are then merged into a single file using MergeContent.

5. The merged file is put in HDFS.

The RouteOnContent processor is taking too much time to route the messages after they arrive from SplitContent.

I am using the match requirement "content must contain match".

Can anybody help with this?
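For readers who want to see the logic of steps 1-5 outside NiFi, here is a minimal Python sketch of the same split, route, and merge sequence. It is only an illustration of the data flow, not how NiFi implements it; the function names, the route names "first"/"second", the "unmatched" fallback, and the sample stream are all assumptions made up for the example.

```python
from collections import defaultdict


def split_messages(stream: str, delimiter: str = ";") -> list[str]:
    """Split the raw stream on the delimiter and drop empty fragments."""
    return [part.strip() for part in stream.split(delimiter) if part.strip()]


def route(message: str, routes: dict[str, str]) -> str:
    """Return the name of the first route whose keyword occurs in the message."""
    for name, keyword in routes.items():
        if keyword in message:
            return name
    return "unmatched"  # hypothetical fallback for messages matching no route


def merge_by_route(messages: list[str], routes: dict[str, str]) -> dict[str, str]:
    """Group messages per route and join each group into one merged payload."""
    grouped = defaultdict(list)
    for message in messages:
        grouped[route(message, routes)].append(message)
    return {name: "\n".join(parts) for name, parts in grouped.items()}


# Hypothetical routes: text containing "one" or "two", as in step 3.
ROUTES = {"first": "one", "second": "two"}
stream = "msg one a;msg two b;msg one c;other"

result = merge_by_route(split_messages(stream), ROUTES)
for name, payload in result.items():
    print(name, "->", repr(payload))
```

Each merged payload would correspond to one file written to HDFS in the flow described above.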

1 ACCEPTED SOLUTION


@Hadoop User

How many files typically come out of the SplitContent processor, and what size are they?

How many Concurrent tasks does the RouteOnContent processor have configured?

Try increasing the Run Duration and/or Concurrent Tasks for better throughput.


7 REPLIES

Expert Contributor

Typically each message from the SplitContent processor is <= 3 KB.

Concurrent tasks is set to 1.

Also, more than 50,000 messages will be received every second, split, and sent to the RouteOnContent processor. I tested it with 50k messages: getting them as far as RouteOnContent takes just 2-3 seconds, but after that it takes almost 3 hours!

I will increase the number of concurrent tasks and see if this helps improve the performance.

Expert Contributor

I tried changing the concurrent tasks to 100 (for testing) and tested with 1k messages; it took 11 minutes to complete.

Any suggestions, please!!
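To put the reported timings in perspective, the short Python calculation below converts them to messages per second. The inputs are the approximate figures quoted in this thread (50k messages in almost 3 hours, 1k messages in 11 minutes, and a target of more than 50,000 messages per second); the helper name `throughput` is just for illustration.

```python
def throughput(messages: int, seconds: float) -> float:
    """Average number of messages processed per second."""
    return messages / seconds


# Approximate figures as reported in the thread.
run_50k = throughput(50_000, 3 * 3600)  # 50k messages in about 3 hours
run_1k = throughput(1_000, 11 * 60)     # 1k messages in 11 minutes
required = 50_000                       # stated arrival rate, messages per second

print(f"50k test: {run_50k:.2f} msg/s")
print(f"1k test:  {run_1k:.2f} msg/s")
print(f"gap vs. required rate: about {required / run_50k:,.0f}x")
```

Both tests work out to only a few messages per second, several orders of magnitude below the stated arrival rate, which suggests something other than raw CPU capacity is limiting the processor.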


@Hadoop User

100 concurrent tasks, unless you have a large number of CPUs available, is too many.

Try using 4 concurrent tasks and a run duration of 2 seconds. How long does that take to process the 50k messages?

What does the RouteOnContent processor configuration look like?

Expert Contributor

I have changed it to 4 concurrent tasks and a run duration of 2s.

For 50k messages it took almost 3 hours (not what I expected).

E.g. a message looks like this:

this_is_an_example_message <1> [some_"text_and_digits_here"_number="121212"] [some_text_here] (and similarly for the other 50k messages)

RouteOnContent configuration:

Scheduling: Concurrent Tasks: 4

Run Schedule: 2s

Properties: Match Requirement: content must contain match

Character Set: UTF-8

Content Buffer Size: 1 MB

txt: number="121212"

UpdateAttribute: filename updated here

PutHDFS: configuration and path updated here

Thanks in advance
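As a rough illustration of what the "content must contain match" setting means for the example message above, the Python sketch below compares a "contains" search with a whole-content match. Using `re.search` and `re.fullmatch` as stand-ins for the two match requirements is an assumption made for this example; NiFi evaluates the property as a Java regular expression, so this only mirrors the semantics, not the implementation.

```python
import re

# Example message and routing expression taken from the post above.
message = (
    'this_is_an_example_message <1> '
    '[some_"text_and_digits_here"_number="121212"] [some_text_here]'
)
pattern = re.compile(r'number="121212"')

# "Contains" semantics: the expression may match anywhere in the content.
contains_match = pattern.search(message) is not None

# "Exact" semantics: the expression would have to cover the whole content.
exact_match = pattern.fullmatch(message) is not None

print("contains match:", contains_match)
print("exact match:   ", exact_match)
```

With this expression only the "contains" style check succeeds, which is consistent with the match requirement chosen in the configuration.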

@Hadoop User

Do you see the data queueing up after the RouteOnContent processor in the flow?

Expert Contributor

@Wynner

I have replaced the RouteOnContent processor, but kept the parameters the same.

Surprisingly, it now works pretty fast (seconds). Not sure why the old one was not working.

Thanks for your extended support.