Support Questions
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Split FlowFiles to Multiple ConvertJsonToSQL Processors

Explorer

Hi,

I have 100K flowfiles generated by a custom processor and I need to store them in a MySQL DB. To speed up the inserts, I want the 100K flowfiles to be processed concurrently by multiple ConvertJsonToSQL processors (4 of them). Which processor should I put between the custom processor and the ConvertJsonToSQL processors to achieve that?

Thanks,

Accepted Solutions

Re: Split FlowFiles to Multiple ConvertJsonToSQL Processors

Super Guru

@yazeed salem

  • If your flowfile content is already in JSON format and each message/record is on its own line, use the SplitText processor with Line Split Count set to <desired number>.
  • If your flowfile content is already JSON but the messages are not one per line, use the SplitRecord processor: configure the Record Reader/Writer controller services (define an Avro schema that matches the incoming flowfile content) and set the Records Per Split property to your desired number. SplitRecord is the more efficient option because it works on chunks of data.
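To make the difference between the two options concrete, here is a rough Python sketch of what each split does to the content. The function names `split_text` and `split_records` are hypothetical and this is not NiFi code; it only assumes newline-delimited JSON for the SplitText case and a JSON array for the SplitRecord case (with a JSON reader/writer).

```python
import json
from typing import List


def split_text(content: str, line_split_count: int) -> List[str]:
    """Group lines into chunks of `line_split_count` lines, roughly like
    SplitText with its Line Split Count property (hypothetical helper)."""
    lines = content.splitlines()
    return [
        "\n".join(lines[i:i + line_split_count])
        for i in range(0, len(lines), line_split_count)
    ]


def split_records(content: str, records_per_split: int) -> List[str]:
    """Parse a JSON array and re-serialize it in chunks of
    `records_per_split` records, roughly like SplitRecord with a JSON
    reader/writer and its Records Per Split property (hypothetical helper)."""
    records = json.loads(content)
    return [
        json.dumps(records[i:i + records_per_split])
        for i in range(0, len(records), records_per_split)
    ]


if __name__ == "__main__":
    ndjson = '{"id": 1}\n{"id": 2}\n{"id": 3}\n'
    print(split_text(ndjson, 2))
    array = '[{"id": 1}, {"id": 2}, {"id": 3}]'
    print(split_records(array, 2))
```

The key point is that SplitText only looks at line boundaries, while SplitRecord actually parses the records, which is why it still works when one message spans several lines.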

Refer to this link and this link to configure the Record Reader/Writer controller services.

Flow:

1. Custom processor
2. SplitRecord/SplitText processor
3. DistributeLoad
4. ConvertJsonToSQL

(Screenshot of the flow: 77638-flow.png)
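Step 4 is where each flowfile turns into SQL. As a rough illustration only, the sketch below turns a flat JSON object (or an array of them) into a parameterized INSERT with the values kept separately, which is the general shape of what ConvertJsonToSQL produces (to my understanding it carries the values as `sql.args.N.*` attributes for PutSQL). `json_to_insert` is a hypothetical helper, and it assumes column names come straight from the JSON keys; the real processor checks the fields against the table's columns through its JDBC Connection Pool.

```python
import json
from typing import List, Tuple


def json_to_insert(table: str, content: str) -> List[Tuple[str, list]]:
    """Turn a flat JSON object, or an array of them, into parameterized
    INSERT statements plus their bind values (hypothetical helper)."""
    data = json.loads(content)
    rows = data if isinstance(data, list) else [data]
    statements = []
    for row in rows:
        # Assumption: every JSON key maps directly to a column name.
        columns = list(row.keys())
        placeholders = ", ".join("?" for _ in columns)
        sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
        statements.append((sql, [row[c] for c in columns]))
    return statements


if __name__ == "__main__":
    for sql, args in json_to_insert("users", '{"id": 1, "name": "a"}'):
        print(sql, args)
```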

DistributeLoad Configs:

(Screenshot of the DistributeLoad configuration: 77639-dl.png)

Number of Relationships (default: 1): Determines the number of Relationships to which the load should be distributed.
Distribution Strategy (default: round robin). Allowable values:
  • round robin
  • next available
  • load distribution service
Determines how the load will be distributed. If using Round Robin, will not distribute any FlowFiles unless all destinations can accept FlowFiles; when using Next Available, will distribute FlowFiles as long as at least 1 destination can accept FlowFiles.
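The difference between the two main strategies is easiest to see when one destination is backed up. Below is a minimal Python sketch of the two rules quoted above, assuming each destination is modeled only by how many free slots its queue has (back pressure); the function names are hypothetical and this is not how NiFi implements it internally.

```python
from typing import List, Optional


def round_robin(capacity: List[int], start: int) -> Optional[int]:
    """Index of the destination for the next FlowFile, or None to wait.
    Follows the documented Round Robin rule: nothing is sent unless
    every destination can accept a FlowFile."""
    if any(free <= 0 for free in capacity):
        return None
    return start % len(capacity)


def next_available(capacity: List[int], start: int) -> Optional[int]:
    """First destination with room, scanning from `start`; None only
    when every destination is full (Next Available rule)."""
    n = len(capacity)
    for offset in range(n):
        idx = (start + offset) % n
        if capacity[idx] > 0:
            return idx
    return None


if __name__ == "__main__":
    # The second destination's queue is full.
    queues = [5, 0, 5]
    print(round_robin(queues, 1))     # None: waits for all to have room
    print(next_available(queues, 1))  # 2: skips the full destination
```

So with round robin, one slow ConvertJsonToSQL processor holds up all of them, while next available keeps feeding the ones that still have room.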
Since I configured Number of Relationships as 3, I connected:
  • relationship 1 from the DistributeLoad processor to the first ConvertJsonToSQL processor
  • relationship 2 to the second ConvertJsonToSQL processor
  • relationship 3 to the third ConvertJsonToSQL processor
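For a rough idea of the resulting load, this small sketch computes how many flowfiles each relationship gets under round robin. It assumes 100,000 flowfiles reach DistributeLoad, distribution starts at relationship 1, and back pressure never kicks in; `round_robin_counts` is a hypothetical helper.

```python
from typing import Dict


def round_robin_counts(total: int, relationships: int) -> Dict[int, int]:
    """FlowFiles per relationship (numbered from 1) when `total` FlowFiles
    are handed out in strict rotation with no back pressure."""
    base, extra = divmod(total, relationships)
    # The first `extra` relationships receive one additional FlowFile.
    return {rel: base + (1 if rel <= extra else 0)
            for rel in range(1, relationships + 1)}


if __name__ == "__main__":
    print(round_robin_counts(100_000, 3))  # {1: 33334, 2: 33333, 3: 33333}
```

Each ConvertJsonToSQL processor therefore handles about a third of the work.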

Depending on how many parallel paths you want, change Number of Relationships in the DistributeLoad processor and add one ConvertJsonToSQL processor per relationship.

In addition, please consider using the record-oriented PutDatabaseRecord processor, which works on chunks of data. Configure its Record Reader controller service to read the incoming flowfile; then I think you won't need to split any records at all.

Flow:

1. Custom Processor
2. PutDatabaseRecord
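Why the record-oriented route tends to be faster: all records in a flowfile go to the database as one batch in one transaction, instead of one statement per flowfile. The sketch below shows that idea in Python, using an in-memory SQLite database purely as a stand-in for MySQL. `put_records` is a hypothetical helper, not the processor's implementation, and it assumes the flowfile is a JSON array whose records all share the keys of the first record.

```python
import json
import sqlite3
from typing import List


def put_records(conn: sqlite3.Connection, table: str, content: str) -> int:
    """Insert every record of a JSON array as one batch in a single
    transaction (hypothetical helper; SQLite stands in for MySQL)."""
    records: List[dict] = json.loads(content)
    if not records:
        return 0
    # Assumption: all records have the same keys as the first one.
    columns = list(records[0].keys())
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    with conn:  # commits on success, rolls back on error
        conn.executemany(sql, [tuple(r[c] for c in columns) for r in records])
    return len(records)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")
    payload = json.dumps([{"id": i, "name": f"e{i}"} for i in range(1000)])
    print(put_records(conn, "events", payload))
    print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])
```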

-

If the Answer addressed your question, Click on Accept button below to accept the answer, That would be great help to Community users to find solution quickly for these kind of issues.

Re: Split FlowFiles to Multiple ConvertJsonToSQL Processors

Explorer

Thanks, it works.