MergeContent in a cluster

Contributor

I have a table with 12M rows in Netezza that I need to push into S3. This is how the pipeline is currently set up:

GenerateTableFetch->ExecuteSQL->ConvertRecord->MergeContent->PutS3Object

ExecuteSQL & ConvertRecord have Load Balancing turned on.

MergeContent apparently merges data only within each node. How do I combine flowfiles from all nodes into one flowfile before pushing it to S3?


3 REPLIES

Re: MergeContent in a cluster

Contributor

Here are my MergeContent settings. I tried different settings but can't quite get it to work.


[Screenshot: MergeContent settings (109412-1560803749025.png)]

Re: MergeContent in a cluster

New Contributor

You can configure MergeContent to run only on the Primary node. As a result, all of your data will be routed to the primary node and merged there. Hope it helps.

Re: MergeContent in a cluster

Contributor

Try changing the Load Balance Strategy on the connection between ConvertRecord and MergeContent to Single Node.
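
To see why this helps, here is a rough toy model in Python. It is not NiFi code: the `distribute` and `merge_per_node` helpers, the node names, and the strategy strings are all made up for illustration, and the sketch assumes a real cluster sends everything to one node under Single Node (here simply the first node in the list). The point is only that a per-node merge produces one bundle per node that holds data.

```python
from collections import defaultdict

def distribute(flowfiles, nodes, strategy):
    """Assign each flowfile to a node's queue under a toy load-balance strategy."""
    queues = defaultdict(list)
    for i, flowfile in enumerate(flowfiles):
        if strategy == "round_robin":
            node = nodes[i % len(nodes)]   # spread flowfiles across all nodes
        elif strategy == "single_node":
            node = nodes[0]                # assumption: everything lands on one node
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        queues[node].append(flowfile)
    return dict(queues)

def merge_per_node(queues):
    """Each node merges only the flowfiles sitting in its own queue."""
    return {node: "".join(items) for node, items in queues.items()}

flowfiles = [f"rows-{i};" for i in range(6)]   # hypothetical flowfile contents
nodes = ["node-a", "node-b", "node-c"]         # hypothetical cluster nodes

round_robin = merge_per_node(distribute(flowfiles, nodes, "round_robin"))
single_node = merge_per_node(distribute(flowfiles, nodes, "single_node"))

print(len(round_robin), "merged files with round robin")
print(len(single_node), "merged file with single node")
```

With the queue in front of MergeContent balanced across nodes, each node ends up with its own merged file; with everything sent to a single node, there is only one queue to merge, so only one file reaches PutS3Object (subject to the bin size and count limits set on MergeContent).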
