[Apache NiFi Bug] ExecuteSQL can't perform batch loading with an input file

Hello!

Test case to reproduce:

1) Create a GenerateFlowFile processor with the following Custom Text:

select 1 from dual
union all
select 2 from dual
union all
select 3 from dual

2) Create an ExecuteSQL processor and set "Output Batch Size" = 1.
3) Start the flow.

Actual result (AR):

The ExecuteSQL processor fails instead of producing output (screenshot attached: nifi-bug.png).

Expected result (ER):

Three FlowFiles in the "success" connection.

Root Cause:

// Excerpt from ExecuteSQL's result-handling loop: each full batch of result FlowFiles
// is transferred and the session committed, while the incoming FlowFile is still held by the session.
if (outputBatchSize > 0 && resultSetFlowFiles.size() >= outputBatchSize) {
    session.transfer(resultSetFlowFiles, REL_SUCCESS);
    session.commit();    // fails: fileToProcess has not been transferred or removed yet
    resultSetFlowFiles.clear();
}

The commit method (ProcessSession.commit()) documents the following requirement:
Commits the current session ensuring all operations against FlowFiles within this session are atomically persisted. All FlowFiles operated on within this session must be accounted for by transfer or removal or the commit will fail.

At the time of each of these intermediate commits, the input FlowFile (fileToProcess) is still part of the session and has not been accounted for by transfer or removal, so the commit fails.
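
To make the failure mode concrete, here is a minimal custom-processor sketch of the same pattern. This is only an illustration, not the actual ExecuteSQL code: the class name, the single "success" relationship, and the hard-coded three-row "result set" are made up for the example. Its first intermediate commit fails for exactly the reason quoted above.

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class BatchCommitSketch extends AbstractProcessor {

    // Single "success" relationship, mirroring ExecuteSQL's REL_SUCCESS.
    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        final FlowFile fileToProcess = session.get();
        if (fileToProcess == null) {
            return;
        }

        final int outputBatchSize = 1;                        // mirrors "Output Batch Size" = 1
        final List<FlowFile> resultSetFlowFiles = new ArrayList<>();

        for (int i = 1; i <= 3; i++) {                        // stands in for iterating a JDBC result set
            FlowFile result = session.create(fileToProcess);  // child FlowFile per result batch
            final String row = "row " + i;
            result = session.write(result, out -> out.write(row.getBytes(StandardCharsets.UTF_8)));
            resultSetFlowFiles.add(result);

            if (outputBatchSize > 0 && resultSetFlowFiles.size() >= outputBatchSize) {
                session.transfer(resultSetFlowFiles, REL_SUCCESS);
                // Fails here: fileToProcess (obtained via session.get()) has been neither
                // transferred nor removed, which violates the commit() contract quoted above.
                session.commit();
                resultSetFlowFiles.clear();
            }
        }

        // Never reached when outputBatchSize > 0; this is how the input would normally be accounted for.
        session.transfer(resultSetFlowFiles, REL_SUCCESS);
        session.remove(fileToProcess);
        session.commit();
    }
}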

How can this be worked around? We have the same issue in a custom processor of ours that follows the pattern sketched above.

Re: [Apache NiFi Bug] ExecuteSQL can't perform batch loading with an input file

I'm not aware of a workaround, but I did submit a fix for the Apache Jira you filed (NIFI-6040); it should be in an upcoming HDF release. For your custom processor, you could make the same change I made in my pull request: remove the incoming flow file before the first commit.
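
Roughly, that change could look like the onTrigger below. This is only a sketch of the idea, not the actual change from the pull request: it reuses the sketch class from the question and creates the result FlowFiles without a parent reference, because the input has already been removed; the real fix may handle provenance lineage differently.

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        FlowFile fileToProcess = session.get();
        if (fileToProcess == null) {
            return;
        }

        // Read whatever is needed from the input while it is still available
        // (content, attributes such as a SQL statement, etc.).
        final String sql = fileToProcess.getAttribute("sql.statement");   // hypothetical attribute, for illustration

        // Account for the input up front: it is never transferred onward, so remove it
        // before any intermediate commit can run.
        session.remove(fileToProcess);
        fileToProcess = null;

        final int outputBatchSize = 1;
        final List<FlowFile> resultSetFlowFiles = new ArrayList<>();

        for (int i = 1; i <= 3; i++) {                  // stands in for iterating the result set of `sql`
            FlowFile result = session.create();         // no parent reference: the input is already removed
            final String row = "row " + i;
            result = session.write(result, out -> out.write(row.getBytes(StandardCharsets.UTF_8)));
            resultSetFlowFiles.add(result);

            if (outputBatchSize > 0 && resultSetFlowFiles.size() >= outputBatchSize) {
                session.transfer(resultSetFlowFiles, REL_SUCCESS);
                session.commit();                       // succeeds: every FlowFile in the session is accounted for
                resultSetFlowFiles.clear();
            }
        }

        session.transfer(resultSetFlowFiles, REL_SUCCESS);
        session.commit();
    }

Because the input is removed before the first intermediate commit, every commit only sees FlowFiles that have already been transferred or removed, which satisfies the ProcessSession.commit() contract quoted in the question.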
