
Splitting a Nifi flowfile into multiple flowfiles

Rising Star

Hi All,

I have the following requirement:

Split a single NiFi flowfile into multiple flowfiles, so that the contents of each resulting flowfile can eventually be extracted and inserted as a separate row in a Hive table.

Sample input flowfile:

MESSAGE_HEADER | A | B | C

LINE|1 | ABCD | 1234

LINE|2 | DEFG | 5678

LINE|3 | HIJK | 9012

.

.

.

Desired output files:

Flowfile 1:

MESSAGE_HEADER | A | B | C

LINE|1 | ABCD | 1234

Flowfile 2:

MESSAGE_HEADER | A | B | C

LINE|2 | DEFG | 5678

Flowfile 3:

MESSAGE_HEADER | A | B | C

LINE|3 | HIJK | 9012

.

.

.

The number of lines in the flowfile is not known ahead of time.

I would like to know the best way to accomplish this with the available NiFi processors. The splitting can be done at the flowfile level, or after the contents are extracted from the flowfile but before the Hive insert statements are created.

Thanks.

Accepted Solutions

Re: Splitting a Nifi flowfile into multiple flowfiles

@Raj B The SplitText processor has a "Header Line Count" property. If you set it to 1, each split flowfile should start with the same header line, which is what you are after. That said, if you intend to insert these into Hive, you could also use ConvertCSVToAvro with the delimiter set to '|'; that keeps the data in batches, which should give you better throughput.
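
To make the expected result concrete, here is a minimal Python sketch of the splitting behaviour described above. It is not NiFi code and only imitates the idea of repeating the header in every split. The function name `split_with_header` is made up for illustration, its two parameters are named after SplitText's "Header Line Count" and "Line Split Count" properties, and dropping blank lines is a simplification assumed here, not something the processor is claimed to do.

```python
from typing import List


def split_with_header(text: str, header_line_count: int = 1,
                      line_split_count: int = 1) -> List[str]:
    """Split text into chunks, repeating the leading header lines in each chunk.

    Hypothetical helper: it only mimics the effect of the SplitText settings
    discussed in this thread.
    """
    # Assumption: blank lines are ignored so the sample input parses cleanly.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[:header_line_count]
    body = lines[header_line_count:]

    chunks = []
    # Take line_split_count body lines at a time and put the header in front.
    for start in range(0, len(body), line_split_count):
        chunk = header + body[start:start + line_split_count]
        chunks.append("\n".join(chunk) + "\n")
    return chunks


sample = (
    "MESSAGE_HEADER | A | B | C\n"
    "LINE|1 | ABCD | 1234\n"
    "LINE|2 | DEFG | 5678\n"
    "LINE|3 | HIJK | 9012\n"
)

flowfiles = split_with_header(sample)
for i, content in enumerate(flowfiles, start=1):
    print(f"Flowfile {i}:")
    print(content)
```

With one header line and one body line per split, the three body lines of the sample produce three chunks, each beginning with the MESSAGE_HEADER line. The number of body lines does not need to be known ahead of time.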

Re: Splitting a Nifi flowfile into multiple flowfiles

Rising Star

@jfrazee Thank you. I'm going the SplitText route for now, and it seems to work.

To save the split files for later reference, how do I assign a different name to each child (split) flowfile? I'm thinking of prepending or appending a UUID to the file name. Currently all of the child flowfiles get the same name as the parent flowfile, which causes them to overwrite each other.
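
The naming idea in this question can be sketched in Python as follows. This is an illustration only, not something confirmed in the thread: the helper `unique_child_name` and the file name `messages.txt` are hypothetical. Inside NiFi, one plausible equivalent (an assumption, not tested here) is an UpdateAttribute processor placed before PutFile that sets `filename` to an expression such as `${filename}-${uuid}`, or that uses the `fragment.index` attribute written by SplitText.

```python
import uuid
from pathlib import PurePath


def unique_child_name(parent_name: str, index: int) -> str:
    """Build a child file name from the parent name, a split index and a UUID.

    Hypothetical helper for illustration; the naming pattern is an assumption.
    """
    p = PurePath(parent_name)
    # Keep the original extension and put the index and a random UUID before it.
    return f"{p.stem}_{index}_{uuid.uuid4()}{p.suffix}"


# Three children of the same parent each get a distinct name,
# so writing them to one directory no longer overwrites earlier files.
names = [unique_child_name("messages.txt", i) for i in range(1, 4)]
for name in names:
    print(name)
```

Including the split index as well as the UUID keeps the files sortable in their original order, while the UUID alone is enough to avoid collisions.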

Re: Splitting a Nifi flowfile into multiple flowfiles

Contributor

@jfrazee @Raj B

How did you save it to a file? GetFile -> SplitText -> PutFile?

Re: Splitting a Nifi flowfile into multiple flowfiles

Rising Star

@mel mendoza In my case, after splitting the files I was doing further processing on the split files. If your requirement is to store the split files, you could use PutFile to write them to the local file system or PutHDFS to write them to HDFS.
