Split a File into multiple files using line number

Explorer

I have a file (let's say 1.csv) containing a header line and the 4 records below:

id,name,age,location

1,Shailesh,35,Bangalore

2,Ajay,25,Goa

3,Sanjay,30,Chennai

4,Raman,32,Hyderabad

I want to split the file 1.csv into 2 files (let's say 2.csv and 3.csv) by line number, excluding the header. The result should be as below:

==============

2.csv----->

1,Shailesh,35,Bangalore

2,Ajay,25,Goa

==================

3.csv---->

3,Sanjay,30,Chennai

4,Raman,32,Hyderabad

======================

I haven't found a proper solution. I need some suggestions and help to proceed with this.

8 REPLIES

Master Guru

@Shailesh Bhaskar

Use the QueryRecord processor, configuring the Record Reader as a CSVReader and the Record Writer as a CSVRecordSetWriter controller service.

In the CSVRecordSetWriter controller service, change the Include Header Line property to false.

Add dynamic properties to the QueryRecord processor as follows:

2_csv

select * from Flowfile where id <3

3_csv

select * from Flowfile where id >2

Now use the 2_csv and 3_csv relationships from the QueryRecord processor.

Input:

id,name,age,location
1,Shailesh,35,Bangalore
2,Ajay,25,Goa
3,Sanjay,30,Chennai
4,Raman,32,Hyderabad

Output from QueryRecord Processor:

2_csv relation:

1,Shailesh,35,Bangalore
2,Ajay,25,Goa

3_csv relation:

3,Sanjay,30,Chennai
4,Raman,32,Hyderabad

Please refer to this and this link for the configuration and usage of the QueryRecord processor.
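The routing done by the two queries can be sketched outside NiFi in plain Python. This is only an illustration of the logic (id <3 goes to one output, id >2 to the other, no header in either), not NiFi code; the function name route_by_id and the SAMPLE constant are made up for this example.

```python
import csv
import io

# Sample input copied from the question (header plus 4 records).
SAMPLE = """id,name,age,location
1,Shailesh,35,Bangalore
2,Ajay,25,Goa
3,Sanjay,30,Chennai
4,Raman,32,Hyderabad
"""


def route_by_id(csv_text):
    """Mimic the two queries: rows with id < 3 and rows with id > 2.

    Returns two header-less CSV strings, one per "relationship".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    low, high = io.StringIO(), io.StringIO()
    low_writer = csv.writer(low, lineterminator="\n")
    high_writer = csv.writer(high, lineterminator="\n")
    for row in reader:
        record = [row[name] for name in reader.fieldnames]
        record_id = int(row["id"])
        if record_id < 3:   # select * from Flowfile where id <3
            low_writer.writerow(record)
        if record_id > 2:   # select * from Flowfile where id >2
            high_writer.writerow(record)
    return low.getvalue(), high.getvalue()


two_csv, three_csv = route_by_id(SAMPLE)
print(two_csv, end="")
print(three_csv, end="")
```

Note that this approach depends on the id column being sequential, since the split is done on the value of id and not on the actual line number.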

Explorer

I appreciate your prompt response. Could you please share the template XML so I can proceed? I have tried this many times with no luck.

Master Guru
@Shailesh Bhaskar

Sure, I used a GenerateFlowFile processor to create the data, then used a QueryRecord processor and added two dynamic properties.
Reference template: query-record-191697.xml

Let me know if you face any issues.

Explorer

I used the attached template and added a PutFile processor at the end to write all the files into a folder. But I am facing an issue while generating the files: the flow tries to write them all with the same filename, so PutFile throws an error saying the file already exists, and I end up with only the last file of the split.

Is there a way to write the files to the folder with different names?

Flow is as below:

GetFile-->QueryRecord-->PutFile

Master Guru

@Shailesh Bhaskar

Yes, we can change the filenames by using the UpdateAttribute processor.

Flow:

72988-flow.png

As shown in the above screenshot, use two UpdateAttribute processors before the PutFile processor. In the one fed by the 2_csv relationship, add a new property as

filename

2_csv

72989-updateattribute-filename.png

In the same way, add a new property as

filename

3_csv

in the UpdateAttribute processor fed by the 3_csv relationship.

Setting the filenames as described above keeps them the same on every run.

For the first run there will be no issues because each file has a different filename. For later runs, if you do not care about the files already stored in the directory, set the Conflict Resolution Strategy to replace.
If you want to store all the files without any conflicts, use these filename property values in the UpdateAttribute processors:

2_csv_${UUID()}

and

3_csv_${UUID()}

UUID() returns a unique identifier, so the expressions above generate a unique filename every time and there will be no conflicts.
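As a rough illustration of why this avoids conflicts, here is the same naming idea in plain Python. It is a sketch of the concept only, not how NiFi evaluates the expression; the helper name unique_filename is made up for this example.

```python
import uuid


def unique_filename(prefix):
    """Append a random UUID to the prefix, similar in spirit to 2_csv_${UUID()}."""
    return f"{prefix}_{uuid.uuid4()}"


# Two calls with the same prefix give two different names,
# so a second run does not collide with files from the first run.
first = unique_filename("2_csv")
second = unique_filename("2_csv")
print(first)
print(second)
```

The trade-off is that the directory keeps growing, because every run adds new files instead of replacing old ones.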

Reference flow.xml: queryrecord-filenames-191697.xml

Master Guru

@Shailesh Bhaskar

If the answer addressed your question, please take a moment to log in and click the Accept button below to accept the answer. That would help community users find a solution quickly for this kind of issue, and it closes the thread.

Expert Contributor

The QueryRecord processor seems like overkill for this problem, and it will require more work if you don't have auto-incrementing fields.

You can use just the SplitText processor to do everything.

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache...
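To show what splitting purely by line count means, independent of the id values, here is a minimal sketch in plain Python. It is not SplitText configuration; it only mirrors the idea behind a line-count split (in SplitText that is the Line Split Count property). Dropping the header is an assumption taken from the original question, and the names split_lines and SAMPLE are made up for this example.

```python
# Sample input copied from the question (header plus 4 records).
SAMPLE = """id,name,age,location
1,Shailesh,35,Bangalore
2,Ajay,25,Goa
3,Sanjay,30,Chennai
4,Raman,32,Hyderabad
"""


def split_lines(text, lines_per_file, header_lines=1):
    """Drop the header, then cut the remaining lines into fixed-size chunks."""
    body = text.splitlines()[header_lines:]
    return [
        body[start:start + lines_per_file]
        for start in range(0, len(body), lines_per_file)
    ]


chunks = split_lines(SAMPLE, lines_per_file=2)
for number, chunk in enumerate(chunks, start=2):
    # The question names the outputs 2.csv and 3.csv, so numbering starts at 2.
    print(f"{number}.csv:")
    print("\n".join(chunk))
```

Unlike the query-based approach, this works even when the file has no sequential id column.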

Explorer

Hey Umair, thanks for pointing that out. I was trying the same thing, with the following flow:

GetFile--->SplitText--->PutFile

With this flow, I am getting only the first file of the split. How do I get the remaining files?

If you have a template for this, please share the template XML file.