
NiFi size based File Split

Hi,

I have a use case to split a large file into 0.5 GB files. I was able to split the file into 0.5 GB pieces, but the split is not record-oriented: I see records being cut in the middle.

For example:

Original file:

abc|12324|abc|1234
aaa|12324|abc|1234
ccc|12324|abc|1234
ddd|12324|abc|1234

Split file 1:

abc|12324|abc|1234
aaa|12324|

Split file 2:

abc|1234
ccc|12324|abc|1234
ddd|12324|abc|1234
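To make the problem concrete, here is a minimal Python sketch of what a purely size-based split does to the sample data. It assumes newline-terminated records and uses a tiny, hypothetical chunk size of 29 bytes in place of 0.5 GB; `naive_byte_split` is an illustrative helper, not a NiFi API.

```python
# Minimal sketch: a purely size-based split ignores record boundaries.
# Assumptions: newline-terminated records; chunk_size is in bytes and is
# deliberately tiny here (the real target is ~0.5 GB).

def naive_byte_split(data: bytes, chunk_size: int) -> list:
    """Cut data every chunk_size bytes, regardless of where records end."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


original = (
    b"abc|12324|abc|1234\n"
    b"aaa|12324|abc|1234\n"
    b"ccc|12324|abc|1234\n"
    b"ddd|12324|abc|1234\n"
)

chunks = naive_byte_split(original, 29)
print(chunks[0])  # b'abc|12324|abc|1234\naaa|12324|'
```

Each sample line is 19 bytes including the newline, so a 29-byte cut lands 10 bytes into the second record, reproducing the first split file in the example.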

I am using the SplitText processor to split the file. I have attached a screenshot below.

What am I doing wrong? Can anyone direct me to examples / templates?

Thanks!

Hemanth

[Attached screenshot: 93758-capture.jpg]

Re: NiFi size based File Split

@Hemanth Vakacharla

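I think in this case you need to first split the file into one record (one line) per flowfile, using the SplitRecord or SplitText processor.

Then use the MergeContent processor to combine those flowfiles into files of about 500 MB (its Minimum Group Size and Maximum Group Size properties control how large each merged file gets). Because MergeContent only concatenates whole flowfiles, no record gets split in the middle.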

Flow:

1. SplitRecord/SplitText // split the flowfile into one line per flowfile
2. MergeRecord/MergeContent // merge the lines back into files of about 500 MB
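The two-step flow can be sketched in plain Python as follows. This only illustrates the idea; it is not how NiFi implements SplitText or MergeContent. It assumes newline-terminated records, and `split_records`, `merge_records`, and `max_bytes` (standing in for the ~500 MB target) are hypothetical names.

```python
# Minimal sketch of split-then-merge, NOT NiFi's implementation.
# Assumptions: newline-terminated records; max_bytes stands in for ~500 MB;
# a single record larger than max_bytes ends up alone in its own chunk.

def split_records(data: bytes) -> list:
    """Step 1 (like splitting 1 line per flowfile): one record per item.
    keepends=True keeps each record's line terminator attached."""
    return data.splitlines(keepends=True)


def merge_records(records: list, max_bytes: int) -> list:
    """Step 2 (like merging): concatenate whole records, up to max_bytes each."""
    chunks = []
    current = []
    size = 0
    for rec in records:
        # Close the current chunk if this record would push it over the limit.
        if current and size + len(rec) > max_bytes:
            chunks.append(b"".join(current))
            current, size = [], 0
        current.append(rec)
        size += len(rec)
    if current:
        chunks.append(b"".join(current))
    return chunks


sample = (b"abc|12324|abc|1234\n" b"aaa|12324|abc|1234\n"
          b"ccc|12324|abc|1234\n" b"ddd|12324|abc|1234\n")
# Two chunks of two whole records each (19 + 19 = 38 bytes <= 40).
for chunk in merge_records(split_records(sample), max_bytes=40):
    print(chunk)
```

Every chunk ends on a record boundary because only whole records are concatenated. Here the newline stays attached to each record; if your split step strips it, the merge step has to put it back between records.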

[Attached screenshot: 93768-screen-shot-2018-12-01-at-30055-pm.png]

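To force a merge of flowfiles that are still waiting in a bin (for example, the last bin, which may never reach 500 MB), set the Max Bin Age property, e.g. to 30 mins.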
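The effect of a bin-age limit can be modeled roughly like this. It is a simplified sketch, not NiFi's code: `Bin`, `min_bytes`, and `max_age` are hypothetical names, timestamps are plain seconds supplied by the caller, and a bin counts as ready when it is large enough or its first entry has waited `max_age` seconds (30 mins = 1800 s).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bin:
    """Hypothetical merge bin with a size threshold and a maximum age."""
    min_bytes: int                  # stands in for the ~500 MB target
    max_age: float                  # seconds, e.g. 1800 for 30 mins
    items: list = field(default_factory=list)
    size: int = 0
    opened_at: Optional[float] = None

    def add(self, rec: bytes, now: float) -> None:
        # The bin's age is measured from its first entry.
        if self.opened_at is None:
            self.opened_at = now
        self.items.append(rec)
        self.size += len(rec)

    def ready(self, now: float) -> bool:
        """True if the bin is big enough or has been open for max_age seconds."""
        if not self.items:
            return False
        too_old = now - self.opened_at >= self.max_age
        return self.size >= self.min_bytes or too_old

    def flush(self) -> bytes:
        """Emit the merged content and reset the bin."""
        out = b"".join(self.items)
        self.items, self.size, self.opened_at = [], 0, None
        return out


b = Bin(min_bytes=500, max_age=1800)
b.add(b"abc|12324|abc|1234\n", now=0)
print(b.ready(now=60))     # False: only 19 bytes and 60 s old
print(b.ready(now=1800))   # True: max_age reached, so the small bin can be flushed
print(b.flush())           # b'abc|12324|abc|1234\n'
```

Without the age check, a final bin smaller than the size threshold would never become ready.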

If you are using the record-oriented processors (SplitRecord/MergeRecord), you need to define a Record Reader and Record Writer, with an Avro schema, to read and write the flowfile.
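As a rough illustration of what a Record Reader/Writer pair does with a schema, here is a plain-Python sketch. The Avro-style schema, its record name, field names (`col1`..`col4`), and types are hypothetical choices that fit the sample rows; the code does not use an Avro library and is not NiFi's reader or writer.

```python
# Minimal sketch of schema-driven reading/writing of pipe-delimited rows.
# Assumption: the schema below is hypothetical, written in Avro's JSON shape
# but used here only as a plain dict.

SCHEMA = {
    "type": "record",
    "name": "PipeRecord",  # hypothetical record name
    "fields": [
        {"name": "col1", "type": "string"},
        {"name": "col2", "type": "long"},
        {"name": "col3", "type": "string"},
        {"name": "col4", "type": "long"},
    ],
}

# Map the schema's type names to Python conversion functions.
CASTS = {"string": str, "long": int}


def read_record(line: str, schema: dict = SCHEMA, sep: str = "|") -> dict:
    """Parse one delimited line into a dict typed according to the schema."""
    values = line.rstrip("\n").split(sep)
    fields = schema["fields"]
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return {f["name"]: CASTS[f["type"]](v) for f, v in zip(fields, values)}


def write_record(rec: dict, schema: dict = SCHEMA, sep: str = "|") -> str:
    """Serialize a dict back to one delimited line, in schema field order."""
    return sep.join(str(rec[f["name"]]) for f in schema["fields"]) + "\n"


row = read_record("abc|12324|abc|1234\n")
print(row)  # {'col1': 'abc', 'col2': 12324, 'col3': 'abc', 'col4': 1234}
print(write_record(row) == "abc|12324|abc|1234\n")  # True
```

In this sketch, a truncated record such as `aaa|12324|` yields only three values, so `read_record` raises `ValueError` instead of silently accepting half a row.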

Refer to this link for more details regarding the MergeContent processor.
