Support Questions

Find answers, ask questions, and share your expertise

NiFi SplitText splits a file with 2 records into 10000+ files

Contributor

   Hello!

  The configuration of my SplitText is:

Снимок экрана 2024-10-18 в 15.00.11.png

The task is to split one csv file:

   id;description
   "1234";"The latitude is 12324.24"
   "2345";"12324.24 this value"

into 2 files: 

   id;description
   "1234";"The latitude is 12324.24"

and

   id;description
   "2345";"12324.24 this value"
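For reference, the intended behavior (SplitText with Line Split Count = 1 and Header Line Count = 1: one record per split, with the header repeated in each) can be sketched in plain Python; the function name here is just illustrative, not part of NiFi:

```python
def split_with_header(text: str, line_split_count: int = 1):
    """Mimic SplitText with Header Line Count = 1: copy the header line
    into every split and put `line_split_count` records in each."""
    lines = text.strip().splitlines()
    header, records = lines[0], lines[1:]
    return [
        "\n".join([header] + records[i:i + line_split_count])
        for i in range(0, len(records), line_split_count)
    ]

data = (
    'id;description\n'
    '"1234";"The latitude is 12324.24"\n'
    '"2345";"12324.24 this value"\n'
)

# Two records in, two splits out -- each carrying the header line.
for part in split_with_header(data):
    print(part)
    print("---")
```

With this input, exactly two splits come out, so a single 30 KB flowfile should never produce thousands of them.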

But it produces 10000+ duplicated files!

What am I doing wrong?


5 REPLIES

Expert Contributor

Hi @AndreyDE ,

What's your input into the SplitText processor?

I used your example and I'm getting valid output - 

drewski7_0-1729273341413.png

Make sure the file going into SplitText is not being re-read over and over again. Also, if you are using GenerateFlowFile, make sure the scheduling isn't set to 0 sec, because it will keep outputting a bunch of flowfiles.

Please accept this solution if it's correct, thanks!

Contributor

My SplitText processor is followed by a ValidateRecord processor.

ValidateRecord uses a CSVReader with the following configuration:

M60Larmp.png

The input schema is Avro:

{
  "type": "record",
  "name": "geo_data",
  "fields": [
    {
      "name": "id",
      "type": ["int", "null"]
    },
    {
      "name": "description",
      "type": ["string", "null"]
    }
  ]
}

And the sources of this pipeline are ListS3 and FetchS3 processors.
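One thing worth noting about that schema: `id` is a union of `int` and `null`, so validation hinges on whether the quoted CSV values can be coerced to those types. A rough stdlib-only Python sketch of that per-field check (an illustration of the idea, not NiFi's actual ValidateRecord logic):

```python
import csv
import io
import json

# The Avro schema from the reply, embedded as JSON. Assumption: CSV
# values arrive as strings and are coerced before the type check, as a
# record reader typically does.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "geo_data",
  "fields": [
    {"name": "id", "type": ["int", "null"]},
    {"name": "description", "type": ["string", "null"]}
  ]
}
""")

def validate_row(row: dict) -> bool:
    """Check one CSV row against the union types in SCHEMA."""
    for field in SCHEMA["fields"]:
        value = row.get(field["name"])
        if value in (None, ""):        # matches the "null" branch
            continue
        if "int" in field["type"]:
            try:
                int(value)             # coercible to the "int" branch?
            except ValueError:
                return False
        # a CSV value always satisfies the "string" branch
    return True

data = 'id;description\n"1234";"The latitude is 12324.24"\n"2345";"12324.24 this value"\n'
reader = csv.DictReader(io.StringIO(data), delimiter=";")
results = [validate_row(row) for row in reader]
print(results)  # -> [True, True]
```

Both sample rows pass, since `"1234"` and `"2345"` coerce cleanly to ints once the quotes are stripped by the CSV parser.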

Expert Contributor

@AndreyDE 

Is one flowfile going into the SplitText processor and outputting 10000 flowfiles?

How big is the flowfile going into the SplitText processor?

Or is the source of the pipeline recursively getting all objects in your S3 bucket? 

I need to know a little bit more about the input going into SplitText.

Contributor

@drewski7 wrote:

@AndreyDE 

Is one flowfile going into the SplitText processor and outputting 10000 flowfiles?


Yes - one flow file


How big is the flowfile going into the SplitText processor?


About 30 KB


Or is the source of the pipeline recursively getting all objects in your S3 bucket? 

Yes, it searches all objects recursively


Super Guru

Hi @AndreyDE ,

The reason you are getting that many flowfiles is that the upstream processor that gets the CSV input is running continuously on a 0 sec schedule. You seem to be new to NiFi, and this is a typical beginner mistake - we have all been there :). In earlier versions the scheduling on every processor defaults to 0 secs; in later releases, to help avoid exactly this issue, the default was changed to 1 min.

To fix this: if you are testing, I would stop the processor that generates/gets the CSV input, and whenever you want to run a test you can right-click it and select "Run Once". If you are planning to run the flow as a batch process where you expect to get a different file each time, go to the processor configuration and, under the Scheduling tab, adjust the schedule accordingly by selecting either "Timer driven" or "CRON driven". For more info on scheduling please refer to the following:

https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab

https://www.youtube.com/watch?v=pZq0EbfDBy4
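As a concrete example (assuming the Quartz-style cron syntax NiFi's CRON driven strategy uses), a schedule that runs the source processor once a day at 2 AM would look like:

```
0 0 2 * * ?
```

With a schedule like that, the ListS3/FetchS3 source fires once per day instead of continuously, so the same file is no longer re-fetched and re-split thousands of times.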

 

Hope that helps.