
NiFi Updating header

Super Collaborator

Hi,

I am trying to update the header of my CSV file with a regular expression, to remove special characters from the header line only. How can I do that?

I tried reading the file and, on one route, using RouteText, ReplaceText, and ExtractText to get the first line and store it in a headerline attribute. On another route I moved the file without the header, and then tried to merge the two using headerline from route 1.

But the header only shows up in the output when the flowfile from route 1 reaches the MergeContent processor first, since that flowfile has the headerline attribute. If the flowfile from route 2 arrives first, the output file has no header line, because that flowfile doesn't have the attribute.

Any idea how to solve this?

1 ACCEPTED SOLUTION

Master Guru

@Saikrishna Tarapareddy

If you are willing to add a user-defined header instead of removing the special characters from the existing header line, then
use the ExecuteStreamCommand processor with the configuration below.

72929-executestreamcommand.png

In this processor we pass through every line except the first, i.e. the resulting flowfile has no header.

Then use a ReplaceText processor with Prepend as the Replacement Strategy to add your user-defined header to the file.

72936-replacetext.png

Search Value: (?s)(^.*$)
Replacement Value: <user-defined-header>
Character Set: UTF-8
Maximum Buffer Size: 1 MB (change the value to suit your flowfile size)
Replacement Strategy: Prepend
Evaluation Mode: Entire text

With this method the original header line is dropped from the file, and the ReplaceText processor then adds the new header to the flowfile content.
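The effect of these two steps on the file content can be sketched in plain Python. This is only an illustration of the data transformation, not NiFi code: the function name `replace_header` and the sample data are made up, and the sketch adds the line break after the header itself, which in the processor would presumably have to be part of the Replacement Value.

```python
def replace_header(content: str, new_header: str) -> str:
    """Drop the first line of `content` and put `new_header` in its place."""
    # Step 1 (ExecuteStreamCommand in the flow): keep everything after the first line.
    _, _, body = content.partition("\n")
    # Step 2 (ReplaceText with Prepend): put the user-defined header in front.
    # Assumption: the line break after the header is added here explicitly.
    return new_header + "\n" + body


# Hypothetical sample file with special characters in the header line.
sample = "cust#id,first name$\n1,Ann\n2,Bob\n"
print(replace_header(sample, "cust_id,first_name"))
```

Note that the old header is discarded entirely, so the new header has to be known up front.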

(or)

Instead of the ExecuteStreamCommand processor, you can also achieve this with record-oriented processors (like ConvertRecord).
Configure and enable CSVReader and CSVRecordSetWriter controller services to read the flowfile content, and set Include Header Line to false in the CSVRecordSetWriter controller service.

Then use a ReplaceText processor to prepend the header. With this method, too, the header has to be defined in the ReplaceText processor.

https://community.hortonworks.com/questions/183313/how-to-change-csv-attributeheader-name-in-apache-...

Master Guru

@Saikrishna Tarapareddy

If you want to replace the special characters in the header line, look at the flow below.

Flow:

72931-replace-header.png

We split the file in the SplitText processor with a Line Split Count of 1.

RouteOnAttribute Configs:

72932-routeonattribute.png

non_header: ${fragment.index:gt(1)} (fragment index 1 is the header line)

Use the non_header relationship to feed the MergeContent processor.

Feed the unmatched relationship to the ReplaceText processor. The unmatched relationship now receives only the flowfile with fragment.index = 1, i.e. the flowfile whose content is our header.

ReplaceText processor:

Now apply your logic to replace the special characters in the flowfile content.

Then feed the success relationship to the MergeContent processor.
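The "logic to replace the special characters" is left open in the answer. As a minimal sketch of one possibility, assuming a special character is anything other than ASCII letters, digits, and underscores, and that the delimiter is a comma, the header line could be cleaned like this in Python. The function name `clean_header` and the sample header are hypothetical, and quoted column names containing the delimiter are not handled.

```python
import re


def clean_header(header_line: str, delimiter: str = ",") -> str:
    """Strip special characters from each column name of a CSV header line."""
    # Split on the delimiter first so the delimiter itself is preserved.
    columns = header_line.rstrip("\r\n").split(delimiter)
    # Assumption: keep only ASCII letters, digits, and underscores.
    cleaned = [re.sub(r"[^A-Za-z0-9_]", "", column) for column in columns]
    return delimiter.join(cleaned)


print(clean_header("cust#id,first name$,e-mail"))  # custid,firstname,email
```

A comparable character class could presumably serve as the Search Value of the ReplaceText processor, with the exact set of allowed characters adjusted to your data.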

MergeContent processor Configs:

72933-mergecontent.png

In the MergeContent processor use Defragment as the Merge Strategy, so that the processor waits for all the fragments and then merges them.

Change the Delimiter Strategy to Text and set the Demarcator to a newline (press Shift+Enter in the property editor).

With this method we replace only the header line content, wait for all the fragments, and then merge the flowfile contents.
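To make the split, route, and defragment steps concrete, here is a small Python simulation. It is a sketch under assumptions, not NiFi code: the flowfiles are modelled as dictionaries, the function names are invented, the header cleanup uses an assumed character class, and the merge step assumes that fragments are reassembled in fragment.index order regardless of arrival order.

```python
import re


def split_text(content: str) -> list:
    """Model SplitText with one line per fragment; fragment.index starts at 1."""
    return [
        {"fragment.index": index, "content": line}
        for index, line in enumerate(content.splitlines(), start=1)
    ]


def run_flow(content: str) -> str:
    fragments = split_text(content)
    # RouteOnAttribute: non_header matches ${fragment.index:gt(1)}.
    non_header = [f for f in fragments if f["fragment.index"] > 1]
    unmatched = [f for f in fragments if f["fragment.index"] <= 1]
    # ReplaceText on the unmatched branch: clean only the header fragment.
    # Assumption: keep letters, digits, underscores, and the comma delimiter.
    for fragment in unmatched:
        fragment["content"] = re.sub(r"[^A-Za-z0-9_,]", "", fragment["content"])
    # Simulate the header arriving at the merge step last.
    arrived = non_header + unmatched
    # Defragment: reassemble by fragment.index, newline as demarcator.
    ordered = sorted(arrived, key=lambda f: f["fragment.index"])
    return "\n".join(f["content"] for f in ordered)


print(run_flow("cust#id,first name$\n1,Ann\n2,Bob"))
```

The simulation also shows the cost of this design: every line becomes its own fragment, which matters for very large files.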

Reference flow.xml: replace-header.xml

If the answer helped resolve your issue, click the Accept button below to accept it. That helps community users find solutions to this kind of issue quickly.

Super Collaborator

@Shu,

I thought about this, but the only issue is that my files are huge, and splitting them by line may not be ideal.

Master Guru

@Saikrishna Tarapareddy

Try this approach:

74499-flow.png

In this flow we fork once the file is pulled. The right side carries all the contents without the header.

On the left side we run head -1 on the flowfile content to get only the header, then use ReplaceText to replace the special characters.

In both UpdateAttribute processors we add a group identifier attribute and an order attribute, so that we can use these attributes in the EnforceOrder processor.

With the EnforceOrder processor we wait for the header flowfile (left side) to arrive first, and only then process the headerless flowfile (right side).

Then set the prioritizer on the EnforceOrder processor's success queue to FirstInFirstOutPrioritizer.

Finally, use the MergeContent processor to merge the header with the flowfile content.
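The idea behind this flow, handling the header on its own, passing the body through untouched, and guaranteeing that the header is written first, can be sketched in Python as follows. This only illustrates the data movement for large files and is not a NiFi flow: the function name, the sample data, and the character class used for the cleanup are assumptions.

```python
import io
import re
import shutil


def rewrite_header_streaming(src, dst) -> None:
    """Clean the first line of `src`, then copy the rest to `dst` unchanged."""
    # Left branch (head -1 plus ReplaceText): read and clean only the header.
    header = src.readline().rstrip("\r\n")
    # Assumption: keep letters, digits, underscores, and the comma delimiter.
    cleaned = re.sub(r"[^A-Za-z0-9_,]", "", header)
    # Ordering (EnforceOrder plus FIFO prioritizer): the header goes out first.
    dst.write(cleaned + "\n")
    # Right branch: the body is copied in chunks, never split line by line.
    shutil.copyfileobj(src, dst)


# In-memory stand-ins for a large input file and the merged output.
src = io.StringIO("cust#id,first name$\n1,Ann\n2,Bob\n")
dst = io.StringIO()
rewrite_header_streaming(src, dst)
print(dst.getvalue())
```

Because only the first line is inspected, the cost of the cleanup does not grow with the file size, which is the point of this approach compared with splitting by line.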


@Shu, can you upload the .xml file for the most recent flow here?