Created 01-25-2022 12:30 AM
I have some rows of data with a header in a txt file, like this:
test_a|test_b|test_c|test_d|test_e
a|b|3.0|4.0|5.0
a|b|3.0|4.0|5.0
a|b|3.0|4.0|5.0
Now I want to remove the decimal point and everything after it from the test_c and test_d values, so that the result looks like this:
test_a|test_b|test_c|test_d|test_e
a|b|3|4|5.0
a|b|3|4|5.0
a|b|3|4|5.0
How can I do this? Thanks.
Created 01-25-2022 11:27 AM
@zhangliang To accomplish that I would use UpdateRecord.
Since your data is CSV and structured, we can use record manipulation to accomplish this.
First, I would treat all your values as strings and build an Avro schema to use:
{
  "type": "record",
  "name": "nifiRecord",
  "namespace": "org.apache.nifi",
  "fields": [
    {"name": "test_a", "type": ["null", "string"]},
    {"name": "test_b", "type": ["null", "string"]},
    {"name": "test_c", "type": ["null", "string"]},
    {"name": "test_d", "type": ["null", "string"]},
    {"name": "test_e", "type": ["null", "string"]}
  ]
}
Then I would configure UpdateRecord to use a CSV Reader and a CSV Writer.
I would configure the CSV Reader like this:
Schema Access Strategy = Use 'Schema Text' Property
Schema Text = (paste the Avro schema above)
Value Separator = |
For the CSV Writer, leave everything at the default except:
Value Separator = |
Finally, the UpdateRecord processor needs two user-defined properties, one for each field to update.
In this case we want to update the fields "test_c" and "test_d".
We can then use RecordPath manipulation, in particular the substringBefore function, to keep only what comes before the dot ".".
Here is what you should configure:
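A sketch of the two user-defined properties described above, written in the same Property = Value style as the reader settings. The exact values are an assumption based on the description: each property name is the RecordPath of the field to update, the value calls the RecordPath substringBefore function, and Replacement Value Strategy is assumed to be switched to Record Path Value so that the value is evaluated as a RecordPath rather than written out as a literal.

```text
Replacement Value Strategy = Record Path Value
/test_c = substringBefore(/test_c, '.')
/test_d = substringBefore(/test_d, '.')
```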
This will then take an input like this:
test_a|test_b|test_c|test_d|test_e
a|b|3.0|4.0|5.0
a|b|3.0|4.0|5.0
a|b|3.0|4.0|5.0
and produce an output like this:
test_a|test_b|test_c|test_d|test_e
a|b|3|4|5.0
a|b|3|4|5.0
a|b|3|4|5.0
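
To sanity-check the expected output outside NiFi, the same truncation can be sketched in a few lines of Python. This is only an illustration of the logic, not how UpdateRecord works internally; the helper names `substring_before` and `truncate_decimals` are hypothetical, and the behavior of returning the value unchanged when no dot is present is my understanding of how substringBefore behaves.

```python
def substring_before(value: str, delimiter: str) -> str:
    """Return the part of value before the first delimiter.

    If the delimiter is not found, the value is returned unchanged.
    """
    index = value.find(delimiter)
    return value if index < 0 else value[:index]


def truncate_decimals(text: str, fields=("test_c", "test_d"), sep: str = "|") -> str:
    """Cut the named columns at the first '.' in every data row."""
    lines = text.strip().splitlines()
    header = lines[0].split(sep)
    # Column positions of the fields to update, looked up by header name.
    targets = [header.index(name) for name in fields]

    output = [lines[0]]  # the header row is passed through untouched
    for line in lines[1:]:
        cells = line.split(sep)
        for position in targets:
            cells[position] = substring_before(cells[position], ".")
        output.append(sep.join(cells))
    return "\n".join(output)


sample = """test_a|test_b|test_c|test_d|test_e
a|b|3.0|4.0|5.0
a|b|3.0|4.0|5.0
a|b|3.0|4.0|5.0"""

print(truncate_decimals(sample))
```

Only test_c and test_d are touched, so test_e keeps its 5.0, matching the output shown above.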
Created 01-25-2022 05:48 PM
Thank you for the advice. I used your design, but I get a result like this, with two header rows:
test_a|test_b|test_c|test_d|test_e
test_a|test_b|test_c|test_d|test_e
a|b|3|4|5.0
a|b|3|4|5.0
a|b|3|4|5.0
How can I fix this? Thanks.
Created 01-26-2022 06:25 AM
Hello,
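This is most likely because your CSV Reader has:
Treat First Line as Header = false (the default)
With that setting the reader treats the header line as a data record, and the writer then adds its own header line on top of it. Change the property to true.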