Created on 09-07-2017 07:40 AM - edited 09-16-2022 05:12 AM
Hi folks,
I would like to add a field to csv file based on another field value. For example, I have timestamp field, and I add a date field which I set as trunc(timestamp). Any ideas how could I do this with Nifi? (Afterwards, I am converting this to avro and inserting into Hive)
Thanks.
Created 09-07-2017 01:42 PM
Hi @Sammy Gold, you can do that by using replacetext processor,
1.Follow the steps 1,2 in the below link to extract all your contents of csv file to attributes, https://community.hortonworks.com/questions/131332/nifi-convert-text-file-consisting-of-key-value-pa...
2.To acheive your case, follow step 3 and Use the all existing attribute associated with the flow file and use the same attribute again in ReplaceText Processor for the new field you want to add to the csv file.
EX:- in my case i want to add caluculate field to csv file by using user_agent attribute, for this case i'm using ${user_agent} attribute twice in the replacetext processor replacement value property.
${user_agent}|${user_agent:toUpper()}
Result:
safari|SAFARI
3.Follow the steps 4,5 to convert the new csv file to avro and insert records into Hive.
Created 10-05-2017 02:49 PM
Hi @Shu, thanks for the detailed reply. But I would prefer using Record-based processors (such as SplitRecord and UpdateRecord) since CSV is considered record-based as well. But I am faced with a problem and can't figure it out. I would appreciate if you could suggest a solution or point me in the right direction.
Here is what I have done:
create sample csv file
name,age,height,weight,school james, 19, 173, 56, NY jake, 20, 177, 62, MD sam, 21, 160, 55, KN Mike, 24, 170, 64, MM
created AvroSchemaRegistry
{ "name": "origFormatName", "namespace": "someFields", "type": "record", "fields": [ { "name": "name", "type": "string" }, { "name": "age", "type": "int" }, { "name": "height", "type": "int" }, { "name": "weight", "type": "int" }, { "name": "school", "type": "string" } ] }
Then CSVReader and CSVRecordSetWriter contoller services (referencing the same AvroSchema)
Then using UpdateAttribute I update the file's attribute by adding property schema.name=thename
This successfully adds the attribute. Afterwards, I use SpliteRecord and set the Record Reader and Record Writer property as appropriate (Records Per Split = 1). It does split the file into 4 records as expected.
Then I use UpdateRecord. And again set Record Reader and Record Writer. And for Replacement Value Strategy I set value to Literal Value for simplicity. Then add a property as below
/fields[2]/height = 2222
Then I get the 4 records (as expected) coming from UpdateRecord. But they are all size=0. I can't understand why would it return size 0.
Any help would be appreciated.
Thanks.
Created on 10-08-2017 07:04 PM - edited 08-18-2019 02:42 AM
@Sammy Gold, in updaterecord processor can you change /fields[2]/height to search property to
/height
as shown in the screenshot below, it will replace all the height to 2222 value.
Created 10-09-2017 11:38 AM
I tried that as well, but still the same issue. UpdateRecord takes that split input (4 records), but produces output of size 0 (thought it's still 4 records). Shouldn't UpdateRecord at least produce output without changing anything (for example if it can't find /height in the file?). This is confusing. I am working on this now. I'll update the post as soon as I find solution. Thanks a lot for your time and suggestions.
Created 10-09-2017 12:26 PM
Skipping SplitRecord solved this issue. Now routing directly from UpdateAttribute to UpdateRecord sets the height to desired value. But the main point of my post still not solved -- add field to csv file based on another field value. For example adding a property (inside UpdateRecord) such as heightinmm and setting it to height*10.
From the UpdateRecord manual it says that if the field (property) not present in the record, the UpdateRecord processor creates that field (and the value as intended). But in my case, this didn't work. I think this is due to the schema.name.
Created 10-09-2017 02:11 PM
Yes, it was due to schema. I changed it, and now it works. I have one small question though.
In the record, I have a field height. Now I want to create heightmm: that is height in mm which is equal to height * 10. How can I do this using the UpdateRecord? Because I have two options to choose for property, literal or record path value, I can't do it. It seems like there is no short way of doing this. I am using Nifi version 1.2.0.
Created on 10-10-2017 01:20 AM - edited 08-18-2019 02:42 AM
i tried to add new element to the csv content we can add new field by changing AvroSchemaRegistry to below
{"name": "origFormatName","namespace": "someFields","type": "record","fields": [{ "name": "name", "type": "string" },{ "name": "age", "type": "int" },{ "name": "height", "type": "int" },{ "name": "weight", "type": "int" },{ "name": "school", "type": "string" },{ "name": "heightmm", "type": "int" }]}
but we cannot do math calculations i think..
Following things i tried:-
Replacement value strategy as Record Path value but it wont does any math on the existing /height field
/heightmm as /height*10,/height:multiply(10)
When can this processor works to add values?
It will work if you are going to have heightmm value as /height value then this processor works
Replacement value strategy as Record Path value
/heightmm as /height
Example:-
name,age,height,weight,school,heightmm
james,19,2222,56,,2222 jake,20,2222,62,,2222 sam,21,2222,55,,2222 Mike,24,2222,64,,2222
we can replace only with the existing values, but we cannot do math operations i think.
Another way:-
To add new record for the existing csv data you can extract the height content as attribute by using extract attribute processor and use replace text processor to create new record based on height attribute.
here we are keeping
Replacement Strategy property to Append
it will keeps the content as is and append the new value to the content.
(or)
you can use
Replacement Strategy property to regexreplace
Replacement value as $1${height:multiply(10)}
it will give same result as Append method did.
Created 10-10-2017 05:19 AM
Thanks a lot @Shu. That explained it all.
Created 10-23-2017 10:49 AM
Just wanted to add another way of doing this which occurred to me later.
It is by using the UpdateRecord twice: first for adding another field (column) as the copy of some field in the record; the second one for manipulating that record, such as
/heightmm ${field.value:theFunctionOfUse}
This does the trick, and it works as expected.