Support Questions

Find answers, ask questions, and share your expertise

add field to csv file based on other field value

avatar
Contributor

Hi folks,

I would like to add a field to csv file based on another field value. For example, I have timestamp field, and I add a date field which I set as trunc(timestamp). Any ideas how could I do this with Nifi? (Afterwards, I am converting this to avro and inserting into Hive)

Thanks.

9 REPLIES 9

avatar
Master Guru

Hi @Sammy Gold, you can do that by using replacetext processor,

1.Follow the steps 1,2 in the below link to extract all your contents of csv file to attributes, https://community.hortonworks.com/questions/131332/nifi-convert-text-file-consisting-of-key-value-pa...

2.To acheive your case, follow step 3 and Use the all existing attribute associated with the flow file and use the same attribute again in ReplaceText Processor for the new field you want to add to the csv file.

EX:- in my case i want to add caluculate field to csv file by using user_agent attribute, for this case i'm using ${user_agent} attribute twice in the replacetext processor replacement value property.

${user_agent}|${user_agent:toUpper()}

Result:

safari|SAFARI

3.Follow the steps 4,5 to convert the new csv file to avro and insert records into Hive.

avatar
Contributor

Hi @Shu, thanks for the detailed reply. But I would prefer using Record-based processors (such as SplitRecord and UpdateRecord) since CSV is considered record-based as well. But I am faced with a problem and can't figure it out. I would appreciate if you could suggest a solution or point me in the right direction.

Here is what I have done:

create sample csv file

name,age,height,weight,school 
james, 19, 173, 56, NY
jake, 20, 177, 62, MD 
sam, 21, 160, 55, KN
Mike, 24, 170, 64, MM

created AvroSchemaRegistry

{
  "name": "origFormatName",
  "namespace": "someFields",
  "type": "record",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "age", "type": "int" },
    { "name": "height", "type": "int" },
    { "name": "weight", "type": "int" },
    { "name": "school", "type": "string" }
  ]
}

Then CSVReader and CSVRecordSetWriter contoller services (referencing the same AvroSchema)

Then using UpdateAttribute I update the file's attribute by adding property schema.name=thename
This successfully adds the attribute. Afterwards, I use SpliteRecord and set the Record Reader and Record Writer property as appropriate (Records Per Split = 1). It does split the file into 4 records as expected.
Then I use UpdateRecord. And again set Record Reader and Record Writer. And for Replacement Value Strategy I set value to Literal Value for simplicity. Then add a property as below

/fields[2]/height = 2222

Then I get the 4 records (as expected) coming from UpdateRecord. But they are all size=0. I can't understand why would it return size 0.

Any help would be appreciated.

Thanks.

avatar
Master Guru

@Sammy Gold, in updaterecord processor can you change /fields[2]/height to search property to

/height 

as shown in the screenshot below, it will replace all the height to 2222 value.

39711-screen-shot-2017-10-08-at-30012-pm.png

avatar
Contributor
@Shu

I tried that as well, but still the same issue. UpdateRecord takes that split input (4 records), but produces output of size 0 (thought it's still 4 records). Shouldn't UpdateRecord at least produce output without changing anything (for example if it can't find /height in the file?). This is confusing. I am working on this now. I'll update the post as soon as I find solution. Thanks a lot for your time and suggestions.

avatar
Contributor

Skipping SplitRecord solved this issue. Now routing directly from UpdateAttribute to UpdateRecord sets the height to desired value. But the main point of my post still not solved -- add field to csv file based on another field value. For example adding a property (inside UpdateRecord) such as heightinmm and setting it to height*10.

From the UpdateRecord manual it says that if the field (property) not present in the record, the UpdateRecord processor creates that field (and the value as intended). But in my case, this didn't work. I think this is due to the schema.name.

avatar
Contributor

Yes, it was due to schema. I changed it, and now it works. I have one small question though.

In the record, I have a field height. Now I want to create heightmm: that is height in mm which is equal to height * 10. How can I do this using the UpdateRecord? Because I have two options to choose for property, literal or record path value, I can't do it. It seems like there is no short way of doing this. I am using Nifi version 1.2.0.

avatar
Master Guru

@Sammy Gold,

i tried to add new element to the csv content we can add new field by changing AvroSchemaRegistry to below

{"name": "origFormatName","namespace": "someFields","type": "record","fields": [{ "name": "name", "type": "string" },{ "name": "age", "type": "int" },{ "name": "height", "type": "int" },{ "name": "weight", "type": "int" },{ "name": "school", "type": "string" },{ "name": "heightmm", "type": "int" }]} 

but we cannot do math calculations i think..

Following things i tried:-

Replacement value strategy as Record Path value but it wont does any math on the existing /height field

/heightmm as /height*10,/height:multiply(10)

When can this processor works to add values?

It will work if you are going to have heightmm value as /height value then this processor works

Replacement value strategy as Record Path value

/heightmm as /height

Example:-

name,age,height,weight,school,heightmm 
james,19,2222,56,,2222 jake,20,2222,62,,2222 sam,21,2222,55,,2222 Mike,24,2222,64,,2222

we can replace only with the existing values, but we cannot do math operations i think.

Another way:-

To add new record for the existing csv data you can extract the height content as attribute by using extract attribute processor and use replace text processor to create new record based on height attribute.

40763-screenshot.png

here we are keeping

Replacement Strategy property to Append 

it will keeps the content as is and append the new value to the content.

(or)

you can use

Replacement Strategy property to regexreplace
Replacement value as $1${height:multiply(10)} 

it will give same result as Append method did.

avatar
Contributor

Thanks a lot @Shu. That explained it all.

avatar
Contributor

Just wanted to add another way of doing this which occurred to me later.

It is by using the UpdateRecord twice: first for adding another field (column) as the copy of some field in the record; the second one for manipulating that record, such as

/heightmm		${field.value:theFunctionOfUse}

This does the trick, and it works as expected.