Support Questions

Find answers, ask questions, and share your expertise

Update CSV attribute/Merge CSV files

avatar
Expert Contributor

Hi

I have multiple csv files where each file contains an attribute value at a given time. There are a total of 60 files (60 different attributes). These are basically Spark's Metric Dump. example:

The file name is the name of the application followed by the attribute name.

For the example below the application name is :local-1495979652246 and attribute for the first file is: BlockManager.disk.diskSpaceUsed_MB

local-1495979652246.driver.BlockManager.disk.diskSpaceUsed_MB.csv

local-1495979652246.driver.BlockManager.memory.maxMem_MB.csv

local-1495979652246.driver.BlockManager.memory.memUsed_MB.csv

each file contains values like:

tvalue
14965881670.003329809088456
14965881680.00428465362778284

The file name specifys the name of the attribute.

The first thing i need to do is to update csv header field called value to the attribute name from the filename

t

BlockManager.disk.diskSpaceUsed_MB

14965881670.003329809088456

the next thing would be to merge all files for the same application based on the value of the filed t. and eventually I should have one csv file for each application containing the values for all the attributes like:

tBlockManager.disk.diskSpaceUsed_MBBlockManager.memory.maxMem_MBBlockManager.memory.memUsed_MBmore attributes...
14965881670.003329809088456some valuesome valuesome value
14965881680.00428465362778284some valuecome value..

any suggestions?

1 ACCEPTED SOLUTION

avatar

Hello @Arsalan Siddiqi

I think it's possible with NiFi 1.2.0 or later, by using QueryRecord processor.

The basic idea is using MergeContent to create a single FlowFile, containing all CSV files, when doing so, add a column specifying which attribute (I used 'm' column in my example).

Then use QueryRecord processor to join records and produce a row having different attributes in it:

select fa.t, fa.v a, fb.v b, fc.v c
 from (
   select t, v from FLOWFILE where m = 'a'
 ) fa
 left join (
   select t, v from FLOWFILE where m = 'b'
 ) fb on fa.t = fb.t
 left join (
   select t, v from FLOWFILE where m = 'c'
 ) fc on fa.t = fc.t

I've created a Gist with NiFi flow template, I hope this helps:

https://gist.github.com/ijokarumawak/7e20af1cd222fb2adf13acb2b0f46aed

View solution in original post

3 REPLIES 3

avatar

Hello @Arsalan Siddiqi

I think it's possible with NiFi 1.2.0 or later, by using QueryRecord processor.

The basic idea is using MergeContent to create a single FlowFile, containing all CSV files, when doing so, add a column specifying which attribute (I used 'm' column in my example).

Then use QueryRecord processor to join records and produce a row having different attributes in it:

select fa.t, fa.v a, fb.v b, fc.v c
 from (
   select t, v from FLOWFILE where m = 'a'
 ) fa
 left join (
   select t, v from FLOWFILE where m = 'b'
 ) fb on fa.t = fb.t
 left join (
   select t, v from FLOWFILE where m = 'c'
 ) fc on fa.t = fc.t

I've created a Gist with NiFi flow template, I hope this helps:

https://gist.github.com/ijokarumawak/7e20af1cd222fb2adf13acb2b0f46aed

avatar
Expert Contributor

Thankyou for your reply. I see that you are using version 1.3.0 which I do not have. I tried to import the template but i get an error saying the UpdateRecord possessor is not known. Is the nar file available?

avatar

@Arsalan Siddiqi I assumed UpdateRecord has been there since 1.2.0, but it's not. Sorry about that. Created another template which doesn't use UpdateRecord. Instead, I used another QueryRecord to update CSV. Confirmed it works with NiFi 1.2.0. Hope you will find it useful.

https://gist.githubusercontent.com/ijokarumawak/7e20af1cd222fb2adf13acb2b0f46aed/raw/e150884f52ca186...