Created 06-04-2017 04:50 PM
Hi
I have multiple csv files where each file contains an attribute value at a given time. There are a total of 60 files (60 different attributes). These are basically Spark's Metric Dump. example:
The file name is the name of the application followed by the attribute name.
For the example below the application name is :local-1495979652246 and attribute for the first file is: BlockManager.disk.diskSpaceUsed_MB
local-1495979652246.driver.BlockManager.disk.diskSpaceUsed_MB.csv
local-1495979652246.driver.BlockManager.memory.maxMem_MB.csv
local-1495979652246.driver.BlockManager.memory.memUsed_MB.csv
each file contains values like:
t | value |
1496588167 | 0.003329809088456 |
1496588168 | 0.00428465362778284 |
The file name specifys the name of the attribute.
The first thing i need to do is to update csv header field called value to the attribute name from the filename
t | BlockManager.disk.diskSpaceUsed_MB |
1496588167 | 0.003329809088456 |
the next thing would be to merge all files for the same application based on the value of the filed t. and eventually I should have one csv file for each application containing the values for all the attributes like:
t | BlockManager.disk.diskSpaceUsed_MB | BlockManager.memory.maxMem_MB | BlockManager.memory.memUsed_MB | more attributes... |
1496588167 | 0.003329809088456 | some value | some value | some value |
1496588168 | 0.00428465362778284 | some value | come value | .. |
any suggestions?
Created 06-05-2017 06:11 AM
Hello @Arsalan Siddiqi
I think it's possible with NiFi 1.2.0 or later, by using QueryRecord processor.
The basic idea is using MergeContent to create a single FlowFile, containing all CSV files, when doing so, add a column specifying which attribute (I used 'm' column in my example).
Then use QueryRecord processor to join records and produce a row having different attributes in it:
select fa.t, fa.v a, fb.v b, fc.v c from ( select t, v from FLOWFILE where m = 'a' ) fa left join ( select t, v from FLOWFILE where m = 'b' ) fb on fa.t = fb.t left join ( select t, v from FLOWFILE where m = 'c' ) fc on fa.t = fc.t
I've created a Gist with NiFi flow template, I hope this helps:
https://gist.github.com/ijokarumawak/7e20af1cd222fb2adf13acb2b0f46aed
Created 06-05-2017 06:11 AM
Hello @Arsalan Siddiqi
I think it's possible with NiFi 1.2.0 or later, by using QueryRecord processor.
The basic idea is using MergeContent to create a single FlowFile, containing all CSV files, when doing so, add a column specifying which attribute (I used 'm' column in my example).
Then use QueryRecord processor to join records and produce a row having different attributes in it:
select fa.t, fa.v a, fb.v b, fc.v c from ( select t, v from FLOWFILE where m = 'a' ) fa left join ( select t, v from FLOWFILE where m = 'b' ) fb on fa.t = fb.t left join ( select t, v from FLOWFILE where m = 'c' ) fc on fa.t = fc.t
I've created a Gist with NiFi flow template, I hope this helps:
https://gist.github.com/ijokarumawak/7e20af1cd222fb2adf13acb2b0f46aed
Created 06-05-2017 09:16 AM
Thankyou for your reply. I see that you are using version 1.3.0 which I do not have. I tried to import the template but i get an error saying the UpdateRecord possessor is not known. Is the nar file available?
Created 06-05-2017 11:54 PM
@Arsalan Siddiqi I assumed UpdateRecord has been there since 1.2.0, but it's not. Sorry about that. Created another template which doesn't use UpdateRecord. Instead, I used another QueryRecord to update CSV. Confirmed it works with NiFi 1.2.0. Hope you will find it useful.