Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Update CSV attribute/Merge CSV files

avatar
Expert Contributor

Hi

I have multiple csv files where each file contains an attribute value at a given time. There are a total of 60 files (60 different attributes). These are basically Spark's Metric Dump. example:

The file name is the name of the application followed by the attribute name.

For the example below the application name is :local-1495979652246 and attribute for the first file is: BlockManager.disk.diskSpaceUsed_MB

local-1495979652246.driver.BlockManager.disk.diskSpaceUsed_MB.csv

local-1495979652246.driver.BlockManager.memory.maxMem_MB.csv

local-1495979652246.driver.BlockManager.memory.memUsed_MB.csv

each file contains values like:

tvalue
14965881670.003329809088456
14965881680.00428465362778284

The file name specifys the name of the attribute.

The first thing i need to do is to update csv header field called value to the attribute name from the filename

t

BlockManager.disk.diskSpaceUsed_MB

14965881670.003329809088456

the next thing would be to merge all files for the same application based on the value of the filed t. and eventually I should have one csv file for each application containing the values for all the attributes like:

tBlockManager.disk.diskSpaceUsed_MBBlockManager.memory.maxMem_MBBlockManager.memory.memUsed_MBmore attributes...
14965881670.003329809088456some valuesome valuesome value
14965881680.00428465362778284some valuecome value..

any suggestions?

1 ACCEPTED SOLUTION

avatar

Hello @Arsalan Siddiqi

I think it's possible with NiFi 1.2.0 or later, by using QueryRecord processor.

The basic idea is using MergeContent to create a single FlowFile, containing all CSV files, when doing so, add a column specifying which attribute (I used 'm' column in my example).

Then use QueryRecord processor to join records and produce a row having different attributes in it:

select fa.t, fa.v a, fb.v b, fc.v c
 from (
   select t, v from FLOWFILE where m = 'a'
 ) fa
 left join (
   select t, v from FLOWFILE where m = 'b'
 ) fb on fa.t = fb.t
 left join (
   select t, v from FLOWFILE where m = 'c'
 ) fc on fa.t = fc.t

I've created a Gist with NiFi flow template, I hope this helps:

https://gist.github.com/ijokarumawak/7e20af1cd222fb2adf13acb2b0f46aed

View solution in original post

3 REPLIES 3

avatar

Hello @Arsalan Siddiqi

I think it's possible with NiFi 1.2.0 or later, by using QueryRecord processor.

The basic idea is using MergeContent to create a single FlowFile, containing all CSV files, when doing so, add a column specifying which attribute (I used 'm' column in my example).

Then use QueryRecord processor to join records and produce a row having different attributes in it:

select fa.t, fa.v a, fb.v b, fc.v c
 from (
   select t, v from FLOWFILE where m = 'a'
 ) fa
 left join (
   select t, v from FLOWFILE where m = 'b'
 ) fb on fa.t = fb.t
 left join (
   select t, v from FLOWFILE where m = 'c'
 ) fc on fa.t = fc.t

I've created a Gist with NiFi flow template, I hope this helps:

https://gist.github.com/ijokarumawak/7e20af1cd222fb2adf13acb2b0f46aed

avatar
Expert Contributor

Thankyou for your reply. I see that you are using version 1.3.0 which I do not have. I tried to import the template but i get an error saying the UpdateRecord possessor is not known. Is the nar file available?

avatar

@Arsalan Siddiqi I assumed UpdateRecord has been there since 1.2.0, but it's not. Sorry about that. Created another template which doesn't use UpdateRecord. Instead, I used another QueryRecord to update CSV. Confirmed it works with NiFi 1.2.0. Hope you will find it useful.

https://gist.githubusercontent.com/ijokarumawak/7e20af1cd222fb2adf13acb2b0f46aed/raw/e150884f52ca186...