Support Questions

Find answers, ask questions, and share your expertise

How to extract csv column record and used it for file name and sheet name ?

avatar
Rising Star

Hi All,

I have a scenario where I will get a number of records from csv file, the task is to read the csv and split each record as one file and save the each file and sheet name with the record name of the 1st column(exclude 1st row as header).

 

1. What I have done is  read the csv file using Getfile, 

2. then used splittext processor, to split each record as one csv file by setting property Header line count to 1.

3. then need to extract the 1st record at column1, use that record(2nd row and 2nd column) value as file name and sheet name for the each individual file

 

Original csv file:

murali2425_0-1610040181706.png

after split the there should two files, one with the name ab123.csv and c35ks.csv and also sheet name also should be changed.

ID Description status
ab123 Eldon Base for stackable storage shelf, platinum  

ab123.csv

 

ID Description status
c35ks 1.7 Cubic Foot Compact "Cube" Office Refrigerators  

c35ks.csv

 

How to get above out puts after the work flow.

2 ACCEPTED SOLUTIONS

avatar
Expert Contributor

Hi, 

i tried a solution for you:

1) GenerateFlowFile

Its your GetFile Processor to get the csv file

 

2) ConvertRecord

Convert with CSVReader to JsonRecordSetWriter

 

3) SplitJson

Split each bbject (csv row) with 

 

$.*

 

as path

 

4) EvaluateJsonPath

Add dynamicilly property with name

 

filename

 

and value

 

$.ID

 

to get the ID as filename on flowfile attribute

 

5) UpdateAttribute

Add type of file to filename attribute value

 

${filename:append('.csv')}

 

 

6) ConvertRecord

Now is the question how to work..

You can convert json back to csv or you are working with wait/notify, so that you can overhand your "filename" attribute to your splitted csv flowfile..

How to extract csv column record and used it for file name and sheet name.png

How to extract csv column record and used it for file name and sheet name.png

View solution in original post

avatar
Super Guru

@murali2425   The solution you are looking for is QueryRecord configured with a CSV Record Reader and Record Writer.   You also have UpdateRecord and ConvertRecord which can use the Readers/Writers.  This method is preferred over splitting the file and adds some nice functionality.  This method allows you to provide a schema for both the inbound csv (reader) and the downstream csv (writer).   Using QueryRecord you should be able to split the file, and set attribute of filename set to column1.  At the end of the flow you should be able to leverage that filename attribute to resave the new file.

 

You can find some specific examples and configuration screen shots here:

 

https://community.cloudera.com/t5/Community-Articles/Running-SQL-on-FlowFiles-using-QueryRecord-Proc...

 

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post.

 

Thanks,

Steven

View solution in original post

2 REPLIES 2

avatar
Expert Contributor

Hi, 

i tried a solution for you:

1) GenerateFlowFile

Its your GetFile Processor to get the csv file

 

2) ConvertRecord

Convert with CSVReader to JsonRecordSetWriter

 

3) SplitJson

Split each bbject (csv row) with 

 

$.*

 

as path

 

4) EvaluateJsonPath

Add dynamicilly property with name

 

filename

 

and value

 

$.ID

 

to get the ID as filename on flowfile attribute

 

5) UpdateAttribute

Add type of file to filename attribute value

 

${filename:append('.csv')}

 

 

6) ConvertRecord

Now is the question how to work..

You can convert json back to csv or you are working with wait/notify, so that you can overhand your "filename" attribute to your splitted csv flowfile..

How to extract csv column record and used it for file name and sheet name.png

How to extract csv column record and used it for file name and sheet name.png

avatar
Super Guru

@murali2425   The solution you are looking for is QueryRecord configured with a CSV Record Reader and Record Writer.   You also have UpdateRecord and ConvertRecord which can use the Readers/Writers.  This method is preferred over splitting the file and adds some nice functionality.  This method allows you to provide a schema for both the inbound csv (reader) and the downstream csv (writer).   Using QueryRecord you should be able to split the file, and set attribute of filename set to column1.  At the end of the flow you should be able to leverage that filename attribute to resave the new file.

 

You can find some specific examples and configuration screen shots here:

 

https://community.cloudera.com/t5/Community-Articles/Running-SQL-on-FlowFiles-using-QueryRecord-Proc...

 

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post.

 

Thanks,

Steven