Member since: 02-02-2018 · Posts: 20 · Kudos Received: 0 · Solutions: 0
07-27-2018 03:33 PM
Matt, thanks a lot for all your help. I was able to refactor my dataflow, reducing the number of process groups and keeping everything simple in a single dynamic flow. To elaborate a little, here's what I did. Data comes in CSV format separated by pipes, e.g. (transaction #, sequence #, table code):

123|456|35|
123|456|36|
123|456|100|

First I split the flowfile into multiple ones using SplitText >> then I used the ExtractText processor to grab the 3rd field (the table code) >> LookupAttribute to set the user-defined attribute schema.name (used by the AvroSchemaRegistry controller service) >> pushed the data to Kafka and Hive using the appropriate processors. Thanks a lot!
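The per-record logic described above (extract the third pipe-delimited field, then look up the schema name for it) can be sketched in plain Python. This is only an illustration of the ExtractText + LookupAttribute behavior, not NiFi code; the table codes and schema names in the mapping are invented for the example.

```python
import re

# Hypothetical mapping from table code to schema name, mirroring the
# LookupAttribute configuration (codes and names are examples only).
SCHEMA_BY_TABLE_CODE = {"35": "orders", "36": "order_items", "100": "audit_log"}

def schema_for_line(line):
    """Extract the third pipe-delimited field (the table code) and
    return the schema name the record should route to, or None."""
    match = re.match(r"^([^|]*)\|([^|]*)\|([^|]*)\|", line)
    if not match:
        return None
    return SCHEMA_BY_TABLE_CODE.get(match.group(3))
```

In the real flow, the looked-up value lands in the schema.name attribute, which the AvroSchemaRegistry uses to resolve the right schema downstream.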
07-25-2018 06:47 PM
Hi Matt. First of all, thank you so much for the explanation. My scenario falls into the 3rd one you described: I have multiple table codes coming in a single flowfile. Could you please elaborate on how to use the PartitionRecord processor? I tried using the CSVReader and CSVRecordSetWriter controller services, but they ask for an Avro schema as well. All the tables I'm working with share only the first 3 fields (the last of them being the table code); the rest of the fields vary, so I got a little confused about how to set this Avro schema.
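For the shared leading fields, a minimal Avro record schema could look like the fragment below. The field names are assumptions (the post only says the first three fields are transaction #, sequence #, and table code); PartitionRecord would then use a RecordPath such as /table_code in a dynamic property to group records with the same code into the same outgoing flowfile.

```json
{
  "type": "record",
  "name": "generic_table_row",
  "fields": [
    { "name": "transaction_id", "type": "string" },
    { "name": "sequence_id", "type": "string" },
    { "name": "table_code", "type": "string" }
  ]
}
```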
07-25-2018 05:53 PM
Hi experts, good day! I've been using NiFi for a couple of months, so I'm still learning lots of new things every day. I'm building a dataflow to get CSV data (separated by pipes, '|') and push it to different targets (e.g. Hive, SQL Server, and Kafka). The project started fine, but the dataflow kept getting bigger and bigger, and now I'm finding it difficult to manage. I just wanted to ask for some help understanding whether I'm working with the best possible design. More details below.

I'm getting data from a ListenHTTP processor. Data comes as CSV separated by pipes. One of the fields is a code that identifies which table the data should be pushed to, so I've created one process group for each "table". Here's where I think the dataflow gets complicated: each of those groups (23, to be precise) contains 4 other groups, each responsible for pushing data to a specific target. Since I have a Hive dataflow inside these groups, I had to create an Avro schema defining the structure of each table.

I was wondering if I could replace this dataflow with a single one that evaluates the code in the CSV and "chooses" the correct Avro schema to use. I did some research but couldn't progress further. If there's a way to do it, I could substitute those 23 groups with a single dynamic dataflow.

Hopefully you can help me with this scenario. Thanks in advance! Sincerely, Cesar Rodrigues
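The dynamic alternative being asked about — one flow keyed on the table code instead of 23 static branches — amounts to grouping records by that code. A rough Python sketch of that dispatch (not NiFi code; the field position is taken from the examples in this thread, where the table code is the third field):

```python
from collections import defaultdict

def group_by_table_code(lines, code_field_index=2, sep="|"):
    """Group pipe-delimited records by their table-code field: the
    single-flow equivalent of fanning out to one process group per table."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split(sep)
        if len(fields) > code_field_index:
            groups[fields[code_field_index]].append(line)
    return dict(groups)
```

Each resulting group would then carry its code as an attribute, letting one schema registry pick the matching Avro schema instead of hard-coding 23 variants.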
Labels: Apache NiFi
02-26-2018 04:17 PM
Hi, guys, I have a dataflow in NiFi that gets a file from the server and converts it to Avro to stream data to Hive. In this flow, I have some sensitive information that I need to hash (SHA2_512). I checked that NiFi has a couple of processors that work with hashes, but it seems they only hash the whole file. Is there a way to hash a specific field? Before converting to Avro, my flowfiles come from the server as fields separated by pipes ('|'). Thanks in advance! Cheers
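The per-field hashing being asked for can be sketched in plain Python with the standard hashlib module — the kind of logic one might run in a scripting processor rather than a whole-file hash. The field index is an assumption for illustration:

```python
import hashlib

def hash_field(line, field_index, sep="|"):
    """Replace one pipe-delimited field with its SHA-512 hex digest,
    leaving the other fields of the record untouched."""
    fields = line.split(sep)
    fields[field_index] = hashlib.sha512(
        fields[field_index].encode("utf-8")
    ).hexdigest()
    return sep.join(fields)
```

Applied line by line before the Avro conversion, only the sensitive column is replaced by its 128-character hex digest.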
Labels: Apache NiFi
02-14-2018 06:17 PM
Thanks, @Matt Burgess! This helped a lot 😉
02-07-2018 08:08 PM
@Abdelkrim Hadjidj Thank you! Could you please provide more details on how to use the schema registry? I'm having some trouble with that.
02-07-2018 08:06 PM
@Matt Burgess I've never used an Avro schema before. Could you please explain how to name the fields in it? I checked the documentation, but it's a little confusing. Thanks in advance!
02-07-2018 05:45 PM
Hi, guys, So I have an incoming FlowFile whose content is text delimited by pipes ('|'), and I want to send this information to several destinations. To convert it to JSON, for example, I know I can use the AttributesToJSON processor, but how exactly can I access the FlowFile content and convert it to attributes? e.g.

original FlowFile content:
1234567891285|37797|1|the brown fox

FlowFile attributes (after converting):
id = 1234567891285
sequence = 37797
category = 1
text = the brown fox

... and after that I could use AttributesToJSON to generate my JSON file. Any ideas on how to achieve this? Thanks in advance! Cheers.
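The ExtractText approach typically used here — one regex capture group per attribute — can be sketched in Python. The attribute names come from the example above; the regex itself is an assumption about the record layout:

```python
import json
import re

# One named capture group per attribute, mirroring an ExtractText
# configuration with user-defined properties id/sequence/category/text.
PATTERN = re.compile(
    r"^(?P<id>[^|]*)\|(?P<sequence>[^|]*)\|(?P<category>[^|]*)\|(?P<text>.*)$"
)

def content_to_json(content):
    """Turn one pipe-delimited record into the JSON that
    ExtractText followed by AttributesToJSON would produce."""
    match = PATTERN.match(content)
    if not match:
        raise ValueError("content does not match the expected layout")
    return json.dumps(match.groupdict())
```

In NiFi terms: ExtractText writes each capture group to a flowfile attribute, and AttributesToJSON serializes the selected attributes into the JSON content.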
Labels: Apache NiFi
02-05-2018 02:03 PM
@Shu, thank you very much. It worked perfectly!
02-02-2018 09:28 PM
Hello, guys, I'm trying to use NiFi to split a text file into 2 other files. The catch is that I need to split the lines based on their category type. e.g.

FlowFile content:
Some fixed text |1| more text
Another field |8| more text
Last one |1| more text

With that, I'd like to split this file into, for example:

first FlowFile:
Some fixed text |1| more text
Last one |1| more text

second FlowFile:
Another field |8| more text

Do you guys have any idea how to accomplish that using NiFi? I appreciate any help you can provide. Thanks in advance, Cheers!
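The split being described — one output flowfile per category value — is what RouteText with a grouping regex achieves; a rough Python sketch of that behavior (not NiFi code, and the category regex is an assumption based on the |1| / |8| markers above):

```python
import re
from collections import defaultdict

# Assumed pattern: the category code sits between pipes, e.g. "|1|".
CATEGORY = re.compile(r"\|(\d+)\|")

def split_by_category(text):
    """Split a multi-line flowfile into one output per category code,
    keeping the original line order within each group."""
    outputs = defaultdict(list)
    for line in text.splitlines():
        match = CATEGORY.search(line)
        key = match.group(1) if match else "unmatched"
        outputs[key].append(line)
    return {key: "\n".join(lines) for key, lines in outputs.items()}
```

Running this on the example content yields one text per category: lines with |1| together, the |8| line on its own.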
Labels: Apache NiFi