Support Questions

Find answers, ask questions, and share your expertise

How to use SQL within a QueryRecord Processor to query an Avro Schema

avatar
Explorer

I am trying to reference an Avro Schema that already works from a CEF file and take out a part of that file (time as rt) by referencing the Schema, but I am a bit rusty with my SQL so it's not going to plan rn - does anyone have any examples of doing this in NiFi?

1 ACCEPTED SOLUTION

avatar
Super Guru

@robnew666   yes you can do that, you just need to experiment with SQL statements that satisfy your requirements.  You should work on some tests and inspect what you get in the flowfile after a select *.  This will depend on your reader, etc.   You can also have multiples QueryRecord Processors but dont forget it is possible to have multiple queries in a single processor.  The key you use to hold the query will become a route out of the processor downstream.  For example you could select all results that match a certain condition, and separate those from the rest.

View solution in original post

4 REPLIES 4

avatar
Super Guru

@robnew666   You should provide more information in order for community members to respond with specific details.  Without seeing schema, sample data, what you have, etc, we have little to go off.

 

That said below is an example of query record with a query.  The important theory here is to use + to add new queries and use SELECT * FROM FLOWFILE.  You can add WHERE [your query logic here] to operate agains schema columns just like SQL.  You can have more than 1 query and route them separately down stream.  So for testing start with your first query1 (result) like below.  Next begin to add WHERE and operate against your requirements.  As you further advance your query, make new query2, query3, working methodically towards what works.  When done, delete the test queries you do not need and then route the final query downstream.

 

Screen Shot 2020-10-28 at 8.48.45 AM.png

 

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post.

 

Thanks,

Steven

avatar
Explorer

Thanks,

 

I cant put any data up here, but I think we are getting close.  I can pull all data through which then marries it up with the schema, so it is formatted as data associated with the schema. I am then wondering how I can then add another QueryRecord Processor below which I can then add individual sql lines which can pull out coloumns from this data, ie SELECT *, FROM FLOWFILE WHERE table_name = 'rt'

 

avatar
Super Guru

@robnew666   yes you can do that, you just need to experiment with SQL statements that satisfy your requirements.  You should work on some tests and inspect what you get in the flowfile after a select *.  This will depend on your reader, etc.   You can also have multiples QueryRecord Processors but dont forget it is possible to have multiple queries in a single processor.  The key you use to hold the query will become a route out of the processor downstream.  For example you could select all results that match a certain condition, and separate those from the rest.

avatar
Explorer

Thanks,

I am trying some stuff now to parse data using the JoltSpec/JoltTransformJSON processor that could help me with this issue, but thanks for this help, hopefully can get things running more smoothly soon. 🙂