Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Need to get Table schema from database using nifi processor

avatar

The problem is need to fetch the table schema for a table. Then after fetching the schema, I need to build the select query from the schema.This is because, it should work any table with any number of columns.

The idea is to build a generic processor to extract records from a table, irrespective of table schema. Instead of desiging different workflows for different table.

10 REPLIES 10

avatar

@pranayreddy bommineni

Maybe InferAvroSchema will be useful for you. You can query the table to get only one row then use InferAvroSchema to get the schema.

Is this something useful ?

avatar

Thanks you for the reply..

But Processors related to Databases will give output in AvroFormat write. I have gone through the InferAvroSchema process, for it input should be in json or csv file ..Then I am thinking this is possible or not.

avatar

What are your constraints?

Can you describe in more details what you want to do with data? if it's only getting data a select * will bring data whatever the schema is. I am trying to understand your use case to see what's the best way to do it and what are the limits.

Also, depending on the database, you may query directly to get a schema like "show create table" with MySQL but this need further processing to make the schema useful.

Finally ExtractAvroMetadata may help since SQL processor get data in Avro. But again, this requires querying the table a first time.

avatar

At the outset, thank you so much for quick reply.

We want to build a NiFi job where we will pass table name and it should list all the columns of that table. Further, we will filter few columns and will store required column data to flat file.

This process should work for any number of tables irrespective of schema.

We are looking to build such generic NiFi Processor.

avatar

@pranayreddy bommineni

It's another story if you want to have this in one generic processor that you want to build. To do this you need to implement your logic in your Java code and write your own processor https://community.hortonworks.com/articles/4318/build-custom-nifi-processor.html

avatar

okay thank you..

one thing is there any way to execute ExecuteSQL Process only once..because it returning duplicate rows

avatar
Master Guru

You can start and immediately stop the processor, it will be guaranteed to run at least once (set the Run Schedule to something like 10 seconds so it gives you enough time to start and stop it).

avatar

thanks matt..

Fine that works, But the thing is that what if my table has 100 millions rows , whether the ExecuteSQL process pulls the entire rows in single execution or not .

if it doesn't pull the all rows in one execution, then I can't use the above suggestion write. is there way to tackle this?

avatar
Master Guru

ExecuteSQL will fetch and send all rows in the ResultSet each time it runs. If you don't want that many rows in a single flow file (but still want it to only execute once), use QueryDatabaseTable with no Max Value Columns set. This acts like ExecuteSQL if no Max-Value Columns are supplied (so you will still want to start/stop it), but also has options like Max Rows Per Flow File, etc.