
NIFI: SelectHiveQL vs QueryDatabaseTable


I want to query Hive for only changed records, à la QueryDatabaseTable.

Should I use QueryDatabaseTable, or is there a way to make SelectHiveQL maintain its current state?

1 ACCEPTED SOLUTION


I would go with QueryDatabaseTable. It maintains state for you and has another important feature: it can break the returned records up across multiple flow files. For example, if the query is expected to return 1000 records, you can set Max Rows Per Flow File to x and process the data in smaller chunks. If you use SelectHiveQL instead, build your query with UpdateAttribute and keep the state in a distributed map cache (DMC): maintain the last state in the DMC and use it in UpdateAttribute to build the query that SelectHiveQL runs.

Flow

FetchDistributedMapCache (state field) --> UpdateAttribute (build query) --> SelectHiveQL

You will have to seed the DMC with an initial state value, or handle the missing value in your UpdateAttribute logic.
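
To make the idea concrete, here is a minimal Python sketch of the same pattern outside NiFi. It is only an illustration under stated assumptions: a plain dict stands in for the distributed map cache, an in-memory list stands in for the Hive table, and the table name `events`, the column `id`, the cache key `events.id.max`, and both helper functions are hypothetical. It also assumes a numeric, ever-increasing column; a timestamp column would need quoting in the generated query.

```python
# Hypothetical stand-in for the distributed map cache (DMC): a plain dict.
cache = {}


def build_incremental_query(table, column, last_value):
    """Build a SELECT that returns only rows above the stored state value."""
    if last_value is None:
        # No state yet: this is the "initial state" case, so read everything.
        return "SELECT * FROM {0}".format(table)
    return "SELECT * FROM {0} WHERE {1} > {2}".format(table, column, last_value)


def run_once(rows, table="events", column="id", key="events.id.max"):
    """One pass of the flow: fetch state, build the query, store the new state."""
    last_value = cache.get(key)  # the "DMC fetch" step
    query = build_incremental_query(table, column, last_value)  # "build query" step
    # Simulate what the query would return from the in-memory rows.
    new_rows = [r for r in rows if last_value is None or r[column] > last_value]
    if new_rows:
        # Remember the highest value seen so the next run skips these rows.
        cache[key] = max(r[column] for r in new_rows)
    return query, new_rows
```

Calling `run_once` twice on a growing list of rows returns everything on the first call and only the newly added rows on the second, which is the behavior QueryDatabaseTable gives you without the extra processors.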


3 REPLIES


Hi, 

Could you let me know whether you managed to query a Hive table using the QueryDatabaseTableRecord processor? I am having trouble doing the same.

 


Hi @Kumar78, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. 



Regards,

Vidya Sargur,
Community Manager

