
QueryDatabaseTable processor shutting down


Master Collaborator

I have added more rows to the source table, but even though QueryDatabaseTable shows a green arrow (meaning it is running), it is not picking up the newly added rows. The old rows in the table were moved to Hive successfully.

I am attaching the schedule and flow diagram: qdb-schedule.jpg, flow-diag.jpg

14 REPLIES

Re: QueryDatabaseTable processor shutting down

Master Collaborator

Ah, wait a second .. it did load the Hive table with the new rows, but the NiFi console is showing everything as zero? Why?

64802-capture.jpg

Re: QueryDatabaseTable processor shutting down

Super Guru

@Sami Ahmad

All the changes made to your table were pulled more than 5 minutes ago, so the console shows all zeros: the processor-level usage/stats on NiFi processors show a sliding 5-minute window.
Since you have scheduled the QueryDatabaseTable processor at 10 sec, it runs every 10 seconds and checks whether any new records were added after the stored state.

By now the processor has pulled all the newly added records, and no records have been added after the last state stored in the processor.

Even though the processor runs every 10 seconds, and the NiFi console updates the QueryDatabaseTable processor's Read/Write and Tasks/Time usage/stats each time, the Out stat is only updated when new changes have been made to the table.

Only when you see the Out stat having some size will the ConvertAvroToORC and PutHDFS processors' usage/stats be updated.

Example:-

66384-querydatabasetable.png

As you can see in the above screenshot, I'm running QDB every 10 sec and there are stats in Read/Write and Tasks/Time, but nothing in Out. That means the processor runs every 10 seconds but still hasn't found any updated records, which is why the NiFi console shows everything as zero for the ConvertAvroToORC and PutHDFS processors.
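If you want to read the same 5-minute rolling stats outside the UI, NiFi's REST API exposes them per processor. A minimal sketch, assuming an unsecured NiFi on localhost:8080; <processor-id> is a placeholder for the QDB processor's UUID (copy it from the processor's configuration dialog):

# 5-minute rolling usage/stats (Read/Write, In/Out, Tasks/Time) for a single processor
curl -s http://localhost:8080/nifi-api/flow/processors/<processor-id>/status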

Clear the state in QDB and run again:-

66385-clearstate-qdb.png

I have cleared the state in the QDB processor and run it again; now we can see Out having 468 bytes of file size, i.e. the size of the flow file sent from the QDB processor, and ConvertAvroToORC has updated stats.
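The same clear-state step can also be done from the command line through the REST API if you prefer (as in the UI, the processor must be stopped first). A minimal sketch under the same assumptions as above (unsecured NiFi on localhost:8080, <processor-id> as a placeholder):

# view the stored max-value state of the QDB processor
curl -s http://localhost:8080/nifi-api/processors/<processor-id>/state

# clear the stored state so the next run re-pulls everything from the beginning
curl -s -X POST http://localhost:8080/nifi-api/processors/<processor-id>/state/clear-requests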

So only when the QDB processor sends out a flow file will all the downstream processor stats be updated, and they last for 5 minutes.

When no changes have been made to the table, the QDB processor still checks for new records; if it finds any, it updates the state and sends out a flow file, and if not, it won't send any flow file and won't update the state.

In addition, you are running the QDB processor on all nodes, but this processor is intended to run only on the primary node; if it runs on all nodes it will duplicate the data. Please change the execution to the primary node.

66386-qdb.png

Re: QueryDatabaseTable processor shutting down

Master Collaborator

hi Shu

Is it possible to find how long the last step took in NiFi? I want to see how long it took to load the newly added batch of rows into the table.

Also, I loaded 55 million rows into the table, and I know NiFi is trying to load them since I see a "1" in the processor window, but other than that every other counter is zero and not showing any data transfer. Why?

64806-capture.jpg

Re: QueryDatabaseTable processor shutting down

Master Collaborator

64808-capture.jpg

I tried loading 55 million rows, but the QueryDatabaseTable processor is failing with the following error. What parameters do I need to tweak to handle larger loads?

Re: QueryDatabaseTable processor shutting down

Super Guru
@Sami Ahmad

Please change the below property to true:

Normalize Table/Column Names = true

Look into the below properties (default value shown after each name):

Fetch Size (default 0) - The number of result rows to be fetched from the result set at a time. This is a hint to the driver and may not be honored and/or exact. If the value specified is zero, then the hint is ignored. Supports Expression Language: true.

Max Rows Per Flow File (default 0) - The maximum number of result rows that will be included in a single FlowFile. This will allow you to break up very large result sets into multiple FlowFiles. If the value specified is zero, then all rows are returned in a single FlowFile. Supports Expression Language: true.

Maximum Number of Fragments (default 0) - The maximum number of fragments. If the value specified is zero, then all fragments are returned. This prevents OutOfMemoryError when this processor ingests a huge table. Supports Expression Language: true.

Change the properties according to your requirements. If these properties are left at their default values, then you will get all 55 million records in one flow file; until the QDB processor has pulled all the records, you can see the thread indicator running in the top right corner of the QDB processor.
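For illustration only, a combination along these lines would break the 55 million rows into manageable flow files; the numbers are assumptions to tune for your own memory and HDFS file-size targets, not values recommended in this thread:

Fetch Size                  = 10000     (rows fetched from the database per round trip; only a driver hint)
Max Rows Per Flow File      = 100000    (55 million rows would come out as roughly 550 flow files)
Maximum Number of Fragments = 0         (no limit; see the caveat about this property in a later reply)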

Re: QueryDatabaseTable processor shutting down

Super Guru
@Sami Ahmad

If you want to do this pull in parallel, then GenerateTableFetch + Remote Process Group + ExecuteSQL processors will do that.

For more details about the GenerateTableFetch processor:

https://community.hortonworks.com/questions/61794/how-to-fetch-rows-from-a-table-in-parallel-when-ni...
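Roughly, that parallel flow would be laid out like this; the partition size and input-port name below are illustrative placeholders, not values from this thread:

GenerateTableFetch        (primary node only; Partition Size = 100000, i.e. one generated query per 100k rows)
  -> Remote Process Group (pointing back at this same cluster's input port, e.g. "table-fetch-queries")
    -> ExecuteSQL         (runs on all nodes, executing the generated paged SELECT statements)
      -> ConvertAvroToORC -> PutHDFS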

Re: QueryDatabaseTable processor shutting down

Master Collaborator

64812-capture.jpg

qdb-properties.jpg

I have changed the properties as shown, restarted NiFi and restarted the flow, but I see only 4 new ORC files produced. In the flow window I keep seeing the thread count go from 0 to 4 and tasks go from 0 to 44.

The Hive table is only showing 5.5 million rows.

Is there a log file where NiFi logs how many rows it has processed and what it is currently doing?

This information is not in /var/log/nifi/nifi-user.log or /var/log/nifi/nifi-app.log.
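nifi-app.log won't give you per-row counts for this flow, but you can verify what actually landed by inspecting the ORC files themselves. A quick check, assuming the Hive client is installed on a cluster node; the file name and table name are placeholders:

# dump the ORC file metadata, which includes the row count, for one landed file
hive --orcfiledump /user/sami/<orc-file-name>.orc | grep -i "rows"

# or count what is visible through the Hive table itself
hive -e "SELECT COUNT(*) FROM <your_database>.<your_table>;"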

Re: QueryDatabaseTable processor shutting down

I recommend against using Maximum Number of Fragments: since the result set is not ordered, you can miss rows on the next execution if the largest current value for the max-value column has already been encountered in the first set of "fragments". Instead, in NiFi 1.6.0 / HDF 3.2 (via NIFI-4836), you will be able to send out flow files while the result set is still being processed (via the Output Batch Size parameter). Note that when you use that parameter, the flow files won't have the total result set count information (because some flow files have already been transferred before the total count was ascertained), and it's not transactional (meaning an error halfway through the result set processing may end up with duplicate data being transmitted downstream on the next run).

Re: QueryDatabaseTable processor shutting down

Master Collaborator

Like right now the threads are not even showing and tasks is showing 44; after a while it pops up with 4 threads.

But no more ORC files are being produced.

These are the files that were produced, and they amount to 5.5 million rows; my source table has 55 million rows.

[hdfs@hadoop1 ~]$ hdfs dfs -ls /user/sami
Found 8 items
-rw-r--r--   3 nifi hdfs      53015 2018-03-23 10:24 /user/sami/73040114026486.orc
-rw-r--r--   3 nifi hdfs    4836592 2018-03-23 11:37 /user/sami/77434128924536.orc
-rw-r--r--   3 nifi hdfs    5059986 2018-03-23 12:02 /user/sami/78946299398134.orc
-rw-r--r--   3 nifi hdfs  188138117 2018-03-23 12:13 /user/sami/79332578073548.orc
-rw-r--r--   3 nifi hdfs   12860834 2018-03-23 16:40 /user/sami/89388425185109.orc
-rw-r--r--   3 nifi hdfs   12860834 2018-03-23 16:40 /user/sami/89388494417871.orc
-rw-r--r--   3 nifi hdfs   12860834 2018-03-23 16:40 /user/sami/89388714549104.orc
-rw-r--r--   3 nifi hdfs   12860834 2018-03-23 16:40 /user/sami/89388732452940.orc
[hdfs@hadoop1 ~]$