Member since: 04-13-2018
Posts: 44
Kudos Received: 0
Solutions: 0
02-13-2022
11:14 PM
1 Kudo
@yamaga, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.
01-10-2019
03:20 PM
@Gillu Varghese
I would inspect your content repository to see if the referenced claim (StandardResourceClaim[id=XXX, container=default, section=490], offset=0, length=190) still exists. Within the content_repository directory, look for the sub-folder "490"; within that folder, look for a file named XXX (assuming you replaced the actual claim number here with XXX). It sounds like this file may have been deleted.

A few things to check: Do you have some external process that may be accessing your content repository? Was the content repository maybe moved? Do you have a multi-node NiFi cluster where every node is trying to share the same mounted content repo? Was NiFi restarted as a different user? That could leave some files in the repo owned by different users, which may lead to permission issues when accessing those files.

FlowFiles are what move from processor to processor. The FlowFile metadata (stored in the flowfile repository) includes the size and location of the physical content within one of the content repositories ("default" in this case). Here, the FlowFile reached a processor that actually needed to retrieve that content, but the content could not be found.

Thank you, Matt
01-16-2019
01:55 PM
@Gillu Varghese
Have you considered upgrading to NiFi 1.8 to take advantage of the load-balanced connection capability? I am assuming your script is executing on each node in your cluster, so the script is essentially looking for 50 FlowFiles on each node, which would explain why it just sits there. I am not a Groovy script writer, so I am of little help there.

The only other option that comes to mind is incrementing a value in a DistributedMapCache server per node. Then have a side flow that constantly checks the sum of those cache values until it equals 50. That flow then sends the notification that all 50 files were written and resets the per-node cache values back to zero.

Flow 1: --> PutSFTP --> FetchDistributedMapCache (get the current stored value for this node) --> ReplaceText (replace content with the retrieved value + 1) --> PutDistributedMapCache (write the new value to the cache)

Flow 2: GenerateFlowFile (primary node only) --> FetchDistributedMapCache (x3, to retrieve the stored cache value for each node) --> RouteOnAttribute (add a relationship for when the sum of all cache values equals 50; terminate unmatched) --> PutEmail (notification)

Thanks, Matt
12-03-2018
04:13 PM
@Gillu Varghese
A few questions:
1. Are you sure all 136 files are reaching the MergeContent processor's inbound connection within 5 minutes? The bin age starts when the very first FlowFile is added to a bin; 5 minutes from that point, the bin will be merged even if not all 136 have arrived.
2. Is your NiFi a cluster or a standalone instance? If a cluster, are all 136 FlowFiles on the same node? Each node in a cluster can only merge FlowFiles residing on that node. The new load-balanced connection feature in NiFi 1.8 can help here if this is the case: https://blogs.apache.org/nifi/entry/load-balancing-across-the-cluster

Try setting your max bin age to a much higher value and see what results you get.

Thank you, Matt
10-26-2018
11:57 AM
@Matt Burgess There is only a single task, and it is not importing CPython libraries. This job was working fine for 6 months, and all of a sudden it started failing with the error above in the ExecuteScript processor.
08-14-2018
04:28 AM
@Gillu Varghese
Both cron triggers in the screenshot are the same; you can use either of them for scheduling. We cannot trigger right up to 3 AM; the latest we can trigger with one cron expression is 2:59:59 AM.
08-08-2018
12:20 PM
@Gillu Varghese
If you are using Method 1, the ListFile processor outputs FlowFiles with the filename attribute already associated with them. In Method 2, we list the files in the directory ourselves and then set the filename attribute on the FlowFile; the FetchFile processor then fetches the files from the directory. Both methods work even when the filenames keep changing, because the filename attribute is added to the FlowFile dynamically.
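P.S. A small illustration of why this works, assuming FetchFile's "File to Fetch" property is set to the commonly used expression-language value:

    # Resolved against each FlowFile's attributes, so changing filenames
    # require no changes to the flow itself:
    file_to_fetch = "${absolute.path}/${filename}"

Since ListFile (Method 1) writes absolute.path and filename for every file it lists, each FlowFile carries everything FetchFile needs.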
04-27-2018
06:11 AM
@Matt Clarke Thanks, Matt, for the information and for helping out. It worked!
04-27-2018
12:38 PM
@Gillu Varghese
Keep in mind how JVM heap space works. At a very high level, objects in heap are not cleared out the moment they are no longer used. So a FlowFile's attributes exist in heap while it is queued, and when that FlowFile no longer exists in the flow (it reached the end of the flow, for example), that heap space is likely to still be occupied. It is the job of Java Garbage Collection (GC) to free unused heap space, so once heap utilization is high enough that the JVM needs free space, GC will run to create it.

So even after running a heavy flow, with no FlowFiles left anywhere in your dataflows, you may still observe high reported heap usage. That is normal and expected.

Thanks, Matt

If you found this answer addressed your original question, please take a moment to log in and click "accept".