Member since
10-08-2016
59
Posts
16
Kudos Received
0
Solutions
11-09-2022
05:05 AM
Can someone explain what dot-rename means? I'm using a PutSFTP processor in one of my flows and I'm getting an error about dot-rename.
04-13-2020
09:18 AM
@Aminsh I am not sure where your response fits into this thread. Are you asking a new question here? If so, I recommend you start a new thread. Thanks, Matt
06-13-2018
03:18 PM
https://community.hortonworks.com/content/kbentry/109629/how-to-achieve-better-load-balancing-using-nifis-s.html
10-29-2018
06:36 PM
@Bobby Harsono
Some processors are designed to use memory outside of the JVM. Processors such as ExecuteProcess and ExecuteStreamCommand are good examples: they call a process or script external to NiFi, and those externally executed commands have a memory footprint of their own.
Listen-type processors such as ListenTCP and ListenUDP are another example. They have memory footprints both inside and outside the NiFi JVM heap space, because they can be configured with a socket buffer, which is created outside of heap space.
Thanks, Matt
05-07-2018
06:14 PM
@John T I recently built out an HDF environment for a Fortune 1 retail company to handle 1-2k connections per node and move an average of 1-1.5 TB a day. We used the HandleHTTP processors because MiNiFi was not an option at project conception. If you are using the HandleHTTPRequest/HandleHTTPResponse processors, note that there is a bug that prevents objects from being released correctly, which causes heap utilization to climb linearly. Our workaround was to use the API to stop and start the HandleHTTPRequest processor when the heap reached 70%. This bug was fixed in the 1.6 release of NiFi but, as of when I last checked, the fix had not been rolled into an HDF release, so handling that kind of volume will cause the same problem in your situation. If you can use ListenHTTP (or MiNiFi, as Matt suggested), you should be fine. We used external load balancers because we were running three clusters in separate data centers. The plan for the next phase is to start using MiNiFi in the edge environments and point the different systems feeding data into HDF at those MiNiFi HTTP listeners. If you are running a single cluster, as Matt mentioned, that would load balance for you.
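For anyone wanting to try the same workaround, here is a minimal sketch of the decision logic only, not the script we actually ran. It assumes heap utilization arrives as a percentage string such as "72.0%" (which is how I understand NiFi's system diagnostics report it), and that the processor is stopped and started by sending a JSON body containing the current revision version and the desired state to the processor's run-status REST endpoint. The function names are made up for illustration, and the payload shape should be checked against the REST API documentation for your NiFi version before relying on it.

```python
import json

def parse_heap_utilization(text):
    """Convert a utilization string such as '72.0%' to a float (72.0)."""
    return float(text.strip().rstrip("%"))

def restart_needed(heap_text, threshold_percent=70.0):
    """Return True when heap utilization has reached the threshold."""
    return parse_heap_utilization(heap_text) >= threshold_percent

def run_status_payload(revision_version, state):
    """Build the JSON body for a state change request.

    ASSUMPTION: the API accepts {"revision": {"version": N}, "state": "..."}
    with state "STOPPED" or "RUNNING". Verify this for your NiFi release.
    """
    if state not in ("STOPPED", "RUNNING"):
        raise ValueError("state must be STOPPED or RUNNING")
    return json.dumps({"revision": {"version": revision_version}, "state": state})
```

A scheduler would poll the heap figure, and when restart_needed returns True it would send the STOPPED payload, read the new revision version from the response, and then send the RUNNING payload with that version.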
03-09-2018
06:12 PM
@Matt Burgess, @John T: I got this working in Python. It is my first such program, so it might be rough around the edges. The NiFi processor expects input from ListFile, not GetFile, because it uses zipfile, which needs a file on disk to read. The code:

import zipfile
from org.apache.nifi.processor.io import InputStreamCallback

class ReadVersion(InputStreamCallback):
    def __init__(self):
        self.ff = None
        self.version = ''
        self.error = ''

    def process(self, inputStream):
        try:
            # ListFile supplies the location of the file on disk as attributes
            zipname = self.ff.getAttribute('filename')
            zippath = self.ff.getAttribute('absolute.path')
            zfile = zipfile.ZipFile(zippath + zipname)
            for name in zfile.namelist():
                if name == 'docProps/app.xml':
                    inFile = zfile.open(name)
                    inContents = inFile.read()
                    # Versions of interest all start with 1 (12, 14, 15, 16)
                    loc = inContents.find('<AppVersion>1')
                    if loc != -1:
                        # The second digit of the major version identifies the release
                        keyChar = inContents[loc+13:loc+14]
                        if keyChar == '2':
                            self.version = '2007'
                        elif keyChar == '4':
                            self.version = '2010'
                        elif keyChar == '5':
                            self.version = '2013'
                        elif keyChar == '6':
                            self.version = '2016'
                        else:
                            log.warn('Unexpected AppVersion value: ' + inContents[loc+12:loc+14])
        except:
            # Bare except so that Java exceptions are caught as well
            log.warn('exception thrown (is this really a zip file?)')
            self.error = 'error'

ff = session.get()
if ff is not None:
    callback = ReadVersion()
    callback.ff = ff
    session.read(ff, callback)
    if callback.version != '':
        ff = session.putAttribute(ff, 'MSVersion', callback.version)
        session.transfer(ff, REL_SUCCESS)
    else:
        # An error or an undetermined version goes to failure, so the
        # FlowFile is always transferred exactly once
        session.transfer(ff, REL_FAILURE)
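To check the version lookup outside NiFi, here is a rough standalone sketch in plain Python 3. It is not part of the processor script: the function names and the in-memory sample file are made up for testing, and it uses the same AppVersion mapping as the script above (12, 14, 15, 16).

```python
import io
import zipfile

# Major AppVersion number -> Office release, same mapping as the script above.
OFFICE_BY_MAJOR = {"12": "2007", "14": "2010", "15": "2013", "16": "2016"}

def office_version(zip_bytes):
    """Return the Office release for the given file bytes, or '' if unknown."""
    try:
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
            if "docProps/app.xml" not in zf.namelist():
                return ""
            xml = zf.read("docProps/app.xml").decode("utf-8", errors="replace")
    except zipfile.BadZipFile:
        # Not a zip archive at all
        return ""
    marker = "<AppVersion>"
    loc = xml.find(marker)
    if loc == -1:
        return ""
    start = loc + len(marker)
    # The first two characters after the tag are the major version
    return OFFICE_BY_MAJOR.get(xml[start:start + 2], "")

def make_sample(app_version):
    """Hypothetical test helper: build a tiny zip holding only docProps/app.xml."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "docProps/app.xml",
            "<Properties><AppVersion>%s</AppVersion></Properties>" % app_version,
        )
    return buf.getvalue()
```

Calling office_version(make_sample("16.0000")) exercises the same path as a real .docx or .xlsx file, without needing a NiFi flow.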
04-21-2017
05:19 PM
@John T It sounded a lot like a back pressure scenario to me when you first described what was going on. Glad you were able to resolve your issue. I also saw your other post and commented on it.
04-20-2017
03:43 PM
1 Kudo
@John T If you are using the ListSFTP processor before your FetchSFTP processor, it will produce a zero-byte FlowFile for every file it finds on the target SFTP server. The ListSFTP processor has a "File Filter Regex" property where you can specify a Java regular expression to limit what is returned to just files containing "file123.txt", for example ".*file123\.txt". The ListSFTP processor also maintains state so that the same files are not listed each time, so only new files containing file123.txt are listed each time it runs. The FetchSFTP processor is designed to retrieve the content of a specific file and insert it as the content of the FlowFile that the FetchSFTP processor is running against. Thanks, Matt
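As a quick illustration of the filter pattern, here is a small sketch that uses Python's re module as a stand-in for Java regular expressions (the two behave the same for a pattern this simple). It assumes the filter has to match the whole filename; the sample filenames and function names are made up. Note that a regular expression cannot begin with a bare "*", so the pattern needs a leading ".*", and the dot before "txt" should be escaped.

```python
import re

# Any prefix, then the literal text "file123.txt"
FILE_FILTER = r".*file123\.txt"

def list_matching(filenames, pattern=FILE_FILTER):
    """Return the filenames that the pattern matches in full."""
    regex = re.compile(pattern)
    return [name for name in filenames if regex.fullmatch(name)]

def is_valid_regex(pattern):
    """Return True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False
```

For example, list_matching(["file123.txt", "2017-04-20_file123.txt", "other.csv"]) keeps the first two names and drops the third, while is_valid_regex("*file123.txt") is False because of the leading "*".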