I am writing some code to inspect a zip file that contains multiple subfolders. I want to look through the files in the zip and then pass along the original content.
I am trying to do this without writing the file to disk.
Any ideas on how I could do this all in RAM, or in the NiFi content repository?
I also followed a guide to write the code in Groovy. How could I pass along the original content in Groovy?
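Outside of NiFi, the core of doing this entirely in RAM can be sketched in plain Python. `zipfile.ZipFile` accepts any seekable file-like object, so wrapping the archive's bytes in `io.BytesIO` lets you walk every entry, including those in subfolders, without a temporary file. This is a minimal sketch that assumes CPython 3.6+ (not the Jython 2.7 engine that ExecuteScript uses) and an archive small enough to hold in memory; `list_entries_by_folder` is a hypothetical helper name.

```python
import io
import posixpath
import zipfile
from collections import defaultdict


def list_entries_by_folder(zip_bytes):
    """Group the file entries of an in-memory zip archive by folder.

    Nothing is written to disk: the archive is read from a BytesIO buffer.
    """
    folders = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries such as "docProps/"
            # Zip member names always use "/" as the separator.
            folder, filename = posixpath.split(info.filename)
            folders[folder].append(filename)
    return dict(folders)


if __name__ == "__main__":
    # Build a small archive entirely in memory to try it out.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("docProps/app.xml", "<Properties/>")
        zf.writestr("word/document.xml", "<document/>")
    print(list_entries_by_folder(buf.getvalue()))
```

In an ExecuteScript processor the bytes would come from the flowfile's content stream rather than from a path on disk; converting that Java `InputStream` into a Python byte string is a separate step not shown here.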
@Matt Burgess I want to look at an XML doc inside a directory within the zip and grab the value of one of its tags. The catch is that I want the original flowfile to pass along with the tag value.
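Passing the original content along with a tag value fits NiFi's model: reading a flowfile inside `session.read` leaves its content unchanged, and the extracted value can travel as an attribute. The sketch below imitates that in plain Python, again assuming CPython 3. `inspect_archive` and its `(content, attributes)` return pair are hypothetical stand-ins for a flowfile, and the tag is matched by its local name anywhere in the document, ignoring XML namespaces.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def find_tag_text(xml_bytes, local_name):
    """Return the text of the first element whose local name matches, or None.

    ElementTree prefixes namespaced tags with "{uri}"; that prefix is ignored
    so the caller does not need to know the namespace.
    """
    root = ET.fromstring(xml_bytes)
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == local_name:
            return elem.text
    return None


def inspect_archive(content, member, local_name, attr_name):
    """Read one XML member from an in-memory zip.

    Returns the ORIGINAL content object untouched together with a dict of
    extracted attributes (empty if the member or tag is missing).
    """
    attributes = {}
    with zipfile.ZipFile(io.BytesIO(content)) as zf:
        if member in zf.namelist():
            value = find_tag_text(zf.read(member), local_name)
            if value is not None:
                attributes[attr_name] = value.strip()
    # The bytes are passed through as-is; only the attributes are new.
    return content, attributes
```

For example, `inspect_archive(data, "docProps/app.xml", "AppVersion", "MSVersion")` would hand back `data` itself plus something like `{"MSVersion": "16.0000"}`.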
Do you know the path of the XML doc? I'm still looking at in-memory vs. temporary disk storage. If you can fill in this blank, I hope to have an answer for you tomorrow (probably in Groovy, lol) 🙂
Yeah! The path is ./docProps/app.xml. It's taking apart a docx/xlsx/pptx document and looking at the tag that tells which version of Office it was created with. Any of these Office documents is really just a zip file, hence what I'm trying to do 🙂 Thanks for your help!
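Zip archives normally store member names relative to the archive root, with forward slashes and no leading `./`, so `./docProps/app.xml` has to be looked up as `docProps/app.xml` (the name the script below compares against). Only the XML-based formats (docx/xlsx/pptx) are zip files; legacy binary .doc/.xls/.ppt files are not, so checking first avoids an exception. This is a minimal CPython 3 sketch; `member_name` and `has_member` are hypothetical helpers.

```python
import io
import posixpath
import zipfile


def member_name(path):
    """Turn a path like "./docProps/app.xml" into the form stored in the zip
    ("docProps/app.xml"): normalized, with no leading "./" or "/"."""
    return posixpath.normpath(path).lstrip("/")


def has_member(content, path):
    """True if the in-memory bytes are a zip archive containing the member."""
    buf = io.BytesIO(content)
    if not zipfile.is_zipfile(buf):
        return False  # e.g. a legacy binary Office file, not a zip
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return member_name(path) in zf.namelist()
```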
import zipfile
from org.apache.nifi.processor.io import InputStreamCallback

# Reads docProps/app.xml from an Office (docx/xlsx/pptx) zip and records
# which Office version created the file.
class ReadVersion(InputStreamCallback):
    def __init__(self):
        self.ff = None
        self.version = ''
        self.error = ''

    def process(self, inputStream):
        try:
            # Opens the original file on disk via the 'absolute.path' and
            # 'filename' attributes; the flowfile's inputStream is not used.
            zipname = self.ff.getAttribute('filename')
            zippath = self.ff.getAttribute('absolute.path')
            zfile = zipfile.ZipFile(zippath + zipname)
            for name in zfile.namelist():
                if name == 'docProps/app.xml':
                    inFile = zfile.open(name)
                    inContents = inFile.read()
                    # '<AppVersion>' is 12 characters, so loc+13 is the second
                    # digit of the major version (12, 14, 15 or 16).
                    loc = inContents.find('<AppVersion>1')
                    if loc != -1:
                        keyChar = inContents[loc+13:loc+14]
                        if keyChar == '2':
                            self.version = '2007'
                        elif keyChar == '4':
                            self.version = '2010'
                        elif keyChar == '5':
                            self.version = '2013'
                        elif keyChar == '6':
                            self.version = '2016'
                        else:
                            log.warn('Unexpected AppVersion value: ' + inContents[loc+12:loc+14])
            zfile.close()
        except:
            log.warn('exception thrown (is this really a zip file?)')
            self.error = 'error'

ff = session.get()
if ff is not None:
    callback = ReadVersion()
    callback.ff = ff
    session.read(ff, callback)
    if callback.error == 'error':
        session.transfer(ff, REL_FAILURE)
    else:
        # Reading does not modify the content, so the original flowfile
        # continues with only the new attribute added. Every flowfile is
        # transferred, even when no version was found.
        if callback.version != '':
            ff = session.putAttribute(ff, 'MSVersion', callback.version)
        session.transfer(ff, REL_SUCCESS)
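The script derives the version from a single character after `<AppVersion>1`, which works for the four releases it lists. A slightly sturdier alternative, sketched here in CPython 3, parses the whole major version number from the AppVersion text (for example `16.0000`) and looks it up in the same table the script uses. `OFFICE_VERSIONS` and `office_version` are hypothetical names, and the sketch assumes the AppVersion text has already been extracted.

```python
# Major AppVersion number -> Office release, same mapping as the script.
OFFICE_VERSIONS = {
    12: "2007",
    14: "2010",
    15: "2013",
    16: "2016",  # later releases also report major version 16
}


def office_version(app_version):
    """Map an AppVersion string such as "16.0000" to an Office release name.

    Returns None if the value is missing, malformed, or not in the table.
    """
    if not app_version:
        return None
    major = app_version.strip().split(".", 1)[0]
    if not major.isdigit():
        return None
    return OFFICE_VERSIONS.get(int(major))
```

Because later releases such as Office 2019 also report major version 16, a result of "2016" is best read as "2016 or later".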