Created 02-26-2018 10:19 PM
I am writing some code to inspect a zip file that contains multiple subfolders. I want to look through the files in the zip and then pass the original content along unchanged.
I am trying to do this without writing the file to disk.
Any ideas for how I could do this entirely in RAM, or in the NiFi content repository?
I also followed a guide to write the code in Groovy, and I was wondering how I could pass the original content along in Groovy.
John
Created 02-26-2018 10:32 PM
What kinds of operations are you trying to perform on the files in the ZIP?
Created 02-26-2018 10:46 PM
@Matt Burgess I want to look at an XML doc in a directory inside the zip and grab the value of a tag. The thing is, I want the original flowfile to pass along with the tag value.
Thanks Matt!
Created 02-27-2018 12:34 AM
You know the path of the XML doc? I'm still looking at memory vs temp disk storage, if you can fill in this blank I hope to have an answer for you (in Groovy probably lol) tomorrow 🙂
Created 03-05-2018 08:35 PM
Hey @Matt Burgess just wondering if you had a chance to point me in the right direction. 🙂 I appreciate the help!
Created 02-27-2018 08:33 PM
Yeah! The path is ./docProps/app.xml. It's taking apart a docx/xlsx/pptx document and looking at the tag that tells which version of Office it was created with. Any Office document is really just a zip file, hence what I'm trying to do 🙂 Thanks for your help!
Created 03-05-2018 07:29 PM
Doing a trick like this to determine the MS Office version of a file would also be useful to me in my NiFi flow, so I'm keeping an eye on this.
Created 03-09-2018 06:12 PM
import zipfile
from org.apache.nifi.processor.io import InputStreamCallback

class ReadVersion(InputStreamCallback):
    def __init__(self):
        self.ff = None
        self.version = ''
        self.error = ''

    def process(self, inputStream):
        try:
            # Opens the original file from disk using the flow file's path
            # attributes; inputStream itself is not read.
            zipname = self.ff.getAttribute('filename')
            zippath = self.ff.getAttribute('absolute.path')
            zfile = zipfile.ZipFile(zippath + zipname)
            for name in zfile.namelist():
                if name == 'docProps/app.xml':
                    inFile = zfile.open(name)
                    inContents = inFile.read()
                    # AppVersion looks like 12.0000, 14.0000, ...; the second digit decides
                    loc = inContents.find('<AppVersion>1')
                    if loc != -1:
                        keyChar = inContents[loc+13:loc+14]
                        if keyChar == '2':
                            self.version = '2007'
                        elif keyChar == '4':
                            self.version = '2010'
                        elif keyChar == '5':
                            self.version = '2013'
                        elif keyChar == '6':
                            self.version = '2016'
                        else:
                            log.warn('Unexpected AppVersion value: ' + inContents[loc+12:loc+14])
        except:
            # Bare except so that Java exceptions are caught as well
            log.warn('exception thrown (is this really a zip file?)')
            self.error = 'error'

ff = session.get()
if ff is not None:
    callback = ReadVersion()
    callback.ff = ff
    session.read(ff, callback)
    if callback.version != '':
        ff = session.putAttribute(ff, 'MSVersion', callback.version)
    # Transfer exactly once: failure on error, otherwise success with content unchanged
    if callback.error == 'error':
        session.transfer(ff, REL_FAILURE)
    else:
        session.transfer(ff, REL_SUCCESS)
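Since the original question was about staying off disk: the script above opens the file by its path (absolute.path + filename), so it still depends on the file being on disk. Here is a rough sketch of the same check done purely in memory, assuming you already have the flow file content as bytes. It is written for plain Python 3 rather than the Jython that ExecuteScript runs, so treat it as an illustration and not a drop-in script. The names read_app_version, office_release and RELEASES are made up for this example, and the version mapping is just the one from the script above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Major AppVersion number -> Office release, copied from the script above.
RELEASES = {12: "2007", 14: "2010", 15: "2013", 16: "2016"}


def read_app_version(data):
    """Return the <AppVersion> text from zip bytes held in memory, or None."""
    try:
        # BytesIO gives zipfile the seekable file-like object it needs,
        # so nothing is written to disk.
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            xml_bytes = zf.read("docProps/app.xml")
        root = ET.fromstring(xml_bytes)
    except (zipfile.BadZipFile, KeyError, ET.ParseError):
        # Not a zip, no docProps/app.xml inside, or the XML is malformed.
        return None
    for element in root.iter():
        # Compare the local tag name only, ignoring any XML namespace prefix.
        if element.tag.rsplit("}", 1)[-1] == "AppVersion":
            return element.text
    return None


def office_release(app_version):
    """Map an AppVersion string such as '16.0000' to a release name, or None."""
    if not app_version:
        return None
    try:
        major = int(app_version.split(".")[0])
    except ValueError:
        return None
    return RELEASES.get(major)
```

You would call it as office_release(read_app_version(data)) and put the result in an attribute, leaving the flow file content untouched. How to get the bytes out of the flow file inside ExecuteScript is not shown here.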