Unzip files in ExecuteScript NiFi processor
- Labels: Apache NiFi
Created ‎02-26-2018 10:19 PM
I am writing some code to inspect a zip file that contains multiple subfolders. I want to look through the files in the zip and then pass the original content along.
I am trying to do this without writing the file to disk.
Any ideas for how I could do this all in RAM, or in the NiFi content repository?
I also used a guide to write the code in Groovy, and I was wondering how I could pass the original content along in Groovy.
John
Created ‎02-26-2018 10:32 PM
What kinds of operations are you trying to perform on the files in the ZIP?
Created ‎02-26-2018 10:46 PM
@Matt Burgess I want to look at an XML doc that sits in a directory inside the zip and grab the value of a tag. The thing is, I want the original flowfile to pass along with the tag value.
Thanks Matt!
Created ‎02-27-2018 12:34 AM
Do you know the path of the XML doc? I'm still looking at memory vs. temp disk storage. If you can fill in this blank, I hope to have an answer for you (in Groovy probably lol) tomorrow 🙂
Created ‎03-05-2018 08:35 PM
Hey @Matt Burgess just wondering if you had a chance to point me in the right direction. 🙂 I appreciate the help!
Created ‎02-27-2018 08:33 PM
Yeah! The path is ./docProps/app.xml. It's taking apart a docx/xlsx/pptx document and looking at the tag that tells which version of Office it was created with. Any Office document in these formats is really just a zip file, hence what I'm trying to do 🙂 Thanks for your help!
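For reference, the lookup described here can be done without touching disk. Below is a minimal sketch in plain Python (not NiFi-specific), assuming the whole archive fits in memory as a byte string. The names `read_app_version` and `make_sample_docx` are made up for this illustration, and the sample archive is only a stand-in, not a real Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_app_version(zip_bytes, member="docProps/app.xml"):
    """Return the AppVersion text from an Office file held in memory, or None."""
    # BytesIO lets zipfile treat the in-memory bytes like a seekable file.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        if member not in zf.namelist():
            return None
        root = ET.fromstring(zf.read(member))
    for element in root.iter():
        # Compare the local name only, so the XML namespace does not matter.
        if element.tag.rsplit("}", 1)[-1] == "AppVersion":
            return element.text
    return None


def make_sample_docx(app_version):
    """Build a tiny stand-in archive in memory (hypothetical test data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr(
            "docProps/app.xml",
            "<Properties><AppVersion>%s</AppVersion></Properties>" % app_version,
        )
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_app_version(make_sample_docx("16.0000")))
```

Because the function only reads from the byte string, the original content is left untouched and can be passed along unchanged. Bytes that are not a zip archive make `zipfile.ZipFile` raise an exception, which the caller would need to handle.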
Created ‎03-05-2018 07:29 PM
Doing a trick like this to determine the MS Office version of a file would also be useful to me in my NiFi flow, so I'm keeping an eye on this.
Created ‎03-09-2018 06:12 PM
import zipfile
from org.apache.nifi.processor.io import InputStreamCallback

class ReadVersion(InputStreamCallback):
    def __init__(self):
        self.ff = None
        self.version = ''
        self.error = ''

    def process(self, inputStream):
        try:
            # The zip is opened from its original location on disk, not from inputStream.
            # This assumes the 'absolute.path' attribute ends with a path separator.
            zipname = self.ff.getAttribute('filename')
            zippath = self.ff.getAttribute('absolute.path')
            zfile = zipfile.ZipFile(zippath + zipname)
            for name in zfile.namelist():
                if name == 'docProps/app.xml':
                    inFile = zfile.open(name)
                    inContents = inFile.read()
                    loc = inContents.find('<AppVersion>1')
                    if loc != -1:
                        # The character after '<AppVersion>1' identifies the release (12, 14, 15, 16)
                        keyChar = inContents[loc+13:loc+14]
                        if keyChar == '2':
                            self.version = '2007'
                        elif keyChar == '4':
                            self.version = '2010'
                        elif keyChar == '5':
                            self.version = '2013'
                        elif keyChar == '6':
                            self.version = '2016'
                        else:
                            log.warn('Unexpected AppVersion value: ' + inContents[loc+12:loc+14])
            zfile.close()
        except:
            log.warn('exception thrown (is this really a zip file?)')
            self.error = 'error'

ff = session.get()
if ff is not None:
    callback = ReadVersion()
    callback.ff = ff
    session.read(ff, callback)
    if callback.version != '':
        ff = session.putAttribute(ff, 'MSVersion', callback.version)
    # Transfer the flow file exactly once: to failure on error, otherwise to success
    if callback.error == 'error':
        session.transfer(ff, REL_FAILURE)
    else:
        session.transfer(ff, REL_SUCCESS)
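As an alternative to checking a single character at a fixed offset, the same mapping could be written as a table lookup on the major version number. This is only a sketch in plain Python: `OFFICE_RELEASES` and `office_release` are hypothetical names, and the table contains just the four values handled by the script above.

```python
# Major version -> release name, taken from the branches of the script above.
OFFICE_RELEASES = {"12": "2007", "14": "2010", "15": "2013", "16": "2016"}


def office_release(app_version):
    """Map an AppVersion string such as '14.0000' to a release name, or None."""
    if not app_version:
        return None
    # Keep only the part before the first dot, e.g. '14' from '14.0000'.
    major = app_version.strip().split(".", 1)[0]
    return OFFICE_RELEASES.get(major)


if __name__ == "__main__":
    print(office_release("14.0000"))
```

A lookup like this returns None for any version it does not know, so the caller can decide whether to log a warning or route the flow file elsewhere.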
