Unzip files in ExecuteScript NiFi processor

Rising Star

I am writing some code to inspect a zip file that contains multiple subfolders. I want to look through the files in the zip and then pass the original content along.

I am trying to do this without writing the file to disk.

Any ideas on how I could do this all in RAM, or in the NiFi content repository?
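One way to stay entirely in RAM, sketched in plain CPython 3 rather than as ExecuteScript code: `zipfile.ZipFile` accepts any seekable file-like object, so the archive bytes can be wrapped in `io.BytesIO` and browsed without a temporary file. The helper names `list_zip_entries` and `read_zip_entry` are hypothetical, and getting the flowfile content into Python bytes inside NiFi is a separate step not shown here.

```python
import io
import zipfile


def list_zip_entries(zip_bytes):
    """Return the entry names of a zip archive held entirely in memory.

    zipfile.ZipFile accepts any seekable file-like object, so wrapping the
    raw bytes in io.BytesIO avoids writing a temporary file to disk.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.namelist()


def read_zip_entry(zip_bytes, entry_name):
    """Return the raw contents of one entry, e.g. a file inside a subfolder."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(entry_name)


if __name__ == "__main__":
    # Build a small archive with subfolders, entirely in memory
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/app.xml", "<Properties/>")
        zf.writestr("word/document.xml", "<document/>")
    data = buf.getvalue()
    print(list_zip_entries(data))  # ['docProps/app.xml', 'word/document.xml']
    print(read_zip_entry(data, "docProps/app.xml"))  # b'<Properties/>'
```

Entries in subfolders are addressed by their full path inside the archive (for example `docProps/app.xml`), so no directory walking is needed.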

I also used a guide to write the code in Groovy, and I was wondering: how could I pass along the original content in Groovy?

John

7 replies

Master Guru

What kinds of operations are you trying to perform on the files in the ZIP?

Rising Star

@Matt Burgess I want to read an XML doc that sits in a directory inside the zip and grab the value of one tag. The catch is that I want the original flowfile to pass along with the tag value.
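A sketch of grabbing one tag's value from an XML file in a subdirectory of the zip, again in plain CPython 3 and assuming the archive is already in memory as bytes. `tag_value_in_zip` is a hypothetical helper; it parses the XML with `xml.etree.ElementTree` and matches on the element's local name, so namespaced documents don't need the namespace URI spelled out.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def tag_value_in_zip(zip_bytes, xml_path, tag):
    """Return the text of the first element named `tag` in the XML file at
    `xml_path` inside the archive, or None if the file or tag is missing.

    `tag` is matched on its local name, so namespaced documents (such as
    Office's docProps/app.xml) work without spelling out the namespace URI.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        if xml_path not in zf.namelist():
            return None
        root = ET.fromstring(zf.read(xml_path))
    for elem in root.iter():
        # Namespaced tags look like '{uri}LocalName'; keep only 'LocalName'
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == tag:
            return elem.text
    return None
```

Parsing the XML, rather than searching the raw text for a substring, tolerates differences such as extra whitespace or a namespace prefix on the tag (`<ep:AppVersion>`); the cost of parsing is negligible for a small file like app.xml.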

Thanks Matt!

Master Guru

Do you know the path of the XML doc? I'm still looking at memory vs. temporary disk storage; if you can fill in this blank, I hope to have an answer for you (probably in Groovy, lol) tomorrow 🙂
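On the memory vs. temporary disk question, one compromise is to pick the buffer by size: small archives go to `io.BytesIO`, larger ones to `tempfile.TemporaryFile`, which is deleted automatically when closed. This is a plain CPython 3 sketch; the 16 MiB `MAX_IN_MEMORY_BYTES` threshold and the function names are hypothetical, and inside NiFi the size hint could come from the flowfile's `getSize()`.

```python
import io
import shutil
import tempfile
import zipfile

# Hypothetical threshold: archives up to this size are buffered in RAM,
# larger ones are copied to an anonymous temporary file instead.
MAX_IN_MEMORY_BYTES = 16 * 1024 * 1024


def buffer_stream(stream, size_hint, limit=MAX_IN_MEMORY_BYTES):
    """Copy a forward-only stream into a seekable buffer ZipFile can use.

    Small inputs go to io.BytesIO (memory only); larger ones go to
    tempfile.TemporaryFile, which is removed when it is closed.
    """
    buf = io.BytesIO() if size_hint <= limit else tempfile.TemporaryFile()
    shutil.copyfileobj(stream, buf)
    buf.seek(0)  # rewind so ZipFile reads from the start
    return buf


def zip_names_from_stream(stream, size_hint, limit=MAX_IN_MEMORY_BYTES):
    """List the entries of a zip read from a forward-only stream."""
    with buffer_stream(stream, size_hint, limit) as buf:
        with zipfile.ZipFile(buf) as zf:
            return zf.namelist()
```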

Rising Star

Hey @Matt Burgess, just wondering if you've had a chance to point me in the right direction. 🙂 I appreciate the help!

Rising Star

Yeah! The path is ./docProps/app.xml. It's taking apart a docx/xlsx/pptx document and looking at the tag that tells which version of Office it was created with. Any Office document is really just a zip file, hence what I'm trying to do 🙂 Thanks for your help!
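Since `AppVersion` holds a value like `16.0000`, one option is to parse the major number instead of checking a single character. A hedged sketch in plain CPython 3: the table mirrors the 12/14/15/16 mapping used in the Python script later in this thread, `office_release` is a hypothetical name, and because later Office releases also report major version 16, "2016" should be read as "2016 or newer".

```python
# Major AppVersion number -> Office release, mirroring the mapping used in
# the script later in this thread. Later releases also report 16, so "2016"
# here really means "2016 or newer".
APP_VERSION_TO_RELEASE = {12: "2007", 14: "2010", 15: "2013", 16: "2016"}


def office_release(app_version):
    """Map an AppVersion string such as '14.0300' to an Office release.

    Returns None when the string is malformed or the major version is not
    in the table (for example, a file written by a non-Microsoft tool).
    """
    major_text = app_version.strip().split(".", 1)[0]
    if not major_text.isdigit():
        return None
    return APP_VERSION_TO_RELEASE.get(int(major_text))
```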

New Contributor

@Matt Burgess

A trick like this to determine the MS Office version of a file would also be useful in my NiFi flow, so I'm keeping an eye on this thread.


New Contributor

@Matt Burgess, @John T:

I got this working in Python (my first ever such program, so it might be rough around the edges). The...

# Runs in NiFi's ExecuteScript processor (Jython engine); session, log,
# REL_SUCCESS and REL_FAILURE are provided by the processor.
import zipfile
from org.apache.nifi.processor.io import InputStreamCallback

class ReadVersion(InputStreamCallback):
  def __init__(self):
    self.ff = None
    self.version = ''
    self.error = ''
  def process(self, inputStream):
    try:
      # Reopen the original file from disk using the source processor's
      # attributes (the concatenation assumes absolute.path ends with '/')
      zipname = self.ff.getAttribute('filename')
      zippath = self.ff.getAttribute('absolute.path')
      zfile = zipfile.ZipFile(zippath + zipname)
      try:
        for name in zfile.namelist():
          if name == 'docProps/app.xml':
            inContents = zfile.open(name).read()
            # '<AppVersion>1' is 13 characters, so index loc+13 holds the
            # second digit of the major version (12 -> 2007, 14 -> 2010, ...)
            loc = inContents.find('<AppVersion>1')
            if loc != -1:
              keyChar = inContents[loc+13:loc+14]
              if keyChar == '2':
                self.version = '2007'
              elif keyChar == '4':
                self.version = '2010'
              elif keyChar == '5':
                self.version = '2013'
              elif keyChar == '6':
                self.version = '2016'
              else:
                log.warn('Unexpected AppVersion value: ' + inContents[loc+12:loc+14])
      finally:
        zfile.close()
    except:
      log.warn('exception thrown (is this really a zip file?)')
      self.error = 'error'

ff = session.get()
if ff is not None:
  callback = ReadVersion()
  callback.ff = ff
  # session.read only reads the content, so the original zip passes through
  session.read(ff, callback)
  if callback.error == 'error':
    session.transfer(ff, REL_FAILURE)
  else:
    # Every flowfile must be transferred, even when no version was found
    if callback.version != '':
      ff = session.putAttribute(ff, 'MSVersion', callback.version)
    session.transfer(ff, REL_SUCCESS)
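The script above reopens the archive from disk via `absolute.path` and `filename`, which only works while the original file is still at that location. A sketch of the alternative that inspects the flowfile content itself, in plain CPython 3: `office_version_from_stream` (hypothetical) takes any binary file-like object, buffers it in memory, and returns a `(release, error)` pair shaped like the `version`/`error` fields in `ReadVersion`. Inside ExecuteScript the `inputStream` handed to `process` is a Java `InputStream`, so converting it to Python bytes is an extra step that would need testing in your NiFi version.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

APP_XML = "docProps/app.xml"
# Same major-version mapping as the script above (16 also covers newer releases)
RELEASES = {12: "2007", 14: "2010", 15: "2013", 16: "2016"}


def office_version_from_stream(stream):
    """Detect the Office release of a document read from a binary stream.

    The whole stream is buffered in memory, so nothing is written to disk.
    Returns (release, None) on success, (None, message) when the input is
    not a readable zip/XML, and (None, None) when no known version is found.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(stream.read())) as zf:
            if APP_XML not in zf.namelist():
                return None, None
            root = ET.fromstring(zf.read(APP_XML))
    except (zipfile.BadZipFile, ET.ParseError) as exc:
        return None, "not a readable Office file: %s" % exc
    for elem in root.iter():
        # Match AppVersion regardless of the namespace on the tag
        if elem.tag.rsplit("}", 1)[-1] == "AppVersion" and elem.text:
            major = elem.text.strip().split(".", 1)[0]
            if major.isdigit():
                return RELEASES.get(int(major)), None
    return None, None
```

Because the content is only read, routing on the result (an attribute plus `REL_SUCCESS`, or `REL_FAILURE` when the error is set) passes the original zip through unchanged.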