Unzip files in ExecuteScript NiFi processor

Explorer

I am writing some code to inspect a zip file that contains multiple subfolders. I want to look through the files in the zip and then pass along the original content.

I am trying to do this without writing the file to disk.

Any ideas for how I could do this all in RAM, or in the NiFi content repository?

I also followed a guide to write the code in Groovy, and I was wondering how I could pass along the original content in Groovy.

John

7 REPLIES

Super Guru

What kinds of operations are you trying to perform on the files in the ZIP?

Explorer

@Matt Burgess I want to look at an XML doc inside a directory in the zip and grab the value of a tag. The thing is, I want the original flowfile to pass along with the tag value.

Thanks Matt!

Super Guru

You know the path of the XML doc? I'm still looking at memory vs temp disk storage, if you can fill in this blank I hope to have an answer for you (in Groovy probably lol) tomorrow 🙂

Explorer

Hey @Matt Burgess just wondering if you had a chance to point me in the right direction. 🙂 I appreciate the help!

Explorer

Yeah! The path is ./docProps/app.xml. It's taking apart a docx/xlsx/pptx document and looking at the tag that tells the version of Office it was created with. Any Office document is really just a zip file, hence what I'm trying to do 🙂 Thanks for your help!
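
To illustrate the point that an Office document is just a zip archive, here is a minimal sketch in plain Python that pulls the AppVersion tag out of docProps/app.xml entirely in memory, with nothing written to disk. The function name read_app_version and the sample XML are assumptions made up for this example, and it takes the document as a byte string; how those bytes are obtained inside NiFi is not covered here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_app_version(data, member='docProps/app.xml'):
    """Return the AppVersion text from Office document bytes, or None if absent.

    `data` is the whole document as a byte string; nothing touches the disk.
    (Hypothetical helper, not part of NiFi or of any script in this thread.)
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        if member not in archive.namelist():
            return None
        root = ET.fromstring(archive.read(member))
    for element in root.iter():
        # Compare local names only, so the namespace URI is not hard-coded.
        if element.tag.split('}')[-1] == 'AppVersion':
            return element.text
    return None


# Build a tiny stand-in archive in memory (hand-written sample, not a real docx).
sample_xml = (
    '<Properties xmlns="http://schemas.openxmlformats.org/'
    'officeDocument/2006/extended-properties">'
    '<AppVersion>16.0000</AppVersion></Properties>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('docProps/app.xml', sample_xml)

print(read_app_version(buffer.getvalue()))  # 16.0000
```

Because the archive is only read, the original bytes are left untouched and can be passed along unchanged.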

New Contributor

@Matt Burgess

Doing a trick like this to determine the MS version of a file would also be useful to me in my NiFi flow, so I'm keeping an eye on this.


New Contributor

@Matt Burgess, @John T:

I got this working in Python, my first ever such program, so it might be rough around the edges. The...

import zipfile
from org.apache.nifi.processor.io import InputStreamCallback

# Reads docProps/app.xml from the zip and maps its AppVersion to an Office release.
class ReadVersion(InputStreamCallback):
  def __init__(self):
    self.ff = None
    self.version = ''
    self.error = ''
  def process(self, inputStream):
    try:
      # The zip is opened by path (from the flowfile attributes), not from
      # inputStream, so the source file must still be readable on disk.
      zipname = self.ff.getAttribute('filename')
      zippath = self.ff.getAttribute('absolute.path')
      zfile = zipfile.ZipFile(zippath + zipname)
      try:
        for name in zfile.namelist():
          if name == 'docProps/app.xml':
            inFile = zfile.open(name)
            inContents = inFile.read()
            loc = inContents.find('<AppVersion>1')
            if loc != -1:
              # The character after '<AppVersion>1' is the second digit
              # of the major version (12, 14, 15 or 16).
              keyChar = inContents[loc+13:loc+14]
              if keyChar == '2':
                self.version = '2007'
              elif keyChar == '4':
                self.version = '2010'
              elif keyChar == '5':
                self.version = '2013'
              elif keyChar == '6':
                self.version = '2016'
              else:
                log.warn('Unexpected AppVersion value: ' + inContents[loc+12:loc+14])
      finally:
        zfile.close()
    except:
      log.warn('exception thrown (is this really a zip file?)')
      self.error = 'error'

ff = session.get()
if ff is not None:
  callback = ReadVersion()
  callback.ff = ff
  session.read(ff, callback)
  if callback.error == '' and callback.version != '':
    # Content is unchanged; only the MSVersion attribute is added.
    ff = session.putAttribute(ff, 'MSVersion', callback.version)
    session.transfer(ff, REL_SUCCESS)
  else:
    # Route errors and files without a recognized AppVersion to failure,
    # so every flowfile is transferred exactly once.
    session.transfer(ff, REL_FAILURE)
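
As a possible alternative to the character-offset check in the script above, the version lookup could be done by parsing the major version number and looking it up in a table. This is only a sketch: the names OFFICE_RELEASES and office_release are made up for illustration, and the table contains just the four values that the script's if/elif chain already handles.

```python
# Major AppVersion number -> Office release, copied from the if/elif chain above.
OFFICE_RELEASES = {12: '2007', 14: '2010', 15: '2013', 16: '2016'}


def office_release(app_version):
    """Map an AppVersion string such as '14.0000' to a release name, or None.

    Hypothetical helper: returns None for missing, malformed, or unlisted values.
    """
    try:
        major = int(app_version.strip().split('.')[0])
    except (AttributeError, ValueError):
        return None
    return OFFICE_RELEASES.get(major)


print(office_release('14.0000'))  # 2010
```

Parsing the number instead of matching on '<AppVersion>1' means a value that does not start with 1 falls through to None rather than being skipped silently.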