Support Questions

johnmteabo · ‎02-26-2018

I am writing some code to inspect a zip file with multiple subfolders. I want to look through the files in the zip and then pass along the original content.

I am trying to do this without writing the file to disk.

Any ideas for how I could do this all in RAM, or in the NiFi content repository?

I also used a guide to write the code in Groovy, and I was wondering how I could pass along the original content in Groovy.

John

mburgess · ‎02-26-2018

What kinds of operations are you trying to perform on the files in the ZIP?

johnmteabo · ‎02-26-2018

@Matt Burgess I want to look at an XML doc in a directory inside the zip and grab the value of a tag. The thing is, I want the original flowfile to be passed along with the tag value.

Thanks Matt!

mburgess · ‎02-27-2018

You know the path of the XML doc? I'm still looking at memory vs temp disk storage, if you can fill in this blank I hope to have an answer for you (in Groovy probably lol) tomorrow 🙂

johnmteabo · ‎03-05-2018

Hey @Matt Burgess just wondering if you had a chance to point me in the right direction. 🙂 I appreciate the help!

johnmteabo · ‎02-27-2018

Yeah! The path is ./docProps/app.xml. It's taking apart a docx/xlsx/pptx document and looking at the tag that tells the version of Office it was created with. Any Office document is really just a zip file, hence what I'm trying to do 🙂 Thanks for your help!

hodgkinsonjeffr · ‎03-05-2018

@Matt Burgess

Doing a trick like this to determine the MS version of a file would also be useful to me in my NiFi flow, so I'm keeping an eye on this.

hodgkinsonjeffr · ‎03-09-2018

@Matt Burgess, @John T:

I got this working in Python, my first ever such program, so it might be rough around the edges. The...

import zipfile
from org.apache.nifi.processor.io import InputStreamCallback

# Reads docProps/app.xml from an Office document (a zip archive) and maps
# its <AppVersion> value to an Office release.
class ReadVersion(InputStreamCallback):
  def __init__(self):
    self.ff = None
    self.version = ''
    self.error = ''
  def process(self, inputStream):
    try:
      # Note: this opens the file from its original location on disk
      # (filename / absolute.path attributes), not from inputStream.
      zipname = self.ff.getAttribute('filename')
      zippath = self.ff.getAttribute('absolute.path')
      zfile = zipfile.ZipFile(zippath+zipname)
      for name in zfile.namelist():
        if (name == 'docProps/app.xml'):
          inFile = zfile.open(name)
          inContents = inFile.read()
          # AppVersion values look like 12.0000, 14.0000, ...; match the leading '1'
          loc = inContents.find('<AppVersion>1')
          if (loc != -1):
            # Second digit of the major version number
            keyChar = inContents[loc+13:loc+14]
            if (keyChar == '2'):
              self.version = '2007'
            elif (keyChar == '4'):
              self.version = '2010'
            elif (keyChar == '5'):
              self.version = '2013'
            elif (keyChar == '6'):
              self.version = '2016'
            else:
              log.warn('Unexpected AppVersion value: ' + inContents[loc+12:loc+14])
    except:
      log.warn('exception thrown (is this really a zip file?)')
      self.error = 'error'

ff = session.get()
if (ff != None):
  callback = ReadVersion()
  callback.ff = ff
  session.read(ff, callback)
  if (callback.error == 'error'):
    session.transfer(ff, REL_FAILURE)
  else:
    # Every flow file has to be transferred somewhere; the attribute is
    # only added when a version was actually detected.
    if (callback.version != ''):
      ff = session.putAttribute(ff,'MSVersion',callback.version)
    session.transfer(ff, REL_SUCCESS)
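
The script above opens the file from its original path on disk. For the original goal of doing everything in RAM, here is a minimal sketch in plain Python that reads the same docProps/app.xml entry from a byte string. The function name detect_office_version and the names OFFICE_RELEASES and APP_VERSION_RE are made up for illustration; the version mapping is the one used in the script above. It is not wired into NiFi and is only meant to show the in-memory part.

```python
import io
import re
import zipfile

# Major AppVersion number -> Office release (same mapping as the script above).
OFFICE_RELEASES = {12: '2007', 14: '2010', 15: '2013', 16: '2016'}

# Captures the digits before the dot in e.g. <AppVersion>16.0000</AppVersion>.
APP_VERSION_RE = re.compile(br'<AppVersion>\s*(\d+)')

def detect_office_version(data):
    """Hypothetical helper: return the Office release for raw document
    bytes, or None if the bytes are not a zip or the version is unknown."""
    try:
        # BytesIO wraps the bytes in a file-like object, so nothing touches disk.
        zfile = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipfile:
        return None
    try:
        if 'docProps/app.xml' not in zfile.namelist():
            return None
        contents = zfile.read('docProps/app.xml')
    finally:
        zfile.close()
    match = APP_VERSION_RE.search(contents)
    if match is None:
        return None
    return OFFICE_RELEASES.get(int(match.group(1)))
```

Inside ExecuteScript the bytes would have to come from the flow file's input stream rather than from the absolute.path attribute; how best to read that stream into a byte string in Jython is not covered in this thread, so treat that step as an open assumption.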
