Created 02-26-2018 10:19 PM
I am writing some code to inspect a zip file with multiple subfolders. I want to look through the files in the zip and then pass the original content along unchanged.
I am trying to do this without writing the file to disk.
Any ideas for how I could do this all in RAM, or within the NiFi content repositories?
I also used a guide to write the code in Groovy, and I was wondering how I could pass the original content along in Groovy?
John
Created 02-26-2018 10:32 PM
What kinds of operations are you trying to perform on the files in the ZIP?
Created 02-26-2018 10:46 PM
@Matt Burgess I want to look at an XML doc inside a directory in the zip and grab the value of a tag. The thing is, I want the original flowfile to pass along with the tag value.
Thanks Matt!
Created 02-27-2018 12:34 AM
Do you know the path of the XML doc? I'm still looking at memory vs. temp disk storage. If you can fill in this blank, I hope to have an answer for you (in Groovy probably lol) tomorrow 🙂
Created 03-05-2018 08:35 PM
Hey @Matt Burgess just wondering if you had a chance to point me in the right direction. 🙂 I appreciate the help!
Created 02-27-2018 08:33 PM
Yeah! The path is ./docProps/app.xml. It's taking apart a docx/xlsx/pptx document and looking at the tag that tells the version of Office it was created with. Any Office document is really just a zip file, hence what I'm trying to do 🙂 Thanks for your help!
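To make the tag lookup concrete, here is a minimal Python sketch of pulling the value out of the app.xml text once you have it as a string. The function name `office_release` and the table name `OFFICE_RELEASES` are made up for illustration, and the version-to-release mapping is simply the one used in the script posted later in this thread, so treat it as an assumption rather than an official list.

```python
import re

# Major AppVersion number -> Office release (mapping taken from this thread).
OFFICE_RELEASES = {"12": "2007", "14": "2010", "15": "2013", "16": "2016"}


def office_release(app_xml):
    """Return the Office release named by <AppVersion>, or None if unknown."""
    # AppVersion values look like "16.0000"; only the leading digits matter here.
    match = re.search(r"<AppVersion>\s*(\d+)", app_xml)
    if match is None:
        return None
    return OFFICE_RELEASES.get(match.group(1))
```

Calling it on `"<AppVersion>12.0000</AppVersion>"` returns `"2007"`; a missing tag or an unlisted major version returns `None`.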
Created 03-05-2018 07:29 PM
Doing a trick like this to determine the MS Office version of a file would also be useful to me in my NiFi flow, so I'm keeping an eye on this.
Created 03-09-2018 06:12 PM
import os
import zipfile
from org.apache.nifi.processor.io import InputStreamCallback

class ReadVersion(InputStreamCallback):
    def __init__(self):
        self.ff = None
        self.version = ''
        self.error = ''

    def process(self, inputStream):
        try:
            # Note: this opens the source file on disk using the 'filename' and
            # 'absolute.path' attributes (as set by processors such as GetFile),
            # so the file must still exist there; inputStream itself is not read.
            zipname = self.ff.getAttribute('filename')
            zippath = self.ff.getAttribute('absolute.path')
            zfile = zipfile.ZipFile(os.path.join(zippath, zipname))
            try:
                for name in zfile.namelist():
                    if name == 'docProps/app.xml':
                        inFile = zfile.open(name)
                        inContents = inFile.read()
                        # AppVersion looks like 12.0000, 14.0000, ...;
                        # the digit after the leading '1' identifies the release
                        loc = inContents.find('<AppVersion>1')
                        if loc != -1:
                            keyChar = inContents[loc+13:loc+14]
                            if keyChar == '2':
                                self.version = '2007'
                            elif keyChar == '4':
                                self.version = '2010'
                            elif keyChar == '5':
                                self.version = '2013'
                            elif keyChar == '6':
                                self.version = '2016'
                            else:
                                log.warn('Unexpected AppVersion value: ' + inContents[loc+12:loc+14])
            finally:
                zfile.close()
        except:
            log.warn('exception thrown (is this really a zip file?)')
            self.error = 'error'

ff = session.get()
if ff is not None:
    callback = ReadVersion()
    callback.ff = ff
    session.read(ff, callback)
    if callback.error == 'error':
        # Route to failure only; a flowfile can be transferred just once
        session.transfer(ff, REL_FAILURE)
    else:
        # The original content passes through untouched; only the attribute is added
        if callback.version != '':
            ff = session.putAttribute(ff, 'MSVersion', callback.version)
        session.transfer(ff, REL_SUCCESS)
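Since the original question asked for a way to do this without touching disk, here is a rough sketch of the in-memory variant in plain Python. It assumes the zip content is already available as a byte string. How you get those bytes out of the flowfile's input stream inside ExecuteScript is not shown, and the helper name `read_entry_from_bytes` is made up for illustration.

```python
import io
import zipfile


def read_entry_from_bytes(zip_bytes, entry_name="docProps/app.xml"):
    """Return the bytes of entry_name inside an in-memory zip, or None if absent."""
    # BytesIO gives zipfile a seekable file-like object backed only by RAM.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        if entry_name not in archive.namelist():
            return None
        return archive.read(entry_name)


# Usage: build a tiny stand-in archive in memory, then read the entry back.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/app.xml", "<AppVersion>16.0000</AppVersion>")
sample = read_entry_from_bytes(buffer.getvalue())
```

Nothing is written to a temp file here, but the whole archive does sit in memory, so this only suits documents that comfortably fit in the heap.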