Hexdump Nifi Processor?

Rising Star

I need a way to get the hexdump of a file using NiFi. I have already used an ExecuteStreamCommand processor and it works, but it uses a lot of processing power because each file has to be written to the file system. Is there a processor that could achieve this? Or does anyone have a custom NAR I could use to extract the hexdump of a file? Also, I only need the first 16 bits of the file.

Thank you!!

1 ACCEPTED SOLUTION

Master Guru

You could use the ExecuteScript processor if you are comfortable with Groovy, JavaScript, Jython, JRuby, or Lua. Here's an example of a Groovy script that I think will do what you're asking:

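import java.io.DataInputStream
def flowFile = session.get()
if(!flowFile) return
def attr = ''
// Read only the first two bytes of the content; the content itself is not modified
session.read(flowFile, {inputStream ->
   def dis = new DataInputStream(inputStream)
   // readUnsignedShort() consumes 2 bytes (16 bits); toHexString() omits leading zeros
   attr = Integer.toHexString(dis.readUnsignedShort())
} as InputStreamCallback)
// Store the hex string as an attribute and route the flow file to success
flowFile = session.putAttribute(flowFile, 'first16hex', attr)
session.transfer(flowFile, REL_SUCCESS)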

This maintains the content in the flow file but adds an attribute called 'first16hex' that contains a string representation of the first 16 bits of the incoming flow file content.
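One detail worth knowing: Integer.toHexString() drops leading zeros, so content starting with the bytes 00 1f would produce "1f" rather than "001f". A minimal standalone Java sketch of the same two-byte read with fixed-width output is below. The class and method names (First16Hex, first16Hex) are made up for illustration and are not part of NiFi; inside ExecuteScript the same String.format call should work in place of Integer.toHexString().

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not a NiFi class.
class First16Hex {
    // Reads the first two bytes (big-endian) and returns exactly four hex digits.
    static String first16Hex(InputStream in) {
        try {
            int value = new DataInputStream(in).readUnsignedShort();
            // %04x pads with zeros on the left, unlike Integer.toHexString()
            return String.format("%04x", value);
        } catch (IOException e) {
            // Includes EOFException when fewer than two bytes are available
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 0x1f, 0x7f};
        System.out.println(Integer.toHexString(0x001f));                  // 1f
        System.out.println(first16Hex(new ByteArrayInputStream(sample))); // 001f
    }
}
```

Fixed-width output matters if the attribute is later compared against known file signatures, where a missing leading zero would cause a mismatch.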

Please let me know if I've misunderstood anything here, and I will try to help. I should mention that a full hexdump processor could be helpful; feel free to raise a Jira for this feature.

8 REPLIES

Rising Star

Hi Matt!

I now understand what you meant by using the ExecuteScript processor!

How would I edit this code to capture more of the hex output? Right now I only get the first 4 hex characters.

Thank you!

Rising Star
import java.io.DataInputStream
def flowFile = session.get()
if(!flowFile) return
def attr = ''
session.read(flowFile, {inputStream ->
   dis = new DataInputStream(inputStream)
   attr = Long.toHexString(dis.readLong())
   
} as InputStreamCallback)
flowFile = session.putAttribute(flowFile, 'first16hex', attr)
session.transfer(flowFile, REL_SUCCESS)

Hi Matt, I modified the code to use a long instead of a short, and it now gets the first 16 hex characters. How would I get the first 36 hex characters?

Master Guru

You'd need multiple calls to dis.readXYZ(), call toHexString() on each result, then concatenate them before storing the value in "attr" (or whatever variable ends up in the first16hex attribute). 36 hex characters correspond to 18 bytes, so it's probably two dis.readLong() calls (8 bytes each) followed by one dis.readUnsignedShort() (2 bytes).
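An alternative to chaining typed reads is to pull the bytes into an array with DataInputStream.readFully() and format each byte as two hex digits, which also keeps leading zeros that Long.toHexString() would drop. The following is a standalone Java sketch under that assumption; the class name HexPrefix and its method are hypothetical, not a NiFi API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not a NiFi class.
class HexPrefix {
    // Returns the first byteCount bytes of the stream as 2 * byteCount hex digits.
    static String hexPrefix(InputStream in, int byteCount) {
        byte[] buf = new byte[byteCount];
        try {
            // readFully() throws EOFException if the content is shorter than byteCount
            new DataInputStream(in).readFully(buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder(byteCount * 2);
        for (byte b : buf) {
            // Mask to 0-255 so negative bytes format correctly; %02x keeps a leading zero
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = new byte[20];
        for (int i = 0; i < sample.length; i++) sample[i] = (byte) i;
        // 18 bytes give 36 hex characters
        System.out.println(hexPrefix(new ByteArrayInputStream(sample), 18));
        // Leading zeros are lost with the typed-read approach: prints 1020304050607
        System.out.println(Long.toHexString(0x0001020304050607L));
    }
}
```

With this shape the number of hex characters is always twice the byte count, so asking for 18 bytes yields exactly 36 characters regardless of the byte values.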

Rising Star

Hi Matt,

I got it to work with some help from a teammate!

See the code below:

import java.io.DataInputStream
def flowFile = session.get()
if(!flowFile) return
def attr = ''
def attr2 = ''
session.read(flowFile, {inputStream ->
   def dis = new DataInputStream(inputStream)
   // Each readLong() consumes 8 bytes, i.e. up to 16 hex characters
   attr = Long.toHexString(dis.readLong())
   attr2 = Long.toHexString(dis.readLong())
} as InputStreamCallback)
// Concatenate both halves into a single attribute
flowFile = session.putAttribute(flowFile, 'first16hex', attr+attr2)
session.transfer(flowFile, REL_SUCCESS)

Rising Star

A source of hex file-type headers (file signatures):

http://www.garykessler.net/library/file_sigs.html

Master Guru

The Unix hexdump command (http://www.tutorialspoint.com/unix_commands/hexdump.htm) could be called from ExecuteStreamCommand.


Hey guys,

I know this is an old post and the original question was only about the first 16 bits hex dump. However, since I came across this post and it gave me some directions, I've tried to improve it considering the following things:

1) If the flowfile has, say, 93 bytes, using readLong or readUnsignedShort only allows advancing 8 or 2 bytes at a time, making it impossible to read an odd number of bytes. Thus, I've used readUnsignedByte instead.

2) Leading zeros were missing from the hex output, so each byte is now left-padded to two characters

3) Reading the whole flowfile and dumping it in an attribute, which I've called 'raw'

import java.io.DataInputStream
def flowFile = session.get()
if(!flowFile) return
def raw = ''
def aux = ''
boolean eof = false
session.read(flowFile, {inputStream ->
  def dis = new DataInputStream(inputStream)
  // Read one byte at a time until readUnsignedByte() signals end of content
  while (!eof) {
    try {
       aux = Integer.toHexString(dis.readUnsignedByte())
       // Left-pad so every byte contributes exactly two hex characters
       raw = raw + aux.padLeft(2,'0')
    } catch (EOFException e) {
        eof = true
    }
  }
} as InputStreamCallback)
flowFile = session.putAttribute(flowFile, 'raw', raw)
session.transfer(flowFile, REL_SUCCESS)
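
For larger content, the byte-at-a-time loop with string concatenation may get slow, since each concatenation copies the whole string built so far. One possible variation, sketched here in standalone Java, reads in chunks and appends to a StringBuilder. The class name StreamHex, the method toHex, and the 8192-byte buffer size are assumptions chosen for illustration, not anything defined by NiFi.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not a NiFi class.
class StreamHex {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Converts the entire stream to lowercase hex, two characters per byte.
    static String toHex(InputStream in) {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[8192]; // arbitrary chunk size
        try {
            int n;
            // read() returns -1 at end of stream, so no exception is needed for EOF
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    int v = buf[i] & 0xff;        // treat the byte as unsigned
                    sb.append(DIGITS[v >>> 4]);   // high nibble
                    sb.append(DIGITS[v & 0x0f]);  // low nibble
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 0x0a, (byte) 0xff};
        System.out.println(toHex(new ByteArrayInputStream(sample))); // 000aff
    }
}
```

Because NiFi keeps flow file attributes in memory, dumping whole content into an attribute is probably best reserved for small files, whichever loop is used.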

As I am a newbie, please feel free to comment.

Cheers,

Gus