RFC 2822 Validation error

avatar
Contributor

I'm trying to use the ConsumeIMAP -> ExtractEmailAttachments processors with Gmail. The flow gets the messages but then fails with: "Message failed RFC2822 validation".

From the source code, I can see that the from and sentDate headers have to be set.

I suspected something was wrong with the Gmail server, so I wrote Python code using imaplib and email, and it works. That code runs outside of NiFi.

Does anyone have experience with this problem, or any clue how to fix it?

Best

Bojan


avatar
Master Guru

If it is not sensitive, would you be able to provide an example of the flow file content that is failing?

You could get this from provenance, or by routing the failure relationship of ExtractEmailAttachments to a directory using PutFile.

avatar
Contributor

Hi Brian,

I get this from provenance:


--Apple-Mail=_851C8E02-DA29-4CEB-8309-895E2E5B1FB3
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
	charset=us-ascii


Test body


--Apple-Mail=_851C8E02-DA29-4CEB-8309-895E2E5B1FB3
Content-Disposition: attachment;
	filename=test.csv
Content-Type: text/csv;
	name="test.csv"
Content-Transfer-Encoding: quoted-printable


foo;bar;;;;;=0D
1;2;;;;;=0D
3;4;;;;;=0D
5;6;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=0D
;;;;;;=


--Apple-Mail=_851C8E02-DA29-4CEB-8309-895E2E5B1FB3--
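
The dump above begins directly at the first MIME boundary, so the top-level header block is absent. As a minimal sketch of why a check like the one described earlier in this thread (from and sentDate must be set) would reject it, the following uses Python's standard `email` package. The helper name `missing_headers`, the shortened boundary, and the sample address are made up for illustration, and the exact rule NiFi applies is only assumed from the description above.

```python
from email import message_from_string
from typing import List

# Headers assumed to be required, based on the description in this thread.
REQUIRED_HEADERS = ("From", "Date")


def missing_headers(raw: str) -> List[str]:
    """Return the required top-level headers that are absent from raw."""
    msg = message_from_string(raw)
    return [name for name in REQUIRED_HEADERS if msg[name] is None]


# Shortened stand-in for the provenance dump: it starts at the boundary line.
truncated = (
    "--Apple-Mail=_851C8E02\n"
    "Content-Type: text/plain;\n"
    "\tcharset=us-ascii\n"
    "\n"
    "Test body\n"
    "--Apple-Mail=_851C8E02--\n"
)

# The same body with a (made-up) top-level header block in front of it.
complete = (
    "From: sender@example.com\n"
    "Date: Mon, 01 Jan 2018 00:00:00 +0000\n"
    'Content-Type: multipart/mixed; boundary="Apple-Mail=_851C8E02"\n'
    "\n"
) + truncated

print(missing_headers(truncated))
print(missing_headers(complete))
```

Because the first line of the truncated content is not a header line, the parser treats everything as body, and both headers are reported missing.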

avatar
Contributor

avatar
Master Guru

I had a feeling it was a problem with the output of ConsumeIMAP, and that JIRA definitely looks like what you are seeing. I'm glad we have already captured the issue, although I'm sorry that it is causing you problems.

avatar
Contributor

No problem, I just need to create a workaround. The good thing is that I now know how the email processors work in NiFi (I read all the code).

Maybe I can still use the ConsumeIMAP processor to watch for new messages and then route to ExecuteScript, which would run Python code that extracts the attachment and passes it to the next processor. The only issue I see is how to detect new messages; I am not sure whether the session is persistent or something like that. I could mark them as read after I download them. I don't have much experience with email, but I will try. If you have a suggestion for how to hack around this, please feel free to share it.

avatar
Master Guru

I think what you suggested makes sense. I am not very familiar with these email processors, but if you are still using ConsumeIMAP, I think it would handle getting the new messages and marking them as read. All your script would be doing is receiving a flow file with the message in it and parsing it as ExtractEmailAttachments does, while working around the missing headers.
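
One way to sketch the "working around the missing headers" idea in standalone Python: read the boundary from the first line of the content, prepend a synthetic header block, and then walk the parts. The function names `rewrap` and `attachments` and the placeholder sender are hypothetical, the result is assumed to be `multipart/mixed`, and this is not the ExecuteScript session API, only the parsing step.

```python
from email import message_from_bytes
from email.utils import formatdate


def rewrap(raw: bytes, sender: str = "unknown@example.invalid") -> bytes:
    """Prepend top-level headers when the content starts at a MIME boundary."""
    body = raw.lstrip()
    first_line = body.split(b"\n", 1)[0].strip()
    if not first_line.startswith(b"--"):
        return raw  # does not start at a boundary; leave the content untouched
    boundary = first_line[2:].decode("ascii")
    headers = (
        "From: %s\r\n"
        "Date: %s\r\n"
        "MIME-Version: 1.0\r\n"
        'Content-Type: multipart/mixed; boundary="%s"\r\n'
        "\r\n"
    ) % (sender, formatdate(), boundary)
    return headers.encode("ascii") + body


def attachments(raw: bytes):
    """Return (filename, decoded bytes) for every attachment part."""
    msg = message_from_bytes(rewrap(raw))
    return [
        (part.get_filename(), part.get_payload(decode=True))
        for part in msg.walk()
        if part.get_content_disposition() == "attachment"
    ]
```

The synthetic From and Date values are placeholders, so anything downstream that relies on the real sender or sent date would still need the original headers.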

avatar

Hi, you can certainly mark the message as read and adjust the Python script to only read when needed. On a separate note, have you tested the POP3 processor? Many email providers, such as Gmail and Exchange, offer both protocols to user agents. I'm curious to know whether the same issue happens with that one as well. Cheers
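
For the "mark as read and only read when needed" part, a minimal sketch with the standard `imaplib` module could search for UNSEEN messages, fetch each with `BODY.PEEK[]` (which does not set the `\Seen` flag by itself), and set the flag only after the caller has handled the message. The function name `fetch_unseen` is hypothetical, and the connection lines in the trailing comment use placeholder credentials.

```python
import imaplib


def fetch_unseen(conn):
    """Yield (message number, raw bytes) for each unseen message.

    A message is flagged as read only when the caller asks for the next
    one, i.e. after the caller has finished handling the current one.
    """
    typ, data = conn.search(None, "UNSEEN")
    if typ != "OK":
        return
    for num in data[0].split():
        # BODY.PEEK[] returns the full message without marking it as read.
        typ, parts = conn.fetch(num, "(BODY.PEEK[])")
        if typ != "OK":
            continue
        yield num, parts[0][1]
        conn.store(num, "+FLAGS", "\\Seen")


# Usage sketch (user and password are placeholders):
# conn = imaplib.IMAP4_SSL("imap.gmail.com")
# conn.login(user, password)
# conn.select("INBOX")
# for num, raw in fetch_unseen(conn):
#     ...  # parse raw here
```

Deferring the `\Seen` flag means a message whose processing raises an exception stays unread and is picked up again on the next run.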

avatar
Contributor

POP3 is also failing.

In the end I created a Python script using imaplib to get the messages after ConsumeIMAP fires.

ConsumeIMAP -> ExecuteScript -> ExtractEmailAttachments

But I don't like this solution, because the messages are downloaded twice.

avatar

@blood9raven, thanks for your message. I have just added a patch attempting to solve the bug you hit. Would you be able to test it and let me know if it works?

The patch can be found on the JIRA page you linked previously. Cheers