Created 09-28-2016 12:27 PM
I'm trying to use the ConsumeImap -> ExtractEmailAttachments processors with Google Mail. The flow gets the messages but fails with: "Message failed RFC2822 validation".
From the source code, I see that the from and sentDate headers should be set.
I thought something might be wrong with the Gmail server, so I wrote Python code using imaplib and email, and it works. This code runs outside of NiFi.
Does anyone have experience with this problem, or a clue how to fix it?
Best
Bojan
Created 09-28-2016 02:57 PM
If it is not sensitive, would you be able to provide an example of the flow file content that is failing?
You could get this from using provenance, or from routing the failure of ExtractEmailAttachments to a directory using PutFile.
Created 09-28-2016 03:28 PM
Hi Brian,
I get this from provenance:
--Apple-Mail=_851C8E02-DA29-4CEB-8309-895E2E5B1FB3 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii Test body --Apple-Mail=_851C8E02-DA29-4CEB-8309-895E2E5B1FB3 Content-Disposition: attachment; filename=test.csv Content-Type: text/csv; name="test.csv" Content-Transfer-Encoding: quoted-printable foo;bar;;;;;=0D 1;2;;;;;=0D 3;4;;;;;=0D 5;6;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;=0D ;;;;;;= --Apple-Mail=_851C8E02-DA29-4CEB-8309-895E2E5B1FB3--
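The content above starts directly at a MIME boundary, with no top-level headers in front of it. A minimal sketch of how one could confirm that outside NiFi, using only Python's standard `email` package. The helper name `missing_headers`, the shortened sample message, and the choice of `From` and `Date` as the required headers (taken from the "from and sentDate" remark earlier in the thread) are assumptions for illustration, not the check NiFi itself performs:

```python
from email import message_from_string

# Assumption: these are the headers the validation cares about,
# based on the "from and sentDate" remark in the thread.
REQUIRED_HEADERS = ("From", "Date")

# Hypothetical, shortened stand-in for the flow file content:
# it begins at the MIME boundary, with no top-level headers.
BODY_ONLY = (
    "--Apple-Mail=_TEST\n"
    "Content-Transfer-Encoding: 7bit\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "Test body\n"
    "--Apple-Mail=_TEST--\n"
)

# The same content with illustrative top-level headers in front of it.
WITH_HEADERS = (
    "From: sender@example.com\n"
    "Date: Wed, 28 Sep 2016 15:28:00 +0000\n"
    'Content-Type: multipart/mixed; boundary="Apple-Mail=_TEST"\n'
    "\n"
) + BODY_ONLY


def missing_headers(raw_message, required=REQUIRED_HEADERS):
    """Return the names of required top-level headers that are absent."""
    msg = message_from_string(raw_message)
    # msg[name] is None when the header is not present at the top level.
    return [name for name in required if msg[name] is None]


if __name__ == "__main__":
    print("body only:   ", missing_headers(BODY_ONLY))
    print("with headers:", missing_headers(WITH_HEADERS))
```

Running the same check on the real content saved via PutFile would show whether the top-level headers were dropped before the flow file reached ExtractEmailAttachments.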
Created 09-29-2016 11:50 AM
I think I hit this bug: https://issues.apache.org/jira/browse/NIFI-2709?jql=text%20~%20%22ConsumeImap%22
Created 09-29-2016 12:39 PM
I had a feeling it was a problem with the output of ConsumeIMAP, and that JIRA definitely looks like what you are seeing. I'm glad we have already captured the issue, although I'm sorry that it is causing you problems.
Created 09-29-2016 12:49 PM
No problem, I just need to create some hack. The good thing is that I now know how the email processors work in NiFi (I read all the code).
Maybe I can still use the ConsumeIMAP processor to watch for new messages and then route to ExecuteScript, which would run Python code to extract the attachment and pass it to the next processor? The only issue I see is how to detect new messages; I am not sure whether the session is persistent or something like that. I could mark them as read after I download them. I don't have much experience with email, but I will try. If you have a suggestion for how to hack this, please feel free to share it.
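For the "detect new messages and mark them as read" part, a rough sketch with the standard `imaplib` module could look like the following. It is a standalone illustration, not wired into ExecuteScript; the function names `connect` and `fetch_unseen` are made up for this example, and the host and credentials passed to `connect` are placeholders to be supplied by the caller:

```python
import imaplib


def connect(host, user, password):
    """Open an IMAP-over-SSL connection and log in (placeholder credentials)."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    return conn


def fetch_unseen(conn, mailbox="INBOX"):
    """Return the raw bytes of every unseen message and flag each as seen."""
    conn.select(mailbox)
    status, data = conn.search(None, "UNSEEN")
    if status != "OK":
        raise RuntimeError("IMAP search failed: %r" % (status,))

    messages = []
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue  # skip messages the server refused to return
        # parts[0] is an (envelope, raw_message) tuple.
        messages.append(parts[0][1])
        # Servers typically set \Seen on an RFC822 fetch already;
        # setting it explicitly keeps the intent visible.
        conn.store(num, "+FLAGS", "\\Seen")
    return messages
```

Because the "new" state lives in the `\Seen` flag on the server, the script does not need a persistent session: each run asks only for messages that are still unseen.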
Created 09-29-2016 03:17 PM
I think what you suggested makes sense. I am not very familiar with these email processors, but if you are still using ConsumeIMAP, I think it would handle getting the new messages and marking them as read. All your script would be doing is receiving a flow file with the message in it and parsing it like ExtractEmailAttachments was doing, but working around the missing headers.
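As a rough sketch of what such a script might do, the snippet below takes body-only content that starts at a MIME boundary, prepends a synthetic top-level `Content-Type` header rebuilt from that boundary line, and then pulls out the attachments with the standard `email` package. The function names, the sample bytes, and the assumption that the content always begins with the boundary line are illustrative only; this is plain Python and does not use the ExecuteScript session API:

```python
from email import message_from_bytes

# Hypothetical, shortened body-only content that starts at the boundary.
SAMPLE_BODY = (
    b"--Apple-Mail=_TEST\n"
    b"Content-Type: text/plain; charset=us-ascii\n"
    b"\n"
    b"Test body\n"
    b"--Apple-Mail=_TEST\n"
    b"Content-Disposition: attachment; filename=test.csv\n"
    b'Content-Type: text/csv; name="test.csv"\n'
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"foo;bar=0D\n"
    b"1;2=0D\n"
    b"--Apple-Mail=_TEST--\n"
)


def wrap_headerless_body(body):
    """Prepend a synthetic multipart header derived from the first boundary line."""
    first_line = body.lstrip().split(b"\n", 1)[0].strip()
    if not first_line.startswith(b"--"):
        raise ValueError("content does not start with a MIME boundary")
    boundary = first_line[2:].decode("ascii")
    headers = (
        "MIME-Version: 1.0\r\n"
        'Content-Type: multipart/mixed; boundary="%s"\r\n'
        "\r\n" % boundary
    )
    return headers.encode("ascii") + body


def extract_attachments(raw):
    """Return (filename, decoded_bytes) for every part that carries a filename."""
    msg = message_from_bytes(raw)
    attachments = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename:
            # decode=True undoes quoted-printable or base64 transfer encoding.
            attachments.append((filename, part.get_payload(decode=True)))
    return attachments


if __name__ == "__main__":
    for name, data in extract_attachments(wrap_headerless_body(SAMPLE_BODY)):
        print(name, len(data), "bytes")
```

Without the wrapper, the parser has no boundary to split on and treats the whole content as a single text part, so no attachments are found.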
Created 09-30-2016 09:40 AM
Hi, you can certainly mark the message as read and adjust the Python script to only read when needed. On a separate note, have you tested the POP3 processor? Many email providers, such as Gmail and Exchange, offer both protocols to user agents. I'm curious to know whether the same issue happens with those as well. Cheers
Created 09-30-2016 02:33 PM
POP3 is also failing.
In the end I created a Python script using imaplib to get the messages after ConsumeImap fires.
ConsumeImap -> ExecuteScript -> ExtractEmailAttachments
But I don't like this solution, because I am downloading the messages twice.
Created 09-30-2016 01:13 PM
@blood9raven, thanks for your message. I have just added a patch attempting to solve the bug you hit; would you be able to test it and let me know if it works?
The patch can be found on the JIRA page you linked previously. Cheers