How do you parse emails ingested with ConsumePOP3

Super Guru

ExtractEmailHeaders and ExtractEmailAttachments don't seem to work, and neither of them parses the message body.

Any suggestions?

This is the content:

------=_Part_64_67671596.1476907348247
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
PGh0bWw+PGhlYWQ+PHRpdGxlPk5JRkkgTk9USUZJQ0FUSU9OPC90aXRsZT48L2hlYWQ+PGJvZHk+Cjxicj4KPGI+TWVzc2FnZSBGcm9tIE5JRkk8L2I+Cjxicj48YnI+Cgo8YnI+PGJyPgo8aW1nIHNyYz0iIj4KPGJyPjxicj4KCkZyb206ICBAIAo8YnI+CgoKTG9jYXRlZCBhdDoKIC8gIC8gIC8gCjxicj48YnI+CgoKSGFzaFRhZ3M6ICAKPGJyPjxicj4KCgpVc2luZyA8YnI+ClBvc3RlZCBhdCAgCjwvYm9keT4KPC9odG1sPgoKLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0KClN0YW5kYXJkIEZsb3dGaWxlIE1ldGFkYXRhOgoJaWQgPSAnOGMwNjU4NzQtOWEyNi00NmU5LWJjYTAtNTQyNmMzMmQwNmEzJwoJZW50cnlEYXRlID0gJ1dlZCBPY3QgMTkgMTk6NTM6NTUgVVRDIDIwMTYnCglmaWxlU2l6ZSA9ICc3NzIxJwpGbG93RmlsZSBBdHRyaWJ1dGVzOgoJcGF0aCA9ICcvdHdpdHRlcicKCWZpbGVuYW1lID0gJzk4NTI1MDg3ODUxODUyNS5qc29uJwoJaGRmcy5vd25lciA9ICdyb290JwoJaGRmcy5sZW5ndGggPSAnNzcyMScKCXV1aWQgPSAnOGMwNjU4NzQtOWEyNi00NmU5LWJjYTAtNTQyNmMzMmQwNmEzJwoJaGRmcy5sYXN0TW9kaWZpZWQgPSAnMTQ3NjE5MzM5OTMxOCcKCWhkZnMucmVwbGljYXRpb24gPSAnMycKCWhkZnMuZ3JvdXAgPSAnaGRmcycKCWhkZnMucGVybWlzc2lvbnMgPSAncnctci0tci0tJwo=
------=_Part_64_67671596.1476907348247
Content-Type: application/octet-stream; name=985250878518525.json
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename=985250878518525.json
{"created_at":"Sat Oct 08 02:41:10 +0000 2016","id":784584393222418433,"id_=
str":"784584393222418433","text":"RT @innova_scape: RT  The Future of Kaggl=
e & Data Science: Quora Session ... - https:\/\/t.co\/k31TwhY3Fi\u00a0\=
u2026 #machinelearning #IoT #AI\u2026\u2026 ","source":"\u003ca href=3D\"ht=
tps:\/\/roundteam.co\" rel=3D\"nofollow\"\u003eRoundTeam\u003c\/a\u003e","t=
runcated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":nu=
ll,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_s=
creen_name":null,"user":{"id":2416199407,"id_str":"2416199407","name":"Alex=
 Barreto","screen_name":"shakamunyi","location":"San Francisco, CA","url":"=
http:\/\/shakamunyi.tumblr.com","description":"OpenStacker, Dockerite, Clou=
d and Big Data hands-on SME. Raspberry Pi enthusiast. Making Open Hybrid Cl=
oud for Data Processing a reality since 2007.","protected":false,"verified"=
:false,"followers_count":3408,"friends_count":1581,"listed_count":3492,"fav=
ourites_count":1321,"statuses_count":117022,"created_at":"Fri Mar 28 17:01:=
44 +0000 2014","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang"=
:"en","contributors_enabled":false,"is_translator":false,"profile_backgroun=
d_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/i=
mages\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:=
\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile=
":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEE=
D","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","pro=
file_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com=
\/profile_images\/598640478657978368\/PgSHeCsJ_normal.jpg","profile_image_u=
rl_https":"https:\/\/pbs.twimg.com\/profile_images\/598640478657978368\/PgS=
HeCsJ_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_ba=
nners\/2416199407\/1423463300","default_profile":true,"default_profile_imag=
e":false,"following":null,"follow_request_sent":null,"notifications":null},=
"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_s=
tatus":{"created_at":"Sat Oct 08 02:10:07 +0000 2016","id":7845765782903234=
57,"id_str":"784576578290323457","text":"RT  The Future of Kaggle & Dat=
a Science: Quora Session ... - https:\/\/t.co\/k31TwhY3Fi\u00a0\u2026 #mach=
inelearning #IoT #AI\u2026\u2026 https:\/\/t.co\/zXB8KB5Gpa","display_text_=
range":[0,140],"source":"\u003ca href=3D\"http:\/\/dlvr.it\" rel=3D\"nofoll=
ow\"\u003edlvr.it\u003c\/a\u003e","truncated":true,"in_reply_to_status_id":=
null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_=
to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":3214647434=
,"id_str":"3214647434","name":"InnovaScape","screen_name":"innova_scape","l=
ocation":"Madrid, Spain","url":"https:\/\/www.facebook.com\/groups\/innovas=
cape\/","description":"following  innovation in the digital economy landsca=
pe. Chief Editor @dparente","protected":false,"verified":false,"followers_c=
ount":3166,"friends_count":56,"listed_count":4922,"favourites_count":446,"s=
tatuses_count":344440,"created_at":"Sun May 17 18:57:24 +0000 2015","utc_of=
fset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_e=
nabled":false,"is_translator":false,"profile_background_color":"C0DEED","pr=
ofile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1=
\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/im=
ages\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link=
_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_f=
ill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_i=
mage":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/737=
931514835505152\/4WnxZ5n-_normal.jpg","profile_image_url_https":"https:\/\/=
pbs.twimg.com\/profile_images\/737931514835505152\/4WnxZ5n-_normal.jpg","pr=
ofile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/3214647434\/14=
64771523","default_profile":true,"default_profile_image":false,"following":=
null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinat=
es":null,"place":null,"contributors":null,"is_quote_status":false,"extended=
_tweet":{"full_text":"RT  The Future of Kaggle & Data Science: Quora Se=
ssion ... - https:\/\/t.co\/k31TwhY3Fi\u00a0\u2026 #machinelearning #IoT #A=
I\u2026 https:\/\/t.co\/T80i55WdL4 https:\/\/t.co\/GG56OvZyEG","display_tex=
t_range":[0,141],"entities":{"hashtags":[{"text":"machinelearning","indices=
":[91,107]},{"text":"IoT","indices":[108,112]},{"text":"AI","indices":[113,=
116]}],"urls":[{"url":"https:\/\/t.co\/k31TwhY3Fi","expanded_url":"http:\/\=
/abunchofdata.com\/the-future-of-kaggle-data-science-quora-session-highligh=
ts-with-anthony-goldbloom-kaggle-ceo\/","display_url":"abunchofdata.com\/th=
e-future-of-\u2026","indices":[65,88]},{"url":"https:\/\/t.co\/T80i55WdL4",=
"expanded_url":"http:\/\/dlvr.it\/MQ5jtS","display_url":"dlvr.it\/MQ5jtS","=
indices":[118,141]}],"user_mentions":[],"symbols":[],"media":[{"id":7845765=
74871965697,"id_str":"784576574871965697","indices":[142,165],"media_url":"=
http:\/\/pbs.twimg.com\/media\/CuNgPOxWEAE0lZs.jpg","media_url_https":"http=
s:\/\/pbs.twimg.com\/media\/CuNgPOxWEAE0lZs.jpg","url":"https:\/\/t.co\/GG5=
6OvZyEG","display_url":"pic.twitter.com\/GG56OvZyEG","expanded_url":"https:=
\/\/twitter.com\/innova_scape\/status\/784576578290323457\/photo\/1","type"=
:"photo","sizes":{"medium":{"w":1000,"h":280,"resize":"fit"},"thumb":{"w":1=
50,"h":150,"resize":"crop"},"small":{"w":680,"h":190,"resize":"fit"},"large=
":{"w":1000,"h":280,"resize":"fit"}}}]},"extended_entities":{"media":[{"id"=
:784576574871965697,"id_str":"784576574871965697","indices":[142,165],"medi=
a_url":"http:\/\/pbs.twimg.com\/media\/CuNgPOxWEAE0lZs.jpg","media_url_http=
s":"https:\/\/pbs.twimg.com\/media\/CuNgPOxWEAE0lZs.jpg","url":"https:\/\/t=
.co\/GG56OvZyEG","display_url":"pic.twitter.com\/GG56OvZyEG","expanded_url"=
:"https:\/\/twitter.com\/innova_scape\/status\/784576578290323457\/photo\/1=
","type":"photo","sizes":{"medium":{"w":1000,"h":280,"resize":"fit"},"thumb=
":{"w":150,"h":150,"resize":"crop"},"small":{"w":680,"h":190,"resize":"fit"=
},"large":{"w":1000,"h":280,"resize":"fit"}}}]}},"retweet_count":1,"favorit=
e_count":0,"entities":{"hashtags":[{"text":"machinelearning","indices":[91,=
107]},{"text":"IoT","indices":[108,112]},{"text":"AI","indices":[113,116]}]=
,"urls":[{"url":"https:\/\/t.co\/k31TwhY3Fi","expanded_url":"http:\/\/abunc=
hofdata.com\/the-future-of-kaggle-data-science-quora-session-highlights-wit=
h-anthony-goldbloom-kaggle-ceo\/","display_url":"abunchofdata.com\/the-futu=
re-of-\u2026","indices":[65,88]},{"url":"https:\/\/t.co\/zXB8KB5Gpa","expan=
ded_url":"https:\/\/twitter.com\/i\/web\/status\/784576578290323457","displ=
ay_url":"twitter.com\/i\/web\/status\/7\u2026","indices":[119,142]}],"user_=
mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_se=
nsitive":false,"filter_level":"low","lang":"en"},"is_quote_status":false,"r=
etweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"machine=
learning","indices":[109,125]},{"text":"IoT","indices":[126,130]},{"text":"=
AI","indices":[131,134]}],"urls":[{"url":"https:\/\/t.co\/k31TwhY3Fi","expa=
nded_url":"http:\/\/abunchofdata.com\/the-future-of-kaggle-data-science-quo=
ra-session-highlights-with-anthony-goldbloom-kaggle-ceo\/","display_url":"a=
bunchofdata.com\/the-future-of-\u2026","indices":[83,106]},{"url":"","expan=
ded_url":null,"indices":[137,137]}],"user_mentions":[{"screen_name":"innova=
_scape","name":"InnovaScape","id":3214647434,"id_str":"3214647434","indices=
":[3,16]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sens=
itive":false,"filter_level":"low","lang":"en","timestamp_ms":"1475894470874=
"}
------=_Part_64_67671596.1476907348247--

Base64 decoding and splitting didn't work either.

7 REPLIES

Re: How do you parse emails ingested with ConsumePOP3

@Timothy Spann First, decode the Base64-encoded content using the Base64EncodeContent processor with its Mode property set to Decode.

Re: How do you parse emails ingested with ConsumePOP3

Super Guru

That did not work. Does anyone have an email flow working all the way from consume to final parse?

Re: How do you parse emails ingested with ConsumePOP3

Super Guru

This time, when run with the Base64 decoder:

06:00 UTC ERROR 1ee8d5b4-0159-1000-20e9-ae4b881a3057
Base64EncodeContent[id=1ee8d5b4-0159-1000-20e9-ae4b881a3057] Failed to decode StandardFlowFileRecord[uuid=264e03af-1550-45e8-9d19-b5afc2ed2463,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1482282331069-2, container=default, section=2], offset=563941, length=4389],offset=0,name=7373139426427741,size=4389] due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from Base64EncodeContent[id=1ee8d5b4-0159-1000-20e9-ae4b881a3057]: java.io.IOException: Data is not base64 encoded.: org.apache.nifi.processor.exception.ProcessException: IOException thrown from Base64EncodeContent[id=1ee8d5b4-0159-1000-20e9-ae4b881a3057]: java.io.IOException: Data is not base64 encoded.
01:06:00 UTC ERROR 1ee8d5b4-0159-1000-20e9-ae4b881a3057

Re: How do you parse emails ingested with ConsumePOP3

Super Guru

The message needs to be split into pieces before decoding.
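
To illustrate why splitting has to come first: each MIME part carries its own Content-Transfer-Encoding (base64 for the text part and quoted-printable for the JSON attachment in the content above), so the parts must be separated at the boundary and then decoded one at a time. Here is a minimal Python sketch of that idea using the standard library `email` package, outside NiFi. The sample message, its addresses, boundary and file name are made up for illustration, and the sketch assumes the top-level headers are present.

```python
import base64
import quopri
from email import message_from_bytes, policy


def build_sample() -> bytes:
    """Build a small two-part message (hypothetical data, not the real email)."""
    boundary = "----=_Part_0_example"
    body_b64 = base64.b64encode(
        b"<html><body>Message From NIFI</body></html>\n"
    ).decode("ascii")
    attachment_qp = quopri.encodestring(
        b'{"text":"a=b","lang":"en"}'
    ).decode("ascii")
    lines = [
        "From: nifi@example.com",
        "To: user@example.com",
        "Subject: NIFI NOTIFICATION",
        "MIME-Version: 1.0",
        f'Content-Type: multipart/mixed; boundary="{boundary}"',
        "",
        f"--{boundary}",
        'Content-Type: text/plain; charset="utf-8"',
        "Content-Transfer-Encoding: base64",
        "",  # blank line separates part headers from the part body
        body_b64,
        f"--{boundary}",
        "Content-Type: application/octet-stream; name=tweet.json",
        "Content-Transfer-Encoding: quoted-printable",
        "Content-Disposition: attachment; filename=tweet.json",
        "",
        attachment_qp,
        f"--{boundary}--",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_and_decode(raw: bytes) -> list:
    """Split a MIME message into leaf parts and decode each one separately."""
    msg = message_from_bytes(raw, policy=policy.default)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no payload of their own
        parts.append({
            "content_type": part.get_content_type(),
            "filename": part.get_filename(),
            "encoding": str(part.get("Content-Transfer-Encoding", "7bit")),
            # decode=True undoes base64 or quoted-printable for this part only
            "data": part.get_payload(decode=True),
        })
    return parts


if __name__ == "__main__":
    for p in split_and_decode(build_sample()):
        print(p["content_type"], p["encoding"], p["filename"], p["data"])
```

Running a Base64 decoder over the whole message fails for the same reason the error above reports: the boundary lines, part headers and the quoted-printable attachment are not Base64 data.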

Re: How do you parse emails ingested with ConsumePOP3

Super Guru

If I go from ConsumePOP3 to ExtractEmailAttachments:

01:45:18 UTC ERROR d7a61806-43b1-11a0-35a8-2f802acb3d87
ExtractEmailAttachments[id=d7a61806-43b1-11a0-35a8-2f802acb3d87] Could not parse the flowfile StandardFlowFileRecord[uuid=f226e4f4-b4b8-492c-a478-1e60359c56db,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1482284705893-3, container=default, section=3], offset=710117, length=8883],offset=0,name=7375497940614763,size=8883] as an email, treating as failure: com.sun.mail.util.DecodingException: BASE64Decoder: Error in encoded stream: needed at least 2 valid base64 characters, but only got 0 before padding character (=), the 10 most recent characters were: "K\r\n------="

01:45:18 UTC ERROR d7a61806-43b1-11a0-35a8-2f802acb3d87
ExtractEmailAttachments[id=d7a61806-43b1-11a0-35a8-2f802acb3d87] Could not parse the flowfile StandardFlowFileRecord[uuid=caad55c0-edfc-449e-81a8-fb689fb46c4e,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1482284705893-3, container=default, section=3], offset=719000, length=6692],offset=0,name=7375497941107842,size=6692] as an email, treating as failure: com.sun.mail.util.DecodingException: BASE64Decoder: Error in encoded stream: needed at least 2 valid base64 characters, but only got 0 before padding character (=), the 10 most recent characters were: "K\r\n------="

01:45:19 UTC ERROR d7a61806-43b1-11a0-35a8-2f802acb3d87
ExtractEmailAttachments[id=d7a61806-43b1-11a0-35a8-2f802acb3d87] Could not parse the flowfile StandardFlowFileRecord[uuid=a84605a6-21b0-4810-bbe6-299e231141b8,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1482284705893-3, container=default, section=3], offset=725692, length=9370],offset=0,name=7375497941561748,size=9370] as an email, treating as failure: com.sun.mail.util.DecodingException: BASE64Decoder: Error in encoded stream: needed at least 2 valid base64 characters, but only got 0 before padding character (=), the 10 most recent characters were: "=\r\n------="

01:45:19 UTC ERROR d7a61806-43b1-11a0-35a8-2f802acb3d87
ExtractEmailAttachments[id=d7a61806-43b1-11a0-35a8-2f802acb3d87] Could not parse the flowfile StandardFlowFileRecord[uuid=7208d574-9f06-47fb-92e6-86eab1548781,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1482284705893-3, container=default, section=3], offset=735062, length=6764],offset=0,name=7375497942074927,size=6764] as an email, treating as failure: com.sun.mail.util.DecodingException: BASE64Decoder: Error in encoded stream: needed at least 2 valid base64 characters, but only got 0 before padding character (=), the 10 most recent characters were: "K\r\n------="

01:45:19 UTC ERROR d7a61806-43b1-11a0-35a8-2f802acb3d87
ExtractEmailAttachments[id=d7a61806-43b1-11a0-35a8-2f802acb3d87] Could not parse the flowfile StandardFlowFileRecord[uuid=a48f072d-0be1-45a2-8001-07ffa617c849,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1482284705893-3, container=default, section=3], offset=741826, length=6477],offset=0,name=7375497942614747,size=6477] as an email, treating as failure: com.sun.mail.util.DecodingException: BASE64Decoder: Error in encoded stream: needed at least 2 valid base64 characters, but only got 0 before padding character (=), the 10 most recent characters were: "K\r\n------="

Re: How do you parse emails ingested with ConsumePOP3

Super Guru

ConsumePOP3 is not returning any of the headers or the To/From information.
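
If the headers really are missing, a MIME parser has no top-level Content-Type header to learn the boundary from, which may be why the parts never get separated. One possible workaround, sketched here in Python as an assumption and not as something NiFi does, is to read the boundary from the first line of the content and prepend a synthetic header before parsing. The function names `wrap_headerless_multipart` and `parse_headerless` are hypothetical.

```python
from email import message_from_bytes, policy


def wrap_headerless_multipart(raw: bytes) -> bytes:
    """Prepend a synthetic multipart header when content starts at a boundary.

    Assumption: the first non-blank line is a MIME delimiter of the form
    "--<boundary>". Anything else is returned unchanged.
    """
    body = raw.lstrip()
    first_line = body.split(b"\n", 1)[0].rstrip(b"\r")
    if not first_line.startswith(b"--"):
        return raw  # looks like it already has headers, or is not multipart
    boundary = first_line[2:].decode("ascii")
    header = (
        "MIME-Version: 1.0\r\n"
        f'Content-Type: multipart/mixed; boundary="{boundary}"\r\n'
        "\r\n"
    ).encode("ascii")
    return header + body


def parse_headerless(raw: bytes) -> list:
    """Return the decoded payload (bytes) of every leaf part."""
    msg = message_from_bytes(wrap_headerless_multipart(raw), policy=policy.default)
    return [
        part.get_payload(decode=True)
        for part in msg.walk()
        if not part.is_multipart()
    ]
```

This only recovers the body parts. The To/From/Subject values are simply not in the content, so they would still have to come from wherever the original headers were dropped.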
