Solr - expected mime type application/octet-stream but got text/html

Expert Contributor

Hello

I am using the /update/extract request handler to push documents into Solr. I am getting this error with certain types of documents, and those documents end up being ignored by Solr.

I have discovered that these files are emails (.msg) with zip attachments that, I'm assuming, contain unsupported documents. Is there a way to have Solr ignore the zip file rather than ignoring the entire file?

Thanks

4 REPLIES

Master Guru

This question has a "nifi-processor" tag. Which NiFi processor are you using? Also, which processor(s) are you using to get the email messages? I suspect you should be able to use RouteOnAttribute or RouteOnContent to send emails with ZIP attachments to some other relationship, while those without attachments go directly to PutSolrContentStream (or whatever you're using to push data to Solr). Perhaps the branch with ZIP attachments can use processor(s) to remove the ZIP part of the attachment, retain the email message, and route back to the "main" branch to retry the "put".

Expert Contributor

I'm using the PutSolrContentStream processor. Solr only fails on certain extension types (mdb, for example). When an email or a zip file contains an mdb file, the entire document fails to get pushed to Solr. Is there a way to have Solr index the email or zip file and ignore only the unsupported extensions rather than ignoring the entire document?

Contributor

I believe this is a known issue with .zip archives and the Solr ExtractingRequestHandler (aka Solr Cell): https://issues.apache.org/jira/browse/SOLR-2416. The short version of the story is that Tika in this case is not configured to parse the .zip recursively.

One of the other suggestions for NiFi processing may be worth exploring in this case.
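
To illustrate the kind of pre-processing suggested earlier in the thread, here is a minimal sketch in Python that drops entries with unsupported extensions from a ZIP archive before it is sent on to Solr. It is not a NiFi processor and not anything Solr provides. The `strip_unsupported` function and the `UNSUPPORTED_EXTENSIONS` set are hypothetical names, and treating `.mdb` as the only problem extension is an assumption based on the example above. It also only handles a plain ZIP file; a ZIP attached to a .msg email would have to be extracted from the message first, and nested archives are not inspected.

```python
import io
import zipfile

# Assumption: extensions that make extraction fail; adjust for your data.
UNSUPPORTED_EXTENSIONS = {".mdb"}


def strip_unsupported(zip_bytes, unsupported=UNSUPPORTED_EXTENSIONS):
    """Copy an in-memory ZIP, skipping entries with unsupported extensions.

    Returns a tuple (cleaned_zip_bytes, removed_entry_names).
    """
    removed = []
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Compare case-insensitively so "DATA.MDB" is caught as well.
            name = info.filename.lower()
            if any(name.endswith(ext) for ext in unsupported):
                removed.append(info.filename)
                continue
            # Copy the entry's bytes and metadata into the new archive.
            dst.writestr(info, src.read(info))
    return out.getvalue(), removed
```

The cleaned bytes could then be handed to whatever step performs the "put", and the list of removed names could be logged or stored alongside the document so the omission is visible.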

Expert Contributor

I have tried sending documents using Solr's REST API and I got the exact same error. The problem isn't with zip files as such. If a zip file contains PDF or Word documents, for example, the zip is indexed without problems. However, if the zip file contains an mdb file, Solr fails to index it. Is it possible to have Solr ignore only the unsupported extensions rather than ignoring the entire document or file?