Solr - expected mime type application/octet-stream but got text/html

Expert Contributor

Hello

I am using the /update/extract request handler to push documents into Solr. I am getting this error with certain types of documents, and Solr ends up ignoring them.

I have discovered that these files are emails (.msg) with ZIP attachments that contain unsupported documents (I'm assuming). Is there a way to have Solr ignore the ZIP file rather than ignoring the entire file?

Thanks


Re: Solr - expected mime type application/octet-stream but got text/html

Super Guru

This question has a "nifi-processor" tag, which NiFi processor are you using? Also which processor(s) are you using to get the email messages? I suspect you should be able to use RouteOnAttribute or RouteOnContent to send emails with ZIP attachments to some other relationship, and those without attachments can go directly to PutSolrContentStream (or whatever you're using to push data to Solr). Perhaps the branch with ZIP attachments can use processor(s) to remove the ZIP part of the attachment, retain the email message, and route back to the "main" branch to retry the "put".
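The routing decision described above could be sketched outside NiFi as a small Python check, for example as a preprocessing step. This is only an illustration under stated assumptions: the ZIP attachment is assumed to be already extracted from the email as raw bytes, the names `UNSUPPORTED_EXTENSIONS` and `route_for_zip` are hypothetical, and the returned labels merely stand in for NiFi relationships.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical block list; extend it with whatever extensions fail in your setup.
UNSUPPORTED_EXTENSIONS = frozenset({".mdb"})


def route_for_zip(zip_bytes: bytes, blocked=UNSUPPORTED_EXTENSIONS) -> str:
    """Return "needs_cleanup" if any archive member has a blocked
    extension, otherwise "direct"."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Compare case-insensitively so "LEGACY.MDB" is caught too.
            if PurePosixPath(name).suffix.lower() in blocked:
                return "needs_cleanup"
    return "direct"
```

Archives that come back as "direct" would go straight to the Solr put; the others would take the branch that cleans up the attachment first.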

Re: Solr - expected mime type application/octet-stream but got text/html

Expert Contributor

I'm using the PutSolrContentStream processor. Solr only fails on certain extension types (mdb, for example). When an email or a ZIP file contains an mdb file, the entire document fails to get pushed to Solr. Is there a way to have Solr index the email or ZIP file and ignore only the unsupported extensions, rather than ignoring the entire document?

Re: Solr - expected mime type application/octet-stream but got text/html

Explorer

I believe this is a known issue with .zip archives and the Solr ExtractingRequestHandler (aka Solr Cell): https://issues.apache.org/jira/browse/SOLR-2416. The short version of the story is that Tika in this case is not configured to parse the .zip recursively.

One of the other suggestions for NiFi processing may be worth exploring in this case.

Re: Solr - expected mime type application/octet-stream but got text/html

Expert Contributor

I have tried sending documents using Solr's REST API and I got the exact same error. The problem isn't with ZIP files as such. If a ZIP file contains PDF or Word documents, for example, the ZIP is indexed fine. However, if the ZIP file contains an mdb file, Solr fails to index it. Is it possible to have Solr ignore only the unsupported extensions rather than ignoring the entire document or file?
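One possible workaround, if Solr itself cannot be told to skip those members, is to rewrite the archive before sending it. The sketch below is not a Solr or NiFi feature: `strip_blocked_members` is a hypothetical helper, the `.mdb` default is just the extension mentioned in this thread, and it assumes the ZIP is available as raw bytes (pulling it out of a .msg email is not covered).

```python
import io
import zipfile
from pathlib import PurePosixPath


def strip_blocked_members(zip_bytes: bytes, blocked=frozenset({".mdb"})) -> bytes:
    """Return a new ZIP archive (as bytes) that omits every member whose
    extension is in `blocked`. Hypothetical helper, for illustration only."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as source, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as target:
        for info in source.infolist():
            # Skip members with a blocked extension (case-insensitive).
            if PurePosixPath(info.filename).suffix.lower() in blocked:
                continue
            # Copy the remaining members unchanged, by name and content.
            target.writestr(info.filename, source.read(info))
    return out.getvalue()
```

The cleaned bytes could then be posted to /update/extract in place of the original archive, so the supported members still get indexed.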
