Issue indexing html files using nifi and PutSolrContentStream

I'm having trouble streaming HTML files into Solr. I have a GetFile processor that reads HTML files from local disk and is connected to PutSolrContentStream, but PutSolrContentStream fails with a JSON parse error. I have tried changing the Content-Type value to "text/html" and to "text", and I still get the same error.

How can I resolve this issue?

Thanks!

Accepted solution:

Re: Issue indexing html files using nifi and PutSolrContentStream

To answer this, first take NiFi out of the picture and think about how you would index HTML with Solr directly.

HTML is not one of Solr's standard update formats (such as JSON, XML, and CSV), but Solr has an "extracting request handler" that can handle HTML; see this page:

https://wiki.apache.org/solr/ExtractingRequestHandler
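With NiFi out of the picture, the request the extracting handler expects can be sketched in Python using only the standard library. This is a minimal sketch, assuming a Solr instance at http://localhost:8983/solr and a core named mycore (both hypothetical placeholders). It posts the raw HTML with a text/html Content-Type and passes literal.id so the extracted document gets an id; commit=true is only there so the result is visible immediately while testing.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholders: adjust to your own Solr install and core name.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "mycore"


def build_extract_request(html: bytes, doc_id: str,
                          base: str = SOLR_BASE,
                          core: str = CORE) -> urllib.request.Request:
    """Build (but do not send) a POST of raw HTML to Solr's /update/extract handler."""
    params = urllib.parse.urlencode({
        "literal.id": doc_id,  # supplies the id field for the extracted document
        "commit": "true",      # make the document searchable right away (testing only)
    })
    url = f"{base}/{core}/update/extract?{params}"
    return urllib.request.Request(
        url,
        data=html,                               # request body is the HTML itself
        headers={"Content-Type": "text/html"},   # tells Solr what the body contains
        method="POST",
    )


def index_html_file(path: str, doc_id: str) -> int:
    """Send one local HTML file to Solr and return the HTTP status code."""
    with open(path, "rb") as f:
        req = build_extract_request(f.read(), doc_id)
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Only builds and prints the request; nothing is sent to Solr here.
    req = build_extract_request(b"<html><body>hello</body></html>", "doc1")
    print(req.get_method(), req.full_url)
```

Calling index_html_file("page.html", "doc1") would actually send the file; build_extract_request on its own only constructs the request, which makes it easy to inspect before pointing it at a real server.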

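To use that from NiFi, set the "Content Stream Path" to "/update/extract" (the processor's default path, "/update/json/docs", goes to Solr's JSON handler, which is why changing only the Content Type still produced a JSON parse error), set the "Content Type" to "text/html", and add a user-defined property named "literal.id" whose value is a unique id for each document. The FlowFile's UUID works well for this: set the property to ${uuid}.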
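To see how those settings line up with the direct Solr request, here is a small model of the mapping. It is a hypothetical illustration, not NiFi's implementation: the dictionaries and the resolve and describe_request helpers are invented for this sketch, the property names follow the answer above, and resolve only handles bare ${name} attribute references (a tiny subset of NiFi Expression Language). It shows the path replacing the JSON default and the user-defined literal.id property becoming a request parameter, with ${uuid} giving each FlowFile its own id.

```python
import re
import urllib.parse
import uuid

# Hypothetical stand-ins for the PutSolrContentStream settings described above.
PROCESSOR_PROPERTIES = {
    "Content Stream Path": "/update/extract",
    "Content Type": "text/html",
}
# User-defined (dynamic) properties are passed to Solr as request parameters.
DYNAMIC_PROPERTIES = {
    "literal.id": "${uuid}",
}

# Matches simple references such as ${uuid} or ${filename}.
_ATTR_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def resolve(value: str, attributes: dict) -> str:
    """Replace bare ${name} references with FlowFile attribute values (missing -> "")."""
    return _ATTR_REF.sub(lambda m: attributes.get(m.group(1), ""), value)


def describe_request(solr_core_url: str, attributes: dict) -> tuple:
    """Return (url, content_type) for one FlowFile under the settings above."""
    params = {name: resolve(value, attributes)
              for name, value in DYNAMIC_PROPERTIES.items()}
    url = (solr_core_url.rstrip("/")
           + PROCESSOR_PROPERTIES["Content Stream Path"]
           + "?" + urllib.parse.urlencode(params))
    return url, PROCESSOR_PROPERTIES["Content Type"]


if __name__ == "__main__":
    # Every FlowFile carries a unique "uuid" attribute; simulate one here.
    flowfile_attributes = {"uuid": str(uuid.uuid4()), "filename": "page.html"}
    print(describe_request("http://localhost:8983/solr/mycore", flowfile_attributes))
```

Because the uuid attribute differs for every FlowFile, each HTML file picked up by GetFile ends up as a separate Solr document rather than overwriting the previous one.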

[Screenshot: PutSolrContentStream configured for /update/extract (2911-nifi-solr-extract.png)]

Re: Issue indexing html files using nifi and PutSolrContentStream

Thank you! @bbende