<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Error sending *.docx file from HDFS to SOLR in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-sending-docx-file-from-HDFS-to-SOLR/m-p/3857#M655</link>
    <description>&lt;P&gt;Hello, I have written java Map-only program for sending files from folder on HDFS to Solr. It is working fine for all files, except *.docx. (program v1).&lt;/P&gt;&lt;P&gt;I also modified this program to run as simple java program without hadoop. It takes files from ext3 filesystem and sends to Solr. (program v2).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Using program v2 I was able to send my *.docx file to solr, but if I put it in HDFS and start MapReduce program (v1) i can't index it. I get such error::&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@42e20459 at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:225) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1909) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:739) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:169) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.solr.servlet.ProxyUserFilter.doFilter(ProxyUserFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.solr.servlet.SolrHadoopAuthenticationFilter$2.doFilter(SolrHadoopAuthenticationFilter.java:140) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:384) at org.apache.solr.servlet.SolrHadoopAuthenticationFilter.doFilter(SolrHadoopAuthenticationFilter.java:145) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.solr.servlet.HostnameFilter.doFilter(HostnameFilter.java:86) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@42e20459 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) ... 30 more Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException: Can't open the specified file: '/var/lib/solr/apache-tika-242864397475446795.tmp' at org.apache.poi.openxml4j.opc.ZipPackage.&amp;lt;init&amp;gt;(ZipPackage.java:103) at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:207) at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:70) at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) ... 33 more&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also in folder /var/lib/solr/ &amp;nbsp;there are no file apache-tika-242864397475446795.tmp&lt;/P&gt;&lt;P&gt;Any suggestions?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 08:51:09 GMT</pubDate>
    <dc:creator>Markovich</dc:creator>
    <dc:date>2022-09-16T08:51:09Z</dc:date>
    <item>
      <title>Error sending *.docx file from HDFS to SOLR</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-sending-docx-file-from-HDFS-to-SOLR/m-p/3857#M655</link>
      <description>&lt;P&gt;Hello, I have written java Map-only program for sending files from folder on HDFS to Solr. It is working fine for all files, except *.docx. (program v1).&lt;/P&gt;&lt;P&gt;I also modified this program to run as simple java program without hadoop. It takes files from ext3 filesystem and sends to Solr. (program v2).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Using program v2 I was able to send my *.docx file to solr, but if I put it in HDFS and start MapReduce program (v1) i can't index it. I get such error::&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@42e20459 at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:225) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1909) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:739) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:169) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.solr.servlet.ProxyUserFilter.doFilter(ProxyUserFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.solr.servlet.SolrHadoopAuthenticationFilter$2.doFilter(SolrHadoopAuthenticationFilter.java:140) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:384) at org.apache.solr.servlet.SolrHadoopAuthenticationFilter.doFilter(SolrHadoopAuthenticationFilter.java:145) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.solr.servlet.HostnameFilter.doFilter(HostnameFilter.java:86) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@42e20459 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) ... 30 more Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException: Can't open the specified file: '/var/lib/solr/apache-tika-242864397475446795.tmp' at org.apache.poi.openxml4j.opc.ZipPackage.&amp;lt;init&amp;gt;(ZipPackage.java:103) at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:207) at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:70) at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) ... 33 more&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also in folder /var/lib/solr/ &amp;nbsp;there are no file apache-tika-242864397475446795.tmp&lt;/P&gt;&lt;P&gt;Any suggestions?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 08:51:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-sending-docx-file-from-HDFS-to-SOLR/m-p/3857#M655</guid>
      <dc:creator>Markovich</dc:creator>
      <dc:date>2022-09-16T08:51:09Z</dc:date>
    </item>
    <item>
      <title>Re: Error sending *.docx file from HDFS to SOLR</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-sending-docx-file-from-HDFS-to-SOLR/m-p/3863#M656</link>
      <description>&lt;P&gt;Resolved.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was using WholeFileInputFormat and used&amp;nbsp;value.getBytes() instead of&amp;nbsp;value.copyBytes().&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2013 15:51:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-sending-docx-file-from-HDFS-to-SOLR/m-p/3863#M656</guid>
      <dc:creator>Markovich</dc:creator>
      <dc:date>2013-12-10T15:51:43Z</dc:date>
    </item>
  </channel>
</rss>

