<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Issue: Spark-Solr Connection Stalling with No Errors on Execution - CDP 7.1.9 in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Issue-Spark-Solr-Connection-Stalling-with-No-Errors-on/m-p/396368#M249401</link>
    <description>&lt;P&gt;We're attempting to execute a basic Spark job that reads data from and writes data to Solr, using the following environment:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;CDP version:&lt;/STRONG&gt; 7.1.9&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Spark:&lt;/STRONG&gt; Spark3&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Solr:&lt;/STRONG&gt; 8.11&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Spark-Solr Connector:&lt;/STRONG&gt; opt/cloudera/parcels/SPARK3/lib/spark3/spark-solr/spark-solr-3.9.3000.3.3.7191000.0-78-shaded.jar&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;When we try to interact with Solr through Spark, the job hangs indefinitely, without any errors or results. Other components, such as Hive and HBase, integrate smoothly with Spark, and we’re using a valid Kerberos ticket that successfully authenticates with other Hadoop components. Additionally, we’ve tested REST API calls to Solr via both curl and Python’s requests library, and we’re able to retrieve data with the Kerberos ticket.&lt;/P&gt;&lt;P&gt;The problem appears isolated to Spark’s connection with Solr, as all other systems interact as expected. Has anyone experienced a similar issue, or does anyone have ideas on what might be causing this?&lt;/P&gt;&lt;P&gt;Here’s the Spark code we’re trying:&lt;/P&gt;&lt;P&gt;solr_options = {&lt;BR /&gt;"zkhost": "zkURL-01.orgis.ie:2181,zkURL-02.orgis.ie:2181,zkURL.orgis.ie:2181/solr",&lt;BR /&gt;"collection": "collection_phoectic_test2"&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;# Read data from Solr&lt;BR /&gt;df = spark.read.format("solr").options(**solr_options).load()&lt;BR /&gt;df.show()&lt;/P&gt;&lt;P&gt;Interestingly, if I specify a non-existent Solr collection, I get an error stating that the collection doesn’t exist. This leads me to believe that ZooKeeper is handling the initial connection, as it holds the metadata for the Solr collections. However, it seems the Spark executors can reach ZooKeeper but fail to establish a connection to the Solr nodes.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Additional Details:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The Spark UI logs (stderr) do not provide much insight, and I’m looking for any common troubleshooting steps or configurations that might resolve this.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;If anyone has suggestions or has resolved a similar issue, please let me know. Thank you!&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="sde_20241_0-1730040904403.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42361i063A512A7080540C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="sde_20241_0-1730040904403.png" alt="sde_20241_0-1730040904403.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 27 Oct 2024 14:56:13 GMT</pubDate>
    <dc:creator>sde_20241</dc:creator>
    <dc:date>2024-10-27T14:56:13Z</dc:date>
    <item>
      <title>Issue: Spark-Solr Connection Stalling with No Errors on Execution - CDP 7.1.9</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Issue-Spark-Solr-Connection-Stalling-with-No-Errors-on/m-p/396368#M249401</link>
      <description>&lt;P&gt;We're attempting to execute a basic Spark job that reads data from and writes data to Solr, using the following environment:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;CDP version:&lt;/STRONG&gt; 7.1.9&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Spark:&lt;/STRONG&gt; Spark3&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Solr:&lt;/STRONG&gt; 8.11&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Spark-Solr Connector:&lt;/STRONG&gt; opt/cloudera/parcels/SPARK3/lib/spark3/spark-solr/spark-solr-3.9.3000.3.3.7191000.0-78-shaded.jar&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;When we try to interact with Solr through Spark, the job hangs indefinitely, without any errors or results. Other components, such as Hive and HBase, integrate smoothly with Spark, and we’re using a valid Kerberos ticket that successfully authenticates with other Hadoop components. Additionally, we’ve tested REST API calls to Solr via both curl and Python’s requests library, and we’re able to retrieve data with the Kerberos ticket.&lt;/P&gt;&lt;P&gt;The problem appears isolated to Spark’s connection with Solr, as all other systems interact as expected. Has anyone experienced a similar issue, or does anyone have ideas on what might be causing this?&lt;/P&gt;&lt;P&gt;Here’s the Spark code we’re trying:&lt;/P&gt;&lt;P&gt;solr_options = {&lt;BR /&gt;"zkhost": "zkURL-01.orgis.ie:2181,zkURL-02.orgis.ie:2181,zkURL.orgis.ie:2181/solr",&lt;BR /&gt;"collection": "collection_phoectic_test2"&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;# Read data from Solr&lt;BR /&gt;df = spark.read.format("solr").options(**solr_options).load()&lt;BR /&gt;df.show()&lt;/P&gt;&lt;P&gt;Interestingly, if I specify a non-existent Solr collection, I get an error stating that the collection doesn’t exist. This leads me to believe that ZooKeeper is handling the initial connection, as it holds the metadata for the Solr collections. However, it seems the Spark executors can reach ZooKeeper but fail to establish a connection to the Solr nodes.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Additional Details:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The Spark UI logs (stderr) do not provide much insight, and I’m looking for any common troubleshooting steps or configurations that might resolve this.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;If anyone has suggestions or has resolved a similar issue, please let me know. Thank you!&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="sde_20241_0-1730040904403.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42361i063A512A7080540C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="sde_20241_0-1730040904403.png" alt="sde_20241_0-1730040904403.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 27 Oct 2024 14:56:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Issue-Spark-Solr-Connection-Stalling-with-No-Errors-on/m-p/396368#M249401</guid>
      <dc:creator>sde_20241</dc:creator>
      <dc:date>2024-10-27T14:56:13Z</dc:date>
    </item>
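One way to test the hypothesis in the question (that the executors can reach ZooKeeper but not the Solr nodes) is a plain TCP probe run inside Spark tasks. The sketch below is only an illustration, not something from the thread: `can_connect` and `probe_from_executors` are hypothetical helper names, and the host name and port 8983 in the usage comment are placeholders to be replaced with the real Solr node addresses. A successful TCP connection does not prove that Kerberos authentication works; it only helps rule out basic network or firewall problems.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def probe_from_executors(spark, targets, num_tasks=8):
    """Run can_connect for every (host, port) target inside Spark tasks.

    Returns a list of (executor_hostname, host, port, reachable) tuples.
    Spark decides where tasks run, so this is not guaranteed to cover
    every executor; raising num_tasks makes broader coverage more likely.
    """
    def probe(_):
        me = socket.gethostname()
        return [(me, h, p, can_connect(h, p)) for h, p in targets]

    rdd = spark.sparkContext.parallelize(range(num_tasks), num_tasks)
    return rdd.flatMap(probe).distinct().collect()


# Hypothetical usage in an existing PySpark session
# (the host name and port below are placeholders, not values from the thread):
# targets = [("solr-node-01.example.com", 8983)]
# for row in probe_from_executors(spark, targets):
#     print(row)
```

If every target is reachable from the executors, the hang is more likely to sit in authentication or in the connector itself than in the network path, which is where the Solr server logs become the next place to look.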
    <item>
      <title>Re: Issue: Spark-Solr Connection Stalling with No Errors on Execution - CDP 7.1.9</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Issue-Spark-Solr-Connection-Stalling-with-No-Errors-on/m-p/397509#M249875</link>
      <description>&lt;P&gt;What do the Solr logs show at the time the client connects to the Solr server? If the client hangs, at what point does it hang? Did it get through authorization, and have you confirmed that from the application logs?&lt;/P&gt;</description>
      <pubDate>Thu, 14 Nov 2024 11:54:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Issue-Spark-Solr-Connection-Stalling-with-No-Errors-on/m-p/397509#M249875</guid>
      <dc:creator>Asfahan</dc:creator>
      <dc:date>2024-11-14T11:54:14Z</dc:date>
    </item>
    <item>
      <title>Re: Issue: Spark-Solr Connection Stalling with No Errors on Execution - CDP 7.1.9</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Issue-Spark-Solr-Connection-Stalling-with-No-Errors-on/m-p/397813#M249986</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/119649"&gt;@sde_20241&lt;/a&gt;, did the response assist in resolving your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future. However, if you still have concerns, please provide the information that &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/79963"&gt;@Asfahan&lt;/a&gt; has requested.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Nov 2024 07:31:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Issue-Spark-Solr-Connection-Stalling-with-No-Errors-on/m-p/397813#M249986</guid>
      <dc:creator>VidyaSargur</dc:creator>
      <dc:date>2024-11-21T07:31:12Z</dc:date>
    </item>
  </channel>
</rss>

