<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Read data from Kudu via Spark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-data-from-Kudu-via-Spark/m-p/64104#M73999</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I did not find any documentation describing how Spark tasks are assigned to executors when data is read from Kudu into DataFrames. I noticed that in some cases (I did not have enough time to test thoroughly) Spark reads data ONLY from the leaders of the tablets, which moves data across the network.&lt;/P&gt;&lt;P&gt;Is there any setting or configuration to co-locate a Spark task in an executor with a Kudu tablet?&lt;/P&gt;&lt;P&gt;Based on the Kudu documentation, the LEADER handles writes, but the FOLLOWERs can serve reads too.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 12:47:59 GMT</pubDate>
    <dc:creator>Tomas79</dc:creator>
    <dc:date>2022-09-16T12:47:59Z</dc:date>
    <item>
      <title>Read data from Kudu via Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-data-from-Kudu-via-Spark/m-p/64104#M73999</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I did not find any documentation describing how Spark tasks are assigned to executors when data is read from Kudu into DataFrames. I noticed that in some cases (I did not have enough time to test thoroughly) Spark reads data ONLY from the leaders of the tablets, which moves data across the network.&lt;/P&gt;&lt;P&gt;Is there any setting or configuration to co-locate a Spark task in an executor with a Kudu tablet?&lt;/P&gt;&lt;P&gt;Based on the Kudu documentation, the LEADER handles writes, but the FOLLOWERs can serve reads too.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:47:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-data-from-Kudu-via-Spark/m-p/64104#M73999</guid>
      <dc:creator>Tomas79</dc:creator>
      <dc:date>2022-09-16T12:47:59Z</dc:date>
    </item>
    <item>
      <title>Re: Read data from Kudu via Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-data-from-Kudu-via-Spark/m-p/64114#M74000</link>
      <description>&lt;P&gt;In Kudu 1.5 and earlier, the Spark bindings only scan the leader replica. Starting in Kudu 1.6, the Spark bindings have an option to scan from the closest replica instead. This was filed as &lt;A href="https://issues.apache.org/jira/browse/KUDU-1454" target="_self"&gt;KUDU-1454&lt;/A&gt;, and the change was merged as &lt;A href="https://github.com/apache/kudu/commit/9f26c9d15" target="_self"&gt;commit 9f26c9d15&lt;/A&gt; and &lt;A href="https://github.com/apache/kudu/commit/3abca98c5" target="_self"&gt;commit 3abca98c5&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Jan 2018 19:15:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-data-from-Kudu-via-Spark/m-p/64114#M74000</guid>
      <dc:creator>awong</dc:creator>
      <dc:date>2018-01-29T19:15:53Z</dc:date>
    </item>
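For anyone landing on this thread, here is a rough PySpark sketch of what selecting the scan locality might look like with Kudu 1.6 or later. It is not taken from the posts above. The option key `kudu.scanLocality` and its values `closest_replica` and `leader_only` are assumptions based on the kudu-spark integration and should be checked against the documentation for your Kudu version. The helper names `kudu_read_options` and `read_kudu_table`, and the master address and table name in the usage note, are hypothetical.

```python
from typing import Dict

# Scan locality values assumed to be accepted by the kudu-spark data source.
VALID_SCAN_LOCALITY = ("closest_replica", "leader_only")


def kudu_read_options(master: str, table: str,
                      scan_locality: str = "closest_replica") -> Dict[str, str]:
    """Build the reader options for a Kudu table scan (hypothetical helper)."""
    if scan_locality not in VALID_SCAN_LOCALITY:
        raise ValueError(
            "scan_locality must be one of %s, got %r"
            % (VALID_SCAN_LOCALITY, scan_locality)
        )
    return {
        "kudu.master": master,
        "kudu.table": table,
        # Assumed option key; verify it for the Kudu release in use.
        "kudu.scanLocality": scan_locality,
    }


def read_kudu_table(spark, master: str, table: str,
                    scan_locality: str = "closest_replica"):
    """Load a Kudu table as a DataFrame using an existing SparkSession."""
    options = kudu_read_options(master, table, scan_locality)
    return spark.read.format("org.apache.kudu.spark.kudu").options(**options).load()
```

With a running SparkSession this could be called as `read_kudu_table(spark, "kudu-master:7051", "my_table")`. Note that scanning the closest replica can only avoid network transfer when Spark executors run on the same hosts as the Kudu tablet servers. Otherwise the read is still remote.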
  </channel>
</rss>

