<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How do you connect to Kudu via PySpark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-connect-to-Kudu-via-PySpark/m-p/66854#M77765</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/26757"&gt;@rams&lt;/a&gt;&amp;nbsp;the error is expected, as the PySpark syntax differs from that of Scala.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For reference, here are the steps needed to query a Kudu table in pyspark2.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Create a Kudu table using impala-shell&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;# impala-shell&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;CREATE TABLE test_kudu (id BIGINT PRIMARY KEY, s STRING)&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;insert into test_kudu values (100, 'abc');&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;insert into test_kudu values (101, 'def');&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;insert into test_kudu values (102, 'ghi');&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Launch pyspark2 with the Kudu artifacts and query the table&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;# pyspark2 --packages org.apache.kudu:kudu-spark2_2.11:1.4.0&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;[Spark ASCII banner] version 2.1.0.cloudera3-SNAPSHOT&lt;/P&gt;&lt;P&gt;Using Python version 2.7.5 (default, Nov 6 2016 00:28:07)&lt;BR /&gt;SparkSession available as 'spark'.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&amp;gt;&amp;gt;&amp;gt; kuduDF = spark.read.format('org.apache.kudu.spark.kudu').option('kudu.master',"nightly512-1.xxx.xxx.com:7051").option('kudu.table',"impala::default.test_kudu").load()&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt;&amp;gt; kuduDF.show(3)&lt;/P&gt;&lt;P&gt;+---+---+&lt;BR /&gt;| id| s|&lt;BR /&gt;+---+---+&lt;BR /&gt;|100|abc|&lt;BR /&gt;|101|def|&lt;BR /&gt;|102|ghi|&lt;BR /&gt;+---+---+&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;For the record, the same thing can be achieved with the following commands in spark2-shell&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;# spark2-shell --packages org.apache.kudu:kudu-spark2_2.11:1.4.0&lt;/P&gt;&lt;P&gt;Spark context available as 'sc' (master = yarn, app id = application_1525159578660_0011).&lt;BR /&gt;Spark session available as 'spark'.&lt;BR /&gt;Welcome to&lt;BR /&gt;[Spark ASCII banner] version 2.1.0.cloudera3-SNAPSHOT&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;scala&amp;gt; import org.apache.kudu.spark.kudu._&lt;BR /&gt;import org.apache.kudu.spark.kudu._&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;scala&amp;gt; val df = spark.sqlContext.read.options(Map("kudu.master" -&amp;gt; "nightly512-1.xx.xxx.com:7051","kudu.table" -&amp;gt; "impala::default.test_kudu")).kudu&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;scala&amp;gt; df.show(3)&lt;/P&gt;&lt;P&gt;+---+---+&lt;BR /&gt;| id| s|&lt;BR /&gt;+---+---+&lt;BR /&gt;|100|abc|&lt;BR /&gt;|101|def|&lt;BR /&gt;|102|ghi|&lt;BR /&gt;+---+---+&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 01 May 2018 12:19:15 GMT</pubDate>
    <dc:creator>AutoIN</dc:creator>
    <dc:date>2018-05-01T12:19:15Z</dc:date>
    <item>
      <title>How do you connect to Kudu via PySpark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-connect-to-Kudu-via-PySpark/m-p/66765#M77764</link>
      <description>&lt;P&gt;I'm trying to create a DataFrame like so:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;kuduOptions = {"kudu.master":"my.master.server", "kudu.table":"myTable"}&lt;/P&gt;&lt;P&gt;df = sqlContext.read.options(kuduOptions).kudu&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The above code is a "port" of Scala code; the Scala sample had kuduOptions defined as a Map.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I get an error stating "options expecting 1 parameter but was given 2".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How do you connect to Kudu via a PySpark SQL context?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 13:09:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-connect-to-Kudu-via-PySpark/m-p/66765#M77764</guid>
      <dc:creator>rams</dc:creator>
      <dc:date>2022-09-16T13:09:03Z</dc:date>
    </item>
    <item>
      <title>Re: How do you connect to Kudu via PySpark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-connect-to-Kudu-via-PySpark/m-p/66854#M77765</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/26757"&gt;@rams&lt;/a&gt;&amp;nbsp;the error is expected, as the PySpark syntax differs from that of Scala.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For reference, here are the steps needed to query a Kudu table in pyspark2.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Create a Kudu table using impala-shell&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;# impala-shell&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;CREATE TABLE test_kudu (id BIGINT PRIMARY KEY, s STRING)&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;insert into test_kudu values (100, 'abc');&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;insert into test_kudu values (101, 'def');&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;insert into test_kudu values (102, 'ghi');&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Launch pyspark2 with the Kudu artifacts and query the table&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;# pyspark2 --packages org.apache.kudu:kudu-spark2_2.11:1.4.0&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;[Spark ASCII banner] version 2.1.0.cloudera3-SNAPSHOT&lt;/P&gt;&lt;P&gt;Using Python version 2.7.5 (default, Nov 6 2016 00:28:07)&lt;BR /&gt;SparkSession available as 'spark'.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&amp;gt;&amp;gt;&amp;gt; kuduDF = spark.read.format('org.apache.kudu.spark.kudu').option('kudu.master',"nightly512-1.xxx.xxx.com:7051").option('kudu.table',"impala::default.test_kudu").load()&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt;&amp;gt; kuduDF.show(3)&lt;/P&gt;&lt;P&gt;+---+---+&lt;BR /&gt;| id| s|&lt;BR /&gt;+---+---+&lt;BR /&gt;|100|abc|&lt;BR /&gt;|101|def|&lt;BR /&gt;|102|ghi|&lt;BR /&gt;+---+---+&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;For the record, the same thing can be achieved with the following commands in spark2-shell&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;# spark2-shell --packages org.apache.kudu:kudu-spark2_2.11:1.4.0&lt;/P&gt;&lt;P&gt;Spark context available as 'sc' (master = yarn, app id = application_1525159578660_0011).&lt;BR /&gt;Spark session available as 'spark'.&lt;BR /&gt;Welcome to&lt;BR /&gt;[Spark ASCII banner] version 2.1.0.cloudera3-SNAPSHOT&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;scala&amp;gt; import org.apache.kudu.spark.kudu._&lt;BR /&gt;import org.apache.kudu.spark.kudu._&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;scala&amp;gt; val df = spark.sqlContext.read.options(Map("kudu.master" -&amp;gt; "nightly512-1.xx.xxx.com:7051","kudu.table" -&amp;gt; "impala::default.test_kudu")).kudu&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;scala&amp;gt; df.show(3)&lt;/P&gt;&lt;P&gt;+---+---+&lt;BR /&gt;| id| s|&lt;BR /&gt;+---+---+&lt;BR /&gt;|100|abc|&lt;BR /&gt;|101|def|&lt;BR /&gt;|102|ghi|&lt;BR /&gt;+---+---+&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 01 May 2018 12:19:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-connect-to-Kudu-via-PySpark/m-p/66854#M77765</guid>
      <dc:creator>AutoIN</dc:creator>
      <dc:date>2018-05-01T12:19:15Z</dc:date>
    </item>
  </channel>
</rss>