@rams The error is expected: the PySpark syntax differs from the Scala syntax. For reference, here are the steps you'd need to query a Kudu table in pyspark2.

Create a Kudu table using impala-shell:

# impala-shell
CREATE TABLE test_kudu (id BIGINT PRIMARY KEY, s STRING)
  PARTITION BY HASH(id) PARTITIONS 2
  STORED AS KUDU;
insert into test_kudu values (100, 'abc');
insert into test_kudu values (101, 'def');
insert into test_kudu values (102, 'ghi');

Launch pyspark2 with the kudu-spark2 artifact and query the table:

# pyspark2 --packages org.apache.kudu:kudu-spark2_2.11:1.4.0
[... Spark banner: version 2.1.0.cloudera3-SNAPSHOT ...]
Using Python version 2.7.5 (default, Nov 6 2016 00:28:07)
SparkSession available as 'spark'.
>>> kuduDF = spark.read.format('org.apache.kudu.spark.kudu').option('kudu.master', "nightly512-1.xxx.xxx.com:7051").option('kudu.table', "impala::default.test_kudu").load()
>>> kuduDF.show(3)
+---+---+
| id|  s|
+---+---+
|100|abc|
|101|def|
|102|ghi|
+---+---+

For the record, the same thing can be achieved with the following commands in spark2-shell:

# spark2-shell --packages org.apache.kudu:kudu-spark2_2.11:1.4.0
Spark context available as 'sc' (master = yarn, app id = application_1525159578660_0011).
Spark session available as 'spark'.
[... Spark banner: version 2.1.0.cloudera3-SNAPSHOT ...]

scala> import org.apache.kudu.spark.kudu._
import org.apache.kudu.spark.kudu._

scala> val df = spark.sqlContext.read.options(Map("kudu.master" -> "nightly512-1.xx.xxx.com:7051", "kudu.table" -> "impala::default.test_kudu")).kudu

scala> df.show(3)
+---+---+
| id|  s|
+---+---+
|100|abc|
|101|def|
|102|ghi|
+---+---+
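
As a side note, once the DataFrame has been loaded in pyspark2 you can also register it as a temporary view and query it with Spark SQL instead of the DataFrame API. A minimal sketch, assuming the kuduDF from the steps above (the view name test_kudu_view is just an illustration):

>>> # Register the Kudu-backed DataFrame as a temp view (standard Spark 2.x API)
>>> kuduDF.createOrReplaceTempView("test_kudu_view")
>>> # Filter with Spark SQL; with the sample data above this should return the rows with id 101 and 102
>>> spark.sql("SELECT id, s FROM test_kudu_view WHERE id > 100").show()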