Created 03-15-2021 12:58 PM
Hello everyone, I have a problem. I'm trying to work with Hive datasets using PySpark. I have 3 databases, but I only get the default database; it's as if Spark creates a new warehouse in the same directory as the Python program. Here's my program:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .getOrCreate()

spark.sql("show databases").show()
and here's the output:
and here's the Hive output:
I want to connect to the Hive databases. Thanks in advance.
Created 03-15-2021 01:36 PM
@totti1
This is all about HMS (Hive Metastore) metadata refreshing. Spark SQL caches Parquet metadata for better performance. When Hive metastore Parquet table conversion is enabled, the metadata of those converted tables is also cached. If these tables are updated by Hive or other external tools, you need to refresh them manually to ensure consistent metadata.
from pyspark.sql import SparkSession

# warehouse_location points to the default location for managed databases and tables
warehouse_location = 'spark-warehouse'

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .enableHiveSupport() \
    .getOrCreate()

# spark is an existing SparkSession
spark.sql("CREATE TABLE IF NOT EXISTS totti (key INT, value STRING)")

# Load some data here
spark.sql("LOAD DATA LOCAL INPATH 'path/to/the/table/totti.txt' INTO TABLE totti")

# Refresh the cached HMS metadata for the table
spark.catalog.refreshTable("totti")

# Queries are expressed in HiveQL
spark.sql("SELECT * FROM totti").show()
In the above example, you will need to connect to the database to create the table totti. Notice that I run the refresh before the SELECT so that the cached metadata is invalidated and fetched fresh from the metastore; otherwise I would get a "table not found" error or similar.
Created 03-15-2021 01:48 PM
Thank you for your reply
I don't want to use the Spark warehouse; I want to use the global Hive warehouse.
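To see the existing Hive databases rather than a fresh local `spark-warehouse`, Spark has to be pointed at the cluster's Hive metastore. A minimal configuration sketch, assuming the metastore thrift service runs at `thrift://metastore-host:9083` (the host and port here are placeholders, not values from this thread; on a managed cluster, putting the cluster's `hive-site.xml` into Spark's conf directory, e.g. `/etc/spark/conf`, achieves the same thing without any code changes):

```python
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("hive.metastore.uris", "thrift://metastore-host:9083") \
    .enableHiveSupport() \
    .getOrCreate()

# With the real metastore attached, all Hive databases should be listed,
# not just "default"
spark.sql("SHOW DATABASES").show()
```

Note that `enableHiveSupport()` is essential in both cases: without it, Spark uses its own in-memory catalog and will only ever show a `default` database, which matches the symptom in the original post.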