Created on 07-12-2023 09:00 AM - edited 07-12-2023 09:31 AM
Hello. I am running exploratory PySpark code in a Databricks Apache Spark environment. I am fairly sure the syntax is correct, but the apache.spark error referenced in the subject keeps being thrown. Any insights, please?
databaseName = "database"
desiredColumn = "variable"
database = spark.sql(f"show tables in {databaseName}").collect()
display(database)
tablenames = []
for row in database:
    # collect the column names of each table and keep tables that contain the desired column
    listColumns = spark.table(row.tableName).columns
    if desiredColumn in listColumns:
        tablenames.append(row.tableName)
Created 07-12-2023 11:40 AM
@JN_000 Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our Spark expert @Bharati who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres

Created 07-12-2023 12:02 PM
Thank you so much!
Created 07-20-2023 03:10 AM
We verified the same in a CDP environment, as we are uncertain about the Databricks Spark environment.
Since we have a mix of managed and external tables, we extracted the necessary information through HWC (Hive Warehouse Connector).
>>> database=spark.sql("show tables in default").collect()
23/07/20 10:04:45 INFO rule.HWCSwitchRule: Registering Listeners
23/07/20 10:04:47 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
Hive Session ID = e6f70006-0c2e-4237-9a9e-e1d19901af54
>>> desiredColumn="name"
>>> tablenames = []
>>> for row in database:
...     cols = spark.table(row.tableName).columns
...     listColumns = spark.table(row.tableName).columns
...     if desiredColumn in listColumns:
...         tablenames.append(row.tableName)
...
>>>
>>> print("\n".join(tablenames))
movies
tv_series_abc
cdp1
tv_series
spark_array_string_example
>>>
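For reference, the loop above can be factored into a small helper that also skips tables that fail to load (for example, because of missing permissions or a dropped underlying location). This is a hedged sketch, not part of any Spark API: the function name `tables_with_column` and the `get_columns` callable are illustrative. In a real session, `get_columns` would be `lambda t: spark.table(t).columns`.

```python
def tables_with_column(table_names, get_columns, desired_column):
    """Return the subset of table_names whose columns include desired_column.

    get_columns: callable taking a table name and returning its column names;
    tables that raise while being described are skipped rather than aborting
    the whole scan.
    """
    matches = []
    for name in table_names:
        try:
            cols = get_columns(name)
        except Exception:
            continue  # table could not be read; skip it
        if desired_column in cols:
            matches.append(name)
    return matches

# Hypothetical wiring inside a Spark session (not executed here):
# rows = spark.sql("show tables in default").collect()
# found = tables_with_column(
#     [r.tableName for r in rows],
#     lambda t: spark.table(t).columns,
#     "name",
# )
```

Separating the filtering logic from the Spark calls also makes it easy to unit-test the loop without a cluster.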