Hive supports the ANSI-standard information_schema database, which we can query for information about tables, views, columns, and our Hive privileges. The information_schema data reveals the state of the system, similar to the sys database, but in a user-friendly, read-only way.
Example:
SELECT * FROM information_schema.tables WHERE is_insertable_into='YES' limit 2;
...
+-----------------------+----------------------+--------------------+-----
| tables.table_catalog  | tables.table_schema  | tables.table_name  | ...
+-----------------------+----------------------+--------------------+-----
| default               | default              | students2          | ...
| default               | default              | t3                 | ...
+-----------------------+----------------------+--------------------+-----
Now let us try to access the schemata table under the information_schema database from Spark:
spark.sql("select * from information_schema.schemata").show()
We will get the following exception:
org.apache.spark.sql.AnalysisException: Undefined function: 'restrict_information_schema'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 632
We get this exception because the restrict_information_schema function is available in Hive but is not registered in Spark. We can check which functions are available in Spark using the following command:
spark.sql("show functions").show()
We can resolve this error by passing hive-exec.jar to spark-shell and creating a temporary function:
spark-shell --jars /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/lib/hive/lib/hive-exec.jar
spark.sql("""
CREATE TEMPORARY FUNCTION restrict_information_schema AS
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFRestrictInformationSchema'
""")
After registering the function, if we try to access the table data we will get another error, like the one below:
scala> spark.sql("select * from information_schema.schemata").show()
org.apache.spark.sql.AnalysisException: Undefined function: 'current_user'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 632
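We could keep registering the missing functions one at a time. As a sketch, current_user is backed by Hive's GenericUDFCurrentUser class, so it can be registered the same way, but the next missing Hive built-in would simply surface after that:
spark.sql("""
CREATE TEMPORARY FUNCTION current_user AS
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFCurrentUser'
""")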
It is difficult to find out in advance how many such functions we need to register.
To avoid registering functions one by one, we can use the Spark JDBC API to read the tables under information_schema through HiveServer2.
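A minimal sketch of the JDBC approach is shown below. The HiveServer2 host and port (hs2-host:10000), the user name, and the hive-jdbc standalone jar path are assumptions; adjust them for your cluster, and add Kerberos or SSL parameters to the URL if your HiveServer2 requires them.
// launch spark-shell with the Hive JDBC driver on the classpath
// (the jar path is a placeholder; use your cluster's hive-jdbc standalone jar)
spark-shell --jars /path/to/hive-jdbc-standalone.jar

val schemata = spark.read
  .format("jdbc")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")   // Hive JDBC driver class
  .option("url", "jdbc:hive2://hs2-host:10000")          // hypothetical HiveServer2 host and port
  .option("dbtable", "information_schema.schemata")      // table to read over JDBC
  .option("user", "hive")                                // hypothetical user; add auth options as needed
  .load()

schemata.show()
Because the query now runs inside HiveServer2, Hive itself evaluates restrict_information_schema and current_user, so no function registration is required on the Spark side.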