
Unable to access Hive tables from spark-sql

Hi Team,

I am unable to access any of the Hive tables from the spark-sql shell, but I am able to list the databases and tables.

It looks like spark-sql is not able to resolve the HDFS nameservice. Kindly look into the error below.

i.e., datalakedev is the HDFS nameservice.

spark-sql> show tables;
19/12/31 02:39:46 INFO CodeGenerator: Code generated in 24.580491 ms
product adjustment_type false
product adjustment_type_ext false
product co_financing_arrangement_type false
product co_financing_arrangement_type_ext false


19/12/31 02:39:57 ERROR SparkSQLDriver: Failed in [select * from snapshot_table_list limit 10]
java.lang.IllegalArgumentException: java.net.UnknownHostException: datalakedev
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:445)
at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:132)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:353)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:287)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:177)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.ancestorIsMetadataDirectory(FileStreamSink.scala:68)
at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$1.apply(InMemoryFileIndex.scala:61)
at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$1.apply(InMemoryFileIndex.scala:61)
at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filterNot(TraversableLike.scala:267)
at scala.collection.AbstractTraversable.filterNot(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.<init>(InMemoryFileIndex.scala:61)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$9.apply(HiveMetastoreCatalog.scala:235)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$9.apply(HiveMetastoreCatalog.scala:233)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$inferIfNeeded(HiveMetastoreCatalog.scala:233)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$6$$anonfun$7.apply(HiveMetastoreCatalog.scala:193)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$6$$anonfun$7.apply(HiveMetastoreCatalog.scala:192)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$6.apply(HiveMetastoreCatalog.scala:192)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$6.apply(HiveMetastoreCatalog.scala:185)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.withTableCreationLock(HiveMetastoreCatalog.scala:54)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.convertToLogicalRelation(HiveMetastoreCatalog.scala:185)
at org.apache.spark.sql.hive.RelationConversions.org$apache$spark$sql$hive$RelationConversions$$convert(HiveStrategies.scala:212)
at org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:239)
at org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:228)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)



@Shelton I am not able to access the Hive tables from pyspark either; describe works, but any select fails with the same error. Could you please look into this?


>>> sqlContext.sql('select * from project.relationship_type_ext limit 10').show()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/hdp/current/spark2-client/python/pyspark/sql/context.py", line 353, in sql
return self.sparkSession.sql(sqlQuery)
File "/usr/hdp/current/spark2-client/python/pyspark/sql/session.py", line 716, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'java.net.UnknownHostException: datalakedev'
>>> sqlContext.sql('describe table project.relationship_type_ext').show()
+--------------------+---------+-------+
| col_name|data_type|comment|
+--------------------+---------+-------+
| uuid| string| null|
| source| string| null|
| sourceobject| string| null|
|securityclassific...| string| null|
|timestampwithmill...| string| null|
| activeflag| string| null|
| versionofgenerator| string| null|
| updated_by| string| null|
| active_ind| string| null|
|relationship_type...| string| null|
|relationship_type...| string| null|
|relationship_type...| string| null|
| horizontal_ind| string| null|
| created_by| string| null|
| created_on| string| null|
| updated_on| string| null|
+--------------------+---------+-------+

Cloudera Employee

Hello,

It looks like the Hadoop configuration files are not being passed to Spark. Please check whether hdfs-site.xml and core-site.xml are being picked up by spark-sql properly.
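For the logical name "datalakedev" to be resolvable, the hdfs-site.xml that Spark picks up has to define that nameservice. As a rough illustration only (the namenode IDs nn1/nn2 and the example.com hostnames below are placeholders, not your actual values), the relevant entries look like this:

<!-- Illustrative hdfs-site.xml entries for an HA nameservice named "datalakedev". -->
<!-- nn1/nn2 and the example.com hostnames are placeholders for this sketch. -->
<property>
  <name>dfs.nameservices</name>
  <value>datalakedev</value>
</property>
<property>
  <name>dfs.ha.namenodes.datalakedev</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.datalakedev.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.datalakedev.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.datalakedev</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

If these properties are not on Spark's classpath, the HDFS client treats "datalakedev" as a plain hostname, DNS lookup fails, and you get exactly the UnknownHostException shown above.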

@senthh I have placed hive-site.xml in /etc/spark2/conf, but I don't know how to map the other configuration files. Also, in the test environment it works fine; only dev has this problem.

Cloudera Employee
Could you please try pointing HADOOP_CONF_DIR at /etc/hadoop/conf (where the Hadoop configuration files reside) and then resubmit the Spark job?

export HADOOP_CONF_DIR=/etc/hadoop/conf:/etc/hive/conf
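Alternatively, the same settings can be injected at launch time using Spark's spark.hadoop.* prefix, which copies the values into the Hadoop Configuration. A minimal sketch, again with placeholder namenode IDs and hostnames:

spark-sql \
  --conf spark.hadoop.dfs.nameservices=datalakedev \
  --conf spark.hadoop.dfs.ha.namenodes.datalakedev=nn1,nn2 \
  --conf spark.hadoop.dfs.namenode.rpc-address.datalakedev.nn1=namenode1.example.com:8020 \
  --conf spark.hadoop.dfs.namenode.rpc-address.datalakedev.nn2=namenode2.example.com:8020 \
  --conf spark.hadoop.dfs.client.failover.proxy.provider.datalakedev=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

This is only a workaround for testing; the proper fix is to get the real hdfs-site.xml and core-site.xml onto Spark's configuration path.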

@senthh I exported the same, but the issue still persists.

>>> sqlContext.sql('select * from project.relationship_type_ext limit 10').show()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/hdp/current/spark2-client/python/pyspark/sql/context.py", line 353, in sql
return self.sparkSession.sql(sqlQuery)
File "/usr/hdp/current/spark2-client/python/pyspark/sql/session.py", line 716, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'java.net.UnknownHostException: datalakedev'
>>>
Traceback (most recent call last):
File "/usr/hdp/current/spark2-client/python/pyspark/context.py", line 256, in signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt
>>>
[1]+ Stopped pyspark

@senthh It seems like Spark doesn't recognize the HDFS nameservice:

raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'java.net.UnknownHostException: datalakedev'
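
For reference, the Hadoop configuration the running session actually sees can be checked from the pyspark shell. A quick sketch, assuming the default spark session object and using the internal _jsc handle; if dfs.nameservices prints None, hdfs-site.xml is not on the classpath and the client falls back to treating datalakedev as a hostname:

# Inspect the Hadoop configuration loaded by the current Spark session.
hconf = spark.sparkContext._jsc.hadoopConfiguration()
print(hconf.get("fs.defaultFS"))                  # e.g. hdfs://datalakedev
print(hconf.get("dfs.nameservices"))              # should print: datalakedev
print(hconf.get("dfs.ha.namenodes.datalakedev"))  # should list the namenode IDs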

@Shelton Could you please look into the above issue?

@Shelton Do you have any idea about this? I am still getting the UnknownHostException when running queries from spark-sql.

Time taken: 0.216 seconds, Fetched 318 row(s)
20/01/06 09:16:49 INFO SparkSQLCLIDriver: Time taken: 0.216 seconds, Fetched 318 row(s)
spark-sql> select * from snapshot_table_list;
20/01/06 09:16:57 INFO ContextCleaner: Cleaned accumulator 0
20/01/06 09:16:57 INFO ContextCleaner: Cleaned accumulator 1
20/01/06 09:16:57 INFO ContextCleaner: Cleaned accumulator 2
20/01/06 09:16:58 INFO HiveMetastoreCatalog: Inferring case-sensitive schema for table project.snapshot_table_list_ext (inference mode: INFER_AND_SAVE)
20/01/06 09:16:58 INFO deprecation: No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
20/01/06 09:16:58 ERROR SparkSQLDriver: Failed in [select * from snapshot_table_list]
java.lang.IllegalArgumentException: java.net.UnknownHostException: datalakedev
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:445)
at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:132)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:353)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:287)