Support Questions
Find answers, ask questions, and share your expertise

Unable to access the hive table from spark-sql

Hi Team,

I am unable to access any of the hive table from spark-sql terminal but able to list the databases and table from spark terminal.

looks like the spark-sql does not able to find the hdfs name space. Kindly look into the below error.

i.e datalake dev is the hdfs name space

spark-sql> show tables;
19/12/31 02:39:46 INFO CodeGenerator: Code generated in 24.580491 ms
product adjustment_type false
product adjustment_type_ext false
product co_financing_arrangement_type false
product co_financing_arrangement_type_ext false

 

19/12/31 02:39:57 ERROR SparkSQLDriver: Failed in [select * from snapshot_table_list limit 10]
java.lang.IllegalArgumentException: java.net.UnknownHostException: datalakedev
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:445)
at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:132)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:353)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:287)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:177)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.ancestorIsMetadataDirectory(FileStreamSink.scala:68)
at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$1.apply(InMemoryFileIndex.scala:61)
at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$1.apply(InMemoryFileIndex.scala:61)
at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filterNot(TraversableLike.scala:267)
at scala.collection.AbstractTraversable.filterNot(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.<init>(InMemoryFileIndex.scala:61)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$9.apply(HiveMetastoreCatalog.scala:235)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$9.apply(HiveMetastoreCatalog.scala:233)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$inferIfNeeded(HiveMetastoreCatalog.scala:233)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$6$$anonfun$7.apply(HiveMetastoreCatalog.scala:193)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$6$$anonfun$7.apply(HiveMetastoreCatalog.scala:192)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$6.apply(HiveMetastoreCatalog.scala:192)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$6.apply(HiveMetastoreCatalog.scala:185)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.withTableCreationLock(HiveMetastoreCatalog.scala:54)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.convertToLogicalRelation(HiveMetastoreCatalog.scala:185)
at org.apache.spark.sql.hive.RelationConversions.org$apache$spark$sql$hive$RelationConversions$$convert(HiveStrategies.scala:212)
at org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:239)
at org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:228)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)

 

8 REPLIES 8

Re: Unable to access the hive table from spark-sql

@Shelton even from pyspark also not able to access the hive table. could you please look into this.

 

>>> sqlContext.sql('select * from project.relationship_type_ext limit 10').show()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/hdp/current/spark2-client/python/pyspark/sql/context.py", line 353, in sql
return self.sparkSession.sql(sqlQuery)
File "/usr/hdp/current/spark2-client/python/pyspark/sql/session.py", line 716, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'java.net.UnknownHostException: datalakedev'
>>> sqlContext.sql('describe table project.relationship_type_ext').show()
+--------------------+---------+-------+
| col_name|data_type|comment|
+--------------------+---------+-------+
| uuid| string| null|
| source| string| null|
| sourceobject| string| null|
|securityclassific...| string| null|
|timestampwithmill...| string| null|
| activeflag| string| null|
| versionofgenerator| string| null|
| updated_by| string| null|
| active_ind| string| null|
|relationship_type...| string| null|
|relationship_type...| string| null|
|relationship_type...| string| null|
| horizontal_ind| string| null|
| created_by| string| null|
| created_on| string| null|
| updated_on| string| null|
+--------------------+---------+-------+

Re: Unable to access the hive table from spark-sql

Cloudera Employee

Hello

 

It looks like Hadoop configuration files are not passed to Spark. So please check if hdfs-site and core-site xml files are passed to sprk-sql properly

Re: Unable to access the hive table from spark-sql

@senthh I have passed the hive-site.xml in /etc/spark2/conf but don't know how to map other configuration files. And also, in test environment it is working fine whereas in dev alone causing problem.

Re: Unable to access the hive table from spark-sql

Cloudera Employee
Could you please try passing /etc/hadoop/conf( where hadoon conf files are
residing into HADOOP_CONF_DIR and then submit Spark job?
{{

export HADOOP_CONF_DIR=“/etc/hadoop/conf:/etc/hive/conf

}}

Re: Unable to access the hive table from spark-sql

@senthh exported the same but still the issue is persisting.

>>> sqlContext.sql('select * from project.relationship_type_ext limit 10').show()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/hdp/current/spark2-client/python/pyspark/sql/context.py", line 353, in sql
return self.sparkSession.sql(sqlQuery)
File "/usr/hdp/current/spark2-client/python/pyspark/sql/session.py", line 716, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'java.net.UnknownHostException: datalakedev'
>>>
Traceback (most recent call last):
File "/usr/hdp/current/spark2-client/python/pyspark/context.py", line 256, in signal_handler
raise KeyboardInterrupt()
KeyboardInterrupt
>>>
[1]+ Stopped pyspark

Re: Unable to access the hive table from spark-sql

@senthh seems like spark doesn't recognize the hdfs namespace 

raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'java.net.UnknownHostException: datalakedev'

Re: Unable to access the hive table from spark-sql

@Shelton Could you please look into the above issue 

Re: Unable to access the hive table from spark-sql

@Shelton do u have any idea on this getting unknown exception while accessing the query from spark sql 

Time taken: 0.216 seconds, Fetched 318 row(s)
20/01/06 09:16:49 INFO SparkSQLCLIDriver: Time taken: 0.216 seconds, Fetched 318 row(s)
spark-sql> select * from snapshot_table_list;
20/01/06 09:16:57 INFO ContextCleaner: Cleaned accumulator 0
20/01/06 09:16:57 INFO ContextCleaner: Cleaned accumulator 1
20/01/06 09:16:57 INFO ContextCleaner: Cleaned accumulator 2
20/01/06 09:16:58 INFO HiveMetastoreCatalog: Inferring case-sensitive schema for table project.snapshot_table_list_ext (inference mode: INFER_AND_SAVE)
20/01/06 09:16:58 INFO deprecation: No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
20/01/06 09:16:58 ERROR SparkSQLDriver: Failed in [select * from snapshot_table_list]
java.lang.IllegalArgumentException: java.net.UnknownHostException: datalakedev
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:445)
at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:132)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:353)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:287)