Spark 2 can't retrieve files from HDFS when using Hive (timed out)

New Contributor

Hello, I am having some issues when trying to fetch data from Hive using Spark 2. In my classpath I have hive-site.xml, core-site.xml and hdfs-site.xml.

If I try to access the Parquet file directly from HDFS using the following code, it works:

import org.apache.spark.sql.SparkSession
import javax.ws.rs.ext.RuntimeDelegate

val spark = SparkSession
  .builder()
  .master("local")
  .appName("myname")
  .enableHiveSupport()
  .getOrCreate()    // avoids linkage exception

// RuntimeDelegateImpl comes from the JAX-RS implementation on the classpath
RuntimeDelegate.setInstance(new RuntimeDelegateImpl())

spark.read.parquet("/PATH/partition_name=20180525/*").show(10)

The result:

+----------+--------------------+----------+---------+---------+-----------+
|      col1|                col2|      col3|     col4|     col5|       col6|
+----------+--------------------+----------+---------+---------+-----------+
|2018-05-24|ABC1234             |      ABC2|   ABC123|bbbbbbbbb|          0|
+----------+--------------------+----------+---------+---------+-----------+

But if I try to use Spark SQL to query Hive:

spark.table("mydatabase.mytable").limit(1).show()

I get the following error:

Failed to connect to /184.6.241.44:1019 for block, add to deadNodes and continue. java.net.ConnectException: Connection timed out: no further information

I can query Hive through Knox or via JDBC, so everything else looks fine except this. What could cause this problem? Any ideas?
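
One way to see which location the query is actually trying to read (a minimal sketch, reusing the Spark session and table name from above) is to dump the table metadata that comes back from the metastore:

// List the table metadata stored in the metastore, including the Location URI.
// If Location is an hdfs:// path, Spark reads the Parquet blocks straight from
// the DataNodes (the 184.6.241.44:1019 address in the error above).
spark.sql("DESCRIBE FORMATTED mydatabase.mytable").show(100, false)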

Thank you.

1 REPLY

Re: Spark 2 can't retrieve files from HDFS when using Hive (timed out)

New Contributor

When I read the Parquet file directly, I am using SWebHdfsFileSystem.

When I try to connect through Hive, after connecting to the metastore, it returns the path for the table, which is hdfs://PATH/partition_name=20180525.

As I am running the code on my own machine, I don't have direct access to HDFS. Is there a way to override the file location returned by the metastore and force it to use SWebHdfsFileSystem?
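
For reference, a rough sketch of the kind of override meant here: skip the hdfs:// location the metastore returns and read the same partition path over swebhdfs:// explicitly. The NameNode host and HTTPS port below are placeholders (assumptions, not values from this cluster), and the path is the partition location reported by the metastore:

// Hypothetical workaround: use the swebhdfs:// scheme so the read goes through
// SWebHdfsFileSystem instead of contacting the DataNode ports directly.
// namenode.example.com:50470 is an assumed NameNode HTTPS address.
spark.read
  .parquet("swebhdfs://namenode.example.com:50470/PATH/partition_name=20180525")
  .show(10)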
