Member since 07-27-2018 · 4 Posts · 0 Kudos Received · 0 Solutions
08-09-2018
08:16 PM
When I read the parquet file directly, I am using SWebHdfsFileSystem. When I try to connect through Hive, after connecting to the metastore it returns the path for the table, which is hdfs://PATH/partition_name=20180525. As I am running the code on my machine, I don't have direct access to HDFS. Is there a way to override the file location we receive from the metastore and force it to use SWebHdfsFileSystem?
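One possible approach (a sketch only, not something I have verified) is to remap the hdfs:// scheme to SWebHdfsFileSystem in the Hadoop configuration, so that the location returned by the metastore resolves through the HTTPS-based filesystem instead of the DFS client. The app name below is made up, and the port mismatch between the RPC address in the metastore location and the NameNode HTTPS address is a likely complication:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .master("local")
  .appName("schemeOverrideSketch")   // hypothetical app name
  .enableHiveSupport()
  // Hadoop resolves a URI scheme via fs.<scheme>.impl, so this makes
  // hdfs:// paths go through SWebHdfsFileSystem instead of the DFS client.
  .config("spark.hadoop.fs.hdfs.impl", "org.apache.hadoop.hdfs.web.SWebHdfsFileSystem")
  // Caveat: the metastore location points at the NameNode RPC port;
  // SWebHdfsFileSystem expects the HTTPS port, so the table location
  // may still need to be rewritten.
  .getOrCreate()

spark.table("mydatabase.mytable").limit(1).show()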
08-08-2018
07:41 PM
Hello, I am having some issues when trying to fetch data from Hive using Spark 2. In my classpath I have hive-site.xml, core-site.xml and hdfs-site.xml. If I try to access the parquet file directly from HDFS using the following code, it works:

val spark = SparkSession
  .builder()
  .master("local")
  .appName("myname")
  .enableHiveSupport()
  .getOrCreate()

// avoids linkage exception
RuntimeDelegate.setInstance(new RuntimeDelegateImpl())

spark.read.parquet("/PATH/partition_name=20180525/*").show(10)

The result:

+----------+--------------------+----------+---------+---------+-----------+
|      col1|                col2|      col3|     col4|     col5|       col6|
+----------+--------------------+----------+---------+---------+-----------+
|2018-05-24|ABC1234             |      ABC2|   ABC123|bbbbbbbbb|          0|
+----------+--------------------+----------+---------+---------+-----------+

But if I try to use Spark SQL to query Hive:

spark.table("mydatabase.mytable").limit(1).show()

I get the following error:

Failed to connect to /184.6.241.44:1019 for block, add to deadNodes and continue. java.net.ConnectException: Connection timed out: no further information
I can query Hive through Knox or using JDBC, so everything else looks fine. What could cause this problem? Any ideas? Thank you.
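From the error it looks like spark.table() resolves the hdfs:// location stored in the metastore and the HDFS client then tries to stream blocks directly from a DataNode (the IP on port 1019), which is not reachable from a machine outside the cluster. A minimal workaround sketch, assuming the cluster exposes WebHDFS over HTTPS (the host and port below are placeholders, and /PATH stands for the same partition location as above), is to read the files through swebhdfs://, where data travels over HTTP(S) instead of the DataNode transfer protocol:

// Not verified: read the partition through the HTTPS-based filesystem
// instead of the hdfs:// location returned by the metastore.
// "namenode-https-host:50470" is a placeholder for your environment.
spark.read
  .parquet("swebhdfs://namenode-https-host:50470/PATH/partition_name=20180525")
  .show(10)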
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark
07-31-2018
08:24 PM
Hello @Felix Albani, I tried hard to get access here, without success. Now I am trying a new approach: the services are protected by Kerberos, so why not use kinit? I was able to connect to the metastore, but I am getting the following error:
Caused by: java.lang.LinkageError: ClassCastException: attempting to cast jar:file:/C:/Users/bquintin070317/.ivy2/cache/javax.ws.rs/javax.ws.rs-api/jars/javax.ws.rs-api-2.0.1.jar!/javax/ws/rs/ext/RuntimeDelegate.class to jar:file:/C:/Users/bquintin070317/.ivy2/cache/javax.ws.rs/javax.ws.rs-api/jars/javax.ws.rs-api-2.0.1.jar!/javax/ws/rs/ext/RuntimeDelegate.class

I saw a post in this forum saying you should add this code and it should work:

hc = new org.apache.spark.sql.hive.HiveContext(sc)
hc.setConf("yarn.timeline-service.enabled", "false")

In my case I am using Spark 2 and I didn't find a way to do this. Any idea? What I did was:

val spark = SparkSession
  .builder()
  .master("local")
  .appName("firmMappingReader")
  .enableHiveSupport()
  .config("yarn.timeline-service.enabled", "false")
  .getOrCreate()

// even forced it here:
spark.sqlContext.setConf("yarn.timeline-service.enabled", "false")
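One thing worth trying (a sketch, not verified): a key passed to .config() without the spark.hadoop. prefix stays in the SparkConf and may never reach the Hadoop Configuration that the YARN/Hive client classes read, and SQLContext.setConf only affects SQL properties. Prefixing the key, or setting it directly on the SparkContext's Hadoop configuration, should get it into the right place:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .master("local")
  .appName("firmMappingReader")
  .enableHiveSupport()
  // spark.hadoop.* properties are copied into the Hadoop Configuration
  .config("spark.hadoop.yarn.timeline-service.enabled", "false")
  .getOrCreate()

// Alternatively, set it on the Hadoop configuration once the session exists:
spark.sparkContext.hadoopConfiguration.set("yarn.timeline-service.enabled", "false")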
07-27-2018
04:11 PM
In my company we secure HDFS with Knox for external access, so when I am developing locally I need to go through Knox to fetch files from HDFS. In our projects we use Spark 1.6, and we had to implement a custom FileSystem to wrap this Knox access. As I am starting a new project using Spark 2.1, I was wondering if there is an easier way to fetch this HDFS data without implementing a custom file system. What's the right way to do this?
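For what it's worth, one Spark 2 idea I have been considering (a sketch under the assumption that the cluster's WebHDFS endpoint is reachable over HTTPS; the host and port are placeholders, and Knox's URL rewriting may still force the custom FileSystem route) is to point the default filesystem at swebhdfs through spark.hadoop.* properties instead of writing any wrapper:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .master("local")
  .appName("knoxReadSketch")   // hypothetical app name
  // Placeholder endpoint: the NameNode HTTPS (or gateway) address in your environment.
  .config("spark.hadoop.fs.defaultFS", "swebhdfs://namenode-https-host:50470")
  .getOrCreate()

// Paths without an explicit hdfs:// scheme now resolve through SWebHdfsFileSystem.
spark.read.parquet("/PATH/partition_name=20180525/*").show(10)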
Labels:
- Apache Hadoop
- Apache Knox
- Apache Spark