Created on 12-09-2019 04:16 AM - last edited on 12-09-2019 05:49 AM by cjervis
I am migrating Spark jobs running on HDP 2.6 to HDP 3.1. When executing the Spark jobs on HDP 3.1 I am getting the following error.
java.util.NoSuchElementException: spark.sql.hive.hiveserver2.jdbc.url
at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1571)
at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1571)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:1571)
at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:74)
at com.hortonworks.spark.sql.hive.llap.HWConf.getConnectionUrlFromConf(HWConf.java:143)
at com.hortonworks.spark.sql.hive.llap.HWConf.getConnectionUrl(HWConf.java:107)
at com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.build(HiveWarehouseBuilder.java:97)
at com.wunderman.hdp.Hdp3MigrationMain.main(Hdp3MigrationMain.java:16)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This is how I am creating the Spark session:
private static SparkSession getSparkSession() {
    /**
     * Create an instance of SparkSession to connect to the cluster
     */
    SparkSession sparkSession = SparkSession.builder().appName("Hdp3 Migration").master("yarn").getOrCreate();
    return sparkSession;
}
But the HiveServer2 JDBC URL is configured in the Spark config. I have added the following dependency in the pom.xml:
<dependency>
    <groupId>com.hortonworks.hive</groupId>
    <artifactId>hive-warehouse-connector_2.11</artifactId>
    <version>1.0.0.3.1.0.0-78</version>
</dependency>
And I am trying to execute the code below:
String hdp3Enabled = args[0];
Dataset<Row> dataset;
String query = "SELECT * FROM schema.tablename WHERE col1 = 'abc'"; // Sample query
try {
    if ("Y".equalsIgnoreCase(hdp3Enabled)) {
        HiveWarehouseSession hive = HiveWarehouseSession.session(sparkSession).build();
        dataset = hive.executeQuery(query);
    } else {
        dataset = sparkSession.sql(query);
    }
    dataset.show();
} catch (Exception e) {
    e.printStackTrace();
}
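For reference, this is roughly how I could pass the HiveServer2 JDBC URL on the builder itself instead of relying on spark-defaults (sketch only; the URL below is a placeholder, not our actual cluster value):
private static SparkSession getSparkSession() {
    // Sketch: the JDBC URL value is a placeholder, not our actual cluster setting.
    return SparkSession.builder()
            .appName("Hdp3 Migration")
            .master("yarn")
            // The property that HiveWarehouseBuilder.build() looks up from the session conf
            .config("spark.sql.hive.hiveserver2.jdbc.url", "jdbc:hive2://hiveserver2-host:10000/")
            .getOrCreate();
}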
Please share your suggestions on how to fix this issue.
Created 12-09-2019 10:14 AM
From HDP 3.x onwards, to work with Hive databases you should use the HiveWarehouseConnector library /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.0.0-1634.jar, as shown in the example below:
spark-shell --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://FQDN or IP:10000/" --conf spark.datasource.hive.warehouse.load.staging.dir="/staging_dir" --conf spark.hadoop.hive.zookeeper.quorum="zk_quorum_ips:2181" --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.0.0-1634.jar
val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
hive.showDatabases().show(100, false)
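If you are submitting a Java application rather than using spark-shell, the equivalent smoke test would look roughly like this (a sketch; the class name is made up, and the same --conf values and --jars from the spark-shell command above still need to be passed to spark-submit):
import org.apache.spark.sql.SparkSession;
import com.hortonworks.hwc.HiveWarehouseSession;
import com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder;

public class HwcSmokeTest {
    public static void main(String[] args) {
        // Expects spark.sql.hive.hiveserver2.jdbc.url (and the other HWC properties)
        // to be supplied via --conf on spark-submit, as in the spark-shell example above.
        SparkSession spark = SparkSession.builder().appName("HWC smoke test").getOrCreate();
        HiveWarehouseSession hive = HiveWarehouseBuilder.session(spark).build();
        // Lists databases through HiveServer2 Interactive / LLAP
        hive.showDatabases().show(100, false);
    }
}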
Could you try that and revert?
Created on 12-10-2019 12:19 AM - edited 12-10-2019 12:21 AM
@Shelton I have tried as you suggested, but I am still getting the same error.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import com.hortonworks.hwc.HiveWarehouseSession;
import com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder;
public class Hdp3MigrationMain extends CommonUtilities {
    public static void main(String[] args) {
        String hdp3Enabled = args[0];
        Dataset<Row> dataset;
        String query = "select * from hive_schema.table1";
        try {
            if ("Y".equalsIgnoreCase(hdp3Enabled)) {
                HiveWarehouseSession hive = HiveWarehouseBuilder.session(sparkSession).build();
                dataset = hive.executeQuery(query);
            } else {
                dataset = sparkSession.sql(query);
            }
            dataset.show();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
And the same error occurs.
java.util.NoSuchElementException: spark.sql.hive.hiveserver2.jdbc.url
at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1571)
at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1571)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:1571)
at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:74)
at com.hortonworks.spark.sql.hive.llap.HWConf.getConnectionUrlFromConf(HWConf.java:143)
at com.hortonworks.spark.sql.hive.llap.HWConf.getConnectionUrl(HWConf.java:107)
at com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.build(HiveWarehouseBuilder.java:97)
at com.wunderman.hdp.Hdp3MigrationMain.main(Hdp3MigrationMain.java:18)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Created on 12-10-2019 01:06 AM - edited 12-10-2019 01:08 AM
In one of your previous updates you mentioned that "the hiveserver2 jdbc url is configured in the spark config."
However, it looks like the error you are getting is because the mentioned properties are not found in the spark2-defaults config that is on your classpath; you can confirm this from inside the application, as in the sketch after the steps below.
So can you please make sure that your classpath points to the correct spark-defaults, with the following properties added as mentioned in the "Required properties" section of the following doc: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_configure_a_spar...
You must add several Spark properties through spark2-defaults in Ambari to use the Hive Warehouse Connector for accessing data in Hive. Alternatively, configuration can be provided for each job using --conf.
The URL for HiveServer2 Interactive
The URI for the metastore
The HDFS temp directory for batch writes to Hive, /tmp for example
The application name for LLAP service
The ZooKeeper hosts used by LLAP
Set the values of these properties as follows:
In Ambari, copy the value from Services > Hive > Summary > HIVESERVER2 INTERACTIVE JDBC URL.
Copy the value from hive.metastore.uris. In Hive, at the hive> prompt, enter set hive.metastore.uris and copy the output. For example, thrift://mycluster-1.com:9083.
Copy the value from Advanced hive-interactive-site > hive.llap.daemon.service.hosts.
Copy the value from Advanced hive-site > hive.zookeeper.quorum.
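As a quick check from inside the application, you could print what the SparkSession actually sees for these keys before calling build(); any key that prints the fallback value is not reaching the job. This is only a sketch (the class name is made up; the property names are the ones from the "Required properties" section, so confirm them against your spark2-defaults):
import org.apache.spark.sql.SparkSession;

public class HwcConfCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("HWC conf check").getOrCreate();
        String[] requiredKeys = {
                "spark.sql.hive.hiveserver2.jdbc.url",
                "spark.datasource.hive.warehouse.metastoreUri",
                "spark.datasource.hive.warehouse.load.staging.dir",
                "spark.hadoop.hive.llap.daemon.service.hosts",
                "spark.hadoop.hive.zookeeper.quorum"
        };
        for (String key : requiredKeys) {
            // get(key, default) returns the fallback instead of throwing the
            // NoSuchElementException seen in the stack trace above.
            System.out.println(key + " = " + spark.conf().get(key, "<NOT SET>"));
        }
    }
}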
Created on 12-10-2019 03:45 AM - edited 12-10-2019 05:11 AM
@jsensharma All the configs are set properly in our cluster. I am trying to access external Hive tables using a HiveWarehouseSession.
Could the error be because of that, since the documentation says HiveWarehouseSession is not needed for external tables?
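For the external tables, the path I expect to use is plain Spark SQL without a HiveWarehouseSession, roughly as below (a sketch; the class and table names are made up, and it assumes the external Hive tables are visible to the Spark session's catalog on our cluster):
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ExternalTableRead {
    public static void main(String[] args) {
        // Assumes Hive support for the session is enabled here or through spark-defaults,
        // and that the external table is visible in the catalog Spark uses.
        SparkSession spark = SparkSession.builder()
                .appName("External table read")
                .enableHiveSupport()
                .getOrCreate();
        Dataset<Row> dataset = spark.sql("select * from hive_schema.external_table1"); // made-up table name
        dataset.show();
    }
}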